NCTU CS: Compilers (FALL 2011)

Course Information

Instructor

Wuu Yang, <wuuyang@cis.nctu.edu.tw>, EC332B, 03-5131247, 03-5712121 ext. 56614.

Teaching Assistants

李原嘉 email: sautosun@gmail.com and 郭政錡 e-mail: kenkillerkuo@gmail.com; Phone: 56649.

Lectures

3CD5G, ED202.

Office Hours

2CD, EC332B.

Course URL

http://www.cis.nctu.edu.tw/~wuuyang/Lecture/lecture.compiler.html

Course Discussion Board

http://plaslab24.cis.nctu.edu.tw/phpBB2/viewforum.php?f=3

Prerequisites

programming in a conventional language (such as C, C++, or Pascal) and data structure

Course Description

This course is an introduction to development tools for embedded systems.  Our focus will be on the compiler construction, from the most basic to some introduction to optimization, object code format, and some topics related to embedded systems.

In the introduction to the design and implementation of compilers for an imperative language, we will study various components in a compiler, including scanners, parsers, symbol tables, semantic analyzers, and code generators, and the overall organization of a compiler. In the scanners, we will study regular expressions and the transformation of a regular expression into a deterministic finite state machine. In the parsers, we will study LL and LR parsing methods and the construction of various parsing tables. There are many tools that can generate scanners and parsers from formal specifications. We will also study these tools (lex and yacc in particular) and use them to generate the scanner and parser for our project.

We will study and implement the code generation components of a compiler. The symbol tables are a data structure shared by most components in the compiler. We will study how to process variables, types, expressions, records, arrays, pointers, and simple control structures. Procedures and packages will also be studied. Finally, we will study techniques for code generation and local optimization. If time permits, we will study global optimization techniques in compilers.

This course will include an integrated programming project. We will give you a detailed specification of a simplified Pascal language. It will be our target language. We will also supply the documents of a scanner generator (lex) and of a parser generator (yacc). You will implement a scanner, a parser, and other components used to generate code for a machine of your choice.  You can use X86 assembly, MIPS assembly, Java bytecode, ARM code, etc. To help you make progress, we will set three check points during the semester and a final demonstration at the end of the semester. You will need to finish a scanner, a parser, and a primitive code generator at the checkpoints, respectively. At the end of the semester, you will need to show (off) a complete compiler that generate code. In the project, you will need to implement a minimum set of features, which will be announced later in the class. If you implement extra features, you will earn extra points.

Textbook

C.N. Fischer, R.K. Cytron, and R.J. LeBlanc, Jr., Crafting a Compiler, Addison-Wesley, 2010. (table of contents)

(reference only) A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman, Compilers, Principles, Techniques, and Tools, 2nd Ed., Addison-Wesley, 2007. Chapters 9-10.(table of contents)

(reference only) C.N. Fischer and R.L. LeBlanc, Jr., Crafting a Compiler with C, Benjamin/Cummings, 1991. (table of contents)

(reference only) A.V. Aho, R. Sethi, and J.D. Ullman, Compilers, Principles, Techniques, and Tools, Addison-Wesley, 1985. (table of contents)

(reference only) Andrew W. Appel, Modern Compiler Implementation In Java, Cambridge Univ. Press, 1997.

(reference only) K. Dowd, High-performance Computing (chapters 2 and 4), O'Relly, 1992 ???. RISC vs CISC, what an optimizing compiler does

(reference only) David H. D. Warren, Logic Programming and Compiler Writing, Software Practice and Experience, Vol. 10, pp. 97-125, 1980.

(reference only) J. R. Levine, Linkers and Loaders, Morgan Kaufmann Publishers, San Francisco, 2000.

 


Grading

There will be two exams (35% each) and a course project (30%).

Semester Project

The compiler design course is closely associated with a project. Three checkpoints are set up to help you make progress. At the final demonstration, you are required to show off a complete compiler. We will provide you with a grading sheet that details the grading policy. We urge to you to make continual progress in your project since this is one of the best ways to learn the course thoroughly.

Class material is available on www. See below. There is a README file explaining how to use the class material.

First Checkpoint: The scanner. Due date: October 5, 2011, Wednesday, beginning of class (4th week)

At the first checkpoint, you will need to finish the scanner, using the lex tool. You need to demonstrate your scanner with all the reserved words and symbols plus 10 correct names and comments. In addition, you need to use 10 incorrect symbols to demonstrate that your scanner can handle the incorrect names properly (e.g., issue some error messages). The output of your scanner should list each token in the input, together with the line number and the position of the first character of the token. In case the token is an identifier or a number, the string representing the token must also be printed. You do NOT need to print a copy of the output nor the program in order to save some paper. We will use on-line demonstration in the first checkpoint.

If the input is "354+126*(920-27/190)", the output from the scanner should be the sequence of tokens: "34", "+", "126", "*", "(", "920", "-", "27", "/", "190" and ")".

There are several test cases with which you may try your scanner. You may also implement a pragma (pragma ListOn and pragma ListOff) to turn on or off the listing of the source program.

You may use lex (or flex) to implement the scanner. There is a file for token definitions. You may modify the token definitions for your project. A flex manual is also included in this web page.

The following issues should be addressed:

  1. How does the scanner handle the string 10..20 ?
  2. How does the scanner handle the string "abc""def" ?
  3. How does the scanner handle run-away strings (or cross-line strings) and run-away comments? Note that in Ada/CS, there is no run-away comments since every comment is terminated by end-of-line.
  4. How does the scanner handle the string 10.3E+5, 10.3E+BC, 10.3E+2.34, and 10.3EBC ? An acceptable solution to 10.3EBC could be two tokens: a real number 10.3 and an identifier EBC. But you may make your own decision.
  5. How does the scanner handle a very long identifier? How long can an identifier be? What if the maximum is exceeded?
  6. How does the scanner handle a very long string constant? How long can a string be? What if the maximum is exceeded?
  7. How does the scanner handle reserved words? How do you use a table for this purpose?
  8. How does the scanner handle illegal characters, such as !, [ and ]?
  9. How does the scanner handle the string '''' ?
  10. How does the scanner handle the ' sign as in a character constant 'a' and in an attribute such as d'First?
  11. How does the scanner handle end-of-file?
  12. How does the scanner recover from errors? What characters are deleted when errors occur?
  13. Can the scanner handle nested comments? T
Second Checkpoint: The parser. Due date:  November 2, 2011, Wednesday, beginning of class (8th week)

NOTE: It seems inhumane to ask the students to build parse trees for such a large language as Ada/CS. Therefore, no parse trees need be built in this checkpoint.

At the second checkpoint, you will need to finish the parser. You need to demonstrate your scanner with the test programs we will provide. As in the first checkpoint, you do NOT need to print a copy of the output nor the program. We will use on-line demonstration.

You may use yacc or bison to implement the parser. There is a file for syntax definitions. You may modify the syntax definitions. A bison manual is also included in this web page.

For this checkpoint, your parser should answer accept for correct test file and reject for incorrect test files. For incorrect input, the positions of the errors should be indicated. The parser should try to parse to the end of the input and catch as many errors as possible. Your parser should generate a parse tree or abstract syntax tree for correct input.

You need to address the following issues:

  1. Does the parser issue good error messages, such as line numbers and character positions of errors and explanations of errors?
  2. How does the parser recover from errors?
  3. Are there any optimizations to the parse table, size reduction or speed improvement?
Third Checkpoint: The symbol table. Due date: November 30, 2011, Wednesday, beginning of class (12th week)

At the third checkpoint, you will need to finish the symbol table and related routines. You need to show that your symbol table can correctly handle block entry and exit, procedure names, and parameters. As usual, do not print a copy of the program or the output. On-line demonstration is enough.

You will need about 5 days to implement a primitive working symbol table.

You may use C++ STL 'map' .  The class is essentially a red-black tree.

You need to address the following issues:

  1. Explain how the symbol table is organized, a hashed table, binary search list, or an unordered list, etc. Do you use one big table or one small table per scope?
  2. What does the compiler do when it encounters an undeclared variable, an undeclared type name, an undeclared function or procedure, an undeclared enumeration literal, an undeclared constant, etc.? The compiler should try to avoid spurious error messages.
  3. Do you use action-controlled semantic stack or parser-controlled semantic stack?
  4. You need to add action routines to print a message whenever a new scope is created or closed.
  5. You need to add action routines to print a message whenever a global simple variable (that is, variables of type integer or real) is declared.
  6. You need to add action routines to print a message whenever a local simple variable (that is, variables of type integer or real) is declared in a function.
  7. You need to add action routines to print a message whenever a loop index variable is implicitly declared.
  8.  Do you check duplicate declaration of the same name in the same scope?
  9. Do you check type compatibility?
  10. Do you use overload resolution during code generation?
  11. Do you build type descriptors for records and arrays?
Final Checkpoint: A complete compiler. Due date: January 11, 2012, Wednesday, beginning of class (18th week)

At the last checkpoint, you will have the chance to show off your complete compiler. We will schedule an interview with each students and test the features that you have implemented. You need to bring your grade sheet to the interview. Please do not print your program or output. We will give you a grade for the project during the interview. In the project, you will need to implement a minimum set of features, which will be announced later in the class. If you implement extra features, you will earn extra points.

Here is a list of semantic checks:

  1. All identifiers are declared.
  2. No identifiers are multiply-defined.
  3. Types in expressions are correct with respect to the type rules.
  4. Imported names are handled properly. Name conflicts are resolved correctly.
  5. Inheritance relation among classes are properly formed in OO languages.
  6. Methods in a class are not multiply defined.
  7. All declared methods are implemented.
  8. Abstract methods are not implemented in abstract classes.
  9. Public/protected/private rules are observed.

The target machine may be X86, MIPS, ARM, Java Virtual Machine, etc.  A manual of Java virtual machine is included in the next section, and use any web browser to run it.


Exams

IT IS RECOMMENDED THAT YOU USE CHINESE AND ENGLISH TO ANSWER QUESTIONS IN THE EXAMS.  DO NOT USE ANY OTHER LANGUAGES.

Midterm: November 9, 2011, Wednesday.

Final: November 11, 2012, Wednesday.

The grading standard for the compiler project:

See the project material.

Class material


Local On-Line Information

Information on the Net


Wuu Yang, <wuuyang@cis.nctu.edu.tw>, EC332B, 03-5712121 ext. 56614.