¡@
This course is an introduction to development tools for embedded systems. Our focus will be on the compiler construction, from the most basic to some introduction to optimization, object code format, and some topics related to embedded systems.
In the introduction to the design and implementation of compilers for an imperative language, we will study various components in a compiler, including scanners, parsers, symbol tables, semantic analyzers, and code generators, and the overall organization of a compiler. In the scanners, we will study regular expressions and the transformation of a regular expression into a deterministic finite state machine. In the parsers, we will study LL and LR parsing methods and the construction of various parsing tables. There are many tools that can generate scanners and parsers from formal specifications. We will also study these tools and use them to generate the scanner and parser for our project.
We will study and implement the code generation components of a compiler. The symbol tables are a data structure shared by most components in the compiler. We will study how to process variables, types, expressions, records, arrays, pointers, and simple control structures. Procedures and packages will also be studied. Finally, we will study techniques for code generation and local optimization. If time permits, we will study global optimization techniques in compilers.
This course will include an integrated programming project. We will give you a detailed specification of a simplified language that is very similar to Ada. It will be our target language. We will also supply the documents of a scanner generator and of a parser generator. A target language and its simulator will also be provided. You will implement a scanner, a parser, and other components used to generate code for a hypothetical machine. To help you make progress, we will set three check points during the semester and a final demonstration at the end of the semester. You will need to finish a scanner, a parser, and a primitive code generator at the checkpoints, respectively. At the end of the semester, you will need to show (off) a complete compiler that generate code for the hypothetical machine. We will provide a simulator for the machine to run your object code. In the project, you will need to implement a minimum set of features, which will be announced later in the class. If you implement extra features, you will earn extra points.
C.N. Fischer and R.L. LeBlanc, Jr., Crafting a Compiler with C, Benjamin/Cummings, 1991. (table of contents)
(reference only) A.V. Aho, R. Sethi, and J.D. Ullman, Compilers, Principles, Techniques, and Tools, Addison-Wesley, 1985. (table of contents)
(reference only) Andrew W. Appel, Modern Compiler Implementation In Java, Cambridge Univ. Press, 1997.
(reference only) K. Dowd, High-performance Computing (chapters 2 and 4), O'Relly, 1992 ???. RISC vs CISC, what an optimizing compiler does
(reference only) David H. D. Warren, Logic Programming and Compiler Writing, Software Practice and Experience, Vol. 10, pp. 97-125, 1980.
(reference only) J. R. Levine, Linkers and Loaders, Morgan Kaufmann Publishers, San Francisco, 2000.
¡@
The compiler design course is closely associated with a project. Three checkpoints are set up to help you make progress. At the final demonstration, you are required to show off a complete compiler. We will provide you with a grading sheet that details the grading policy. We urge to you to make continual progress in your project since this is one of the best ways to learn the course thoroughly.
Class material is available on www. See below. There is a README file explaining how to use the class material.
First Checkpoint: The scanner. Due date: October 9, 2008, Thursday, beginning of class (4th week)At the first checkpoint, you will need to finish the scanner, using the scangen tool. You need to demonstrate your scanner with all the reserved words and symbols plus 10 correct names and comments. In addition, you need to use 10 incorrect symbols to demonstrate that your scanner can handle the incorrect names properly (e.g., issue some error messages). The output of your scanner should list each token in the input, together with the line number and the position of the first character of the token. In case the token is an identifier or a number, the string representing the token must also be printed. You do NOT need to print a copy of the output nor the program in order to save some paper. We will use on-line demonstration in the first checkpoint.
If the input is "354+126*(920-27/190)", the output from the scanner should be the sequence of tokens: "34", "+", "126", "*", "(", "920", "-", "27", "/", "190" and ")".
There are several test cases with which you may try your scanner. These are the files .../tests/stests[1-7].Adacs and .../scanner.test.adacs. You may also implement a pragma (pragma ListOn and pragma ListOff) to turn on or off the listing of the source program.
You may use scangen to implement the scanner. There is a file for token definitions. If you choose to use lex or flex, you will have to modify the token definitions. A flex manual is also included in this web page.
The following issues should be addressed:
NOTE: It seems inhumane to ask the students to build parse trees for such a large language as Ada/CS. Therefore, no parse trees need be built in this checkpoint.
At the second checkpoint, you will need to finish the parser. You need to demonstrate your scanner with the test programs we will provide. As in the first checkpoint, you do NOT need to print a copy of the output nor the program. We will use on-line demonstration.
You may use llgen to implement the parser. There is a file for syntax definitions. If you choose to use yacc or bison, you will have to modify the syntax definitions. A bison manual is also included in this web page.
Note in the input syntax specification (i.e. the lladacs.bnf file), every
terminal specification is followed by two numbers, such as
* 2
4
Here * is a token and 2 and 4 are used for
error correction. The explanation is in the fmq manual (i.e. the fmq.doc.Z
file). Read it if you are interested. Otherwise, simply ignore the numbers.
Note that the tables for the scanner and the parser are generated by independent tools (scangen and llgen, respectively). The numbers for the tokens must be consistent. In the scanner, token numbers are the major numbers in the adacs.scan file. In the parser, token numbers are assigned by llgen based on the order tokens appear in the *terminals section in the lladacs.bnf file. You have to make sure that tokens are represented by the same numbers in the tables generated by the two tools.
For this checkpoint, your parser should answer accept for correct test file and reject for incorrect test files. For incorrect input, the positions of the errors should be indicated. The parser should try to parse to the end of the input and catch as many errors as possible.
Ignore this paragraph The product of the parser is the parse tree of the input program. The parse tree is constructed with pointers and dynamically allocated nodes. The parse tree should be dumped to the output file in an understandable format. Finally, you need to add a compiler directive "pragma BuildParseTree", which instructs the compiler to build the parse tree, and another directive "pragma DumpParseTree", which instructs the compiler to dump the parse tree built so far to a designated output file. In the absence of the pragmas, the compiler should not build or dump the parse tree.
You need to address the following issues:
At the third checkpoint, you will need to finish the symbol table and related routines. You need to show that your symbol table can correctly handle block entry and exit, procedure names, and parameters. As usual, do not print a copy of the program or the output. On-line demonstration is enough.
You will need about 5 days to implement a primitive working symbol table.
You may use C++ STL 'map' . The class is essentially a red-black tree.
You need to address the following issues:
At the last checkpoint, you will have the chance to show off your complete compiler. We will schedule an interview with each students and test the features that you have implemented. You need to bring your grade sheet to the interview. Please do not print your program or output. We will give you a grade for the project during the interview. In the project, you will need to implement a minimum set of features, which will be announced later in the class. If you implement extra features, you will earn extra points.
Here is a list of semantic checks:
The target machine may be the pseudo machine defined in the appendix of the textbook. There is a simulator called testit in the distribution directory. You may also generate code for the Java virtual machine, a manual of which is included in the next section, and use any web browser to run it.