CMSC 129

Principles of Compiler Design

Created by TJ Monserrat

Brief introduction of Designing Compilers

Structure of Compiler

Compilers

For the whole semester: We will learn how to design and implement compilers.

What are Compilers?

Programs that can turn everyday human language to things that can be executed by computers

Type of Compilers

Type of Compilers

Type of Compilers

What will we be doing for the project?

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

What will we be doing for the project?

What will we be doing for the project?

What will we be doing for the project?

Parts of Compiler: Lexical Analysis

Parts of Compiler: Lexical Analysis

Takes in Source Code or Character Stream
and turns it into an array of tokens or Lexemes

Parts of Compiler: Lexical Analysis

Each token is in the form of:

  • token-name - abstract symbol or key identifier that is used during syntax analysis
  • attribute-value - Points to an entry in the symbol table for this token.
    • That's why we need Symbol Tables...
    • Information from the symbol-table entry is needed for semantic analysis and code generation.

These tokens will be passed to the Syntax Analyzer

Parts of Compiler: Lexical Analysis

For Example

  • position is a lexeme that can mapped to a token of [id, 0].
    • id is an abstract symbol for it being an identifier or a variable
    • 0 points to the index of the symbol-table entry for the word position

Parts of Compiler: Lexical Analysis

For Example

  • = is a lexeme that can mapped to a token of [eq].
    • Since this token doesn't need an attribute, we can omit the second component
    • We could use any other token-name like '=' or 'assign'... as long as it serves the purpose: to identify that this is an assign statement.

Parts of Compiler: Lexical Analysis

For Example

  • initial is a lexeme that can mapped to a token of [id, 1].
    • 1 points to the index of the symbol-table entry for the word initial
  • + is a lexeme that can mapped to a token of [add].
  • rate is a lexeme that can mapped to a token of [id, 2].
    • 2 points to the index of the symbol-table entry for the word rate

Parts of Compiler: Lexical Analysis

For Example

  • * is a lexeme that can mapped to a token of [mul].
  • 60 is a lexeme that can mapped to a token of [number, 3].
    • 3 points to the index of the symbol-table entry for the value 60

Parts of Compiler: Lexical Analysis

This makes the example:

into...

Parts of Compiler: Syntax Analysis

Parts of Compiler: Syntax Analysis

The Lexemes...

...will need to be fed to the syntax analyzer to produce a syntax tree like this...

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

Parts of Compiler: Syntax Analysis

To be able to do this...

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

We need to know how to parse them using Context-free grammars.

Parts of Compiler: Intermediate Code Generation

Parts of Compiler: Intermediate Code Generation

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

The system can create an intermediate code before translating it into the target program code.

Parts of Compiler: Intermediate Code Generation

We need to transform parse trees into a three-address code sequence. Note that three-address code sequences will be of the following...

  • It has AT MOST ONE OPERATOR on the right side
  • The compiler must generate a temporary name to hold the value computed by a three-address instruction.
  • Some "three-address instructions" have fewer than three operands.

Parts of Compiler: Code Generation

Parts of Compiler: Code Generation

We can then transform the intermediate code generation to the target code that can be understood by the system

Parts of Compiler: Code Generation

But for our class, we need to transform the intermediate code above to our pseudo-ASM that we will discuss next time.

Parts of Compiler: Symbol Table Management

Parts of Compiler: Symbol Table Management

An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.

Parts of Compiler: Symbol Table Management

These attributes may provide information about the storage allocated for:

  • name
  • type
  • scope (where in the program its value may be used)
  • in the case of procedure names:
    • the number of arguments
    • types of each of the arguments
    • the method of passing each argument
    • the type returned

Programming Language Basics

Things you need to ponder when designing a language:

Programming Language Basics - Static/Dynamic Distinction

  • Static - a language uses a policy that allows the compiler to decide an issue
  • Dynamic - a policy that only allows a decision to be made when we execute the program

Programming Language Basics - Environments and States

  • Environment - mapping from names to locations in the store.
  • State - mapping from locations in store to their values.
    the state maps left-values to their corresponding right-values (x = 3)

Programming Language Basics - Static Scope and Block Structure

We should consider static-scope rules for a language with blocks, where a block is a grouping of declarations and statements.

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Programming Language Basics - Static Scope and Block Structure

Note that Scoping affects the Symbol Table structure.

Programming Language Basics - Parameter Passing

  • Pass-by-value - the actual parameter is evaluated (if it is an expression) or copied (if it is a variable)
  • Pass-by-reference - the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.

Exercises

Every exercise will have an effect on your project.

Exercises

  • Grammar Creation
    • 1. Create the Grammar of the Intermediate Code
    • 2. Create your own Source Code Grammar
  • Lexical Analysis
    • 3. DFA of the Intermediate Code
    • 4. DFA of your own Source Code
    • 5. Token list creation of the Intermediate Code
    • 6. Token list creation of your own Source Code

Exercises

  • Syntax Analysis
    • 7. Error Reporting of the Intermediate Code
    • 8. Error Reporting of your own Source Code
    • 9. Syntax Tree creation of the Intermediate Code
    • 10. Syntax Tree creation of your own Source Code

Exercises

  • Intermediate Code Generation
    • 11. Three-address Code generation from the Syntax Tree of Intermediate Code
    • 12. Three-address Code generation from the Syntax Tree of your own Source Code

Exercises

All exercises will building towards the project, Except the Target-Code Generation, because...

Project

Translate own Source code to Target-code generation, passing through all of the phases:

  • Lexical Analysis: Given Source Code, Show Lexemes
  • Syntax Analysis: Given Lexemes, Show Parse Tree
  • Intermediate Code Generation: Given Parse Tree, Show Three-Address Code
  • Target Code Generation: Given Three-Address Code, Show Target-Code generation
  • Run Target-Code Generation: Given input, show output

New Grading System

  • 1.0 - All 14 exercises done and working, Project working with documentation

New Grading System - Subtractions

  • Did 8-11 projects, all working: -0.25
  • Did 5-7 projects, all working: -0.5
  • Did 1-4 projects, all working: -0.75
  • Did 0 projects, all working: -1.0

New Grading System - Subtractions

  • Did not do final project: -1.0
  • Did final project but has major bugs: -0.5
  • Did not do documentation: -0.5
  • Documentation lacking in substance: -0.25

New Grading System - Notes to live by...

  • If after the grade subtraction, grade is < 3.0 then grade is INC

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

Target-Code: Pseudo-ASM Language

How to get the slides or Class Specs

Google Docs: https://goo.gl/SLMzVB

Slides: https://github.com/tjmonsi/cmsc129-slide2