Unit 2 | Compiler Design Notes | AKTU Notes


Unit 2 | Compiler Design Notes | AKTU Notes




Basic Parsing Techniques in Compiler Design

What is Parsing?

Parsing is the process of analyzing the structure of a program based on its grammar rules. It is done in the syntax analysis phase of a compiler. The component that performs parsing is called a parser.

A parser takes tokens (produced by the lexical analyzer) and checks if they follow the correct syntax of the programming language.

Types of Parsers

Parsers are mainly divided into two categories:

1. Top-Down Parsing – Starts from the highest-level rule and breaks it down into smaller components.


2. Bottom-Up Parsing – Starts with the input and gradually builds up to the highest-level rule.

Shift-Reduce Parsing (Bottom-Up Parsing Method)

- Shift: Reads input and pushes it onto a stack.

- Reduce: Matches a sequence of symbols in the stack with a grammar rule and replaces them with a non-terminal.

- This continues until the entire input is reduced to a single valid start symbol.

Example of Shift-Reduce Parsing

For the grammar:

E → E + E  
E → id

For input id + id:

1. Shift id onto stack

2. Reduce id to E (because E → id)

3. Shift + onto stack

4. Shift id onto stack

5. Reduce id to E

6. Reduce E + E to E

7. Accept (since the stack contains only E)

Operator Precedence Parsing

- Used for languages where operators have different precedence levels (e.g., +, *).

- It works by associating precedence rules with operators.

- Example: 3 + 4 * 5 should be evaluated as 3 + (4 * 5), since * has higher precedence than +.

Steps in Operator Precedence Parsing:

1. Identify operator precedence and associativity (left-to-right or right-to-left).

2. Create an operator precedence table.

3. Parse the input based on precedence rules.

Top-Down Parsing

- Parses from start symbol and expands using grammar rules.

- Example method: Recursive Descent Parsing (uses recursive functions for each grammar rule).

Predictive Parsing (LL(1) Parsing)

- A type of top-down parsing that does not need backtracking.

- Uses a predictive parsing table to decide which rule to apply.

- Works efficiently if the grammar is left-factored and non-ambiguous.

Example of Predictive Parsing

For the grammar:

E → TE'  
E' → +TE' | ε  
T → id

To parse id + id:

1. E → TE'

2. T → id

3. E' → +TE'

4. T → id

5. E' → ε

6. Accept (since input is fully matched)

Automatic Construction of Efficient Parsers

Creating a parser manually for large grammars is difficult. Instead, automatic parser generators (like YACC) can be used.

1. LR Parsers (Bottom-Up Parsers)

LR Parsers are powerful bottom-up parsers that handle most programming languages.

- L → Left to right scanning.

- R → Rightmost derivation in reverse.

- k → Number of lookahead tokens used (e.g., LR(1) means 1 lookahead token).

Types of LR Parsers:

1. SLR (Simple LR) – Basic and easiest to implement.

2. CLR (Canonical LR) – More powerful, uses a complete set of LR(1) items.

3. LALR (Lookahead LR) – A simplified version of CLR, commonly used in practice.

Canonical Collection of LR(0) Items

- LR(0) items are used to construct parsing tables.

- A canonical collection is a set of all possible LR(0) items for a given grammar.

- Helps in creating parsing tables efficiently.


Example of an LR(0) Item for S → A B:

S → . A B (Dot represents the parsing position)
S → A . B
S → A B .


Constructing SLR Parsing Tables

SLR (Simple LR) Parsing Table is created in two steps:

1. Construct LR(0) items and the state transition table.

2. Create Action and Goto tables based on shift and reduce rules.

SLR uses Follow() sets to resolve conflicts.

Constructing Canonical LR Parsing Tables

- Canonical LR is an improved version of SLR.

- Uses LR(1) items (i.e., includes a lookahead token).

- Helps resolve shift-reduce conflicts that SLR cannot handle.




Constructing LALR Parsing Tables

- LALR (Lookahead LR) is a combination of SLR and Canonical LR.

- It reduces the size of the parsing table without losing accuracy.

- Used in YACC (Yet Another Compiler Compiler) for real-world parsing.

Using Ambiguous Grammars in Parsing

- A grammar is ambiguous if a string has more than one parse tree.

- Ambiguous grammars can cause shift-reduce or reduce-reduce conflicts.

Example of Ambiguous Grammar

E → E + E | E * E | id

For input id + id * id:

1. id + (id * id)

2. (id + id) * id

Both are valid parse trees → Ambiguous grammar!
Solution: Use operator precedence and associativity rules.

Automatic Parser Generator

- Instead of manually creating parsing tables, tools like YACC generate them automatically.

- They take a BNF grammar as input and produce a parser as output.

Implementation of LR Parsing Tables

Steps in LR Parsing Table Implementation:

1. Build the state machine using LR(0) or LR(1) items.

2. Create the parsing table (Action and Goto tables).

3. Use a parsing algorithm (Shift-Reduce method).



- sX → Shift and go to state X.

- rX → Reduce using production X.

- accept → Parsing is successful.

Conclusion

Parsing techniques are crucial for checking syntax correctness in a compiler.

- Top-down parsers (like Predictive Parsing) work well for simple grammars.

- Bottom-up parsers (like LR Parsers) are more powerful and widely used in real-world compilers.

- Automatic tools (YACC, LEX) help in generating efficient parsers.


By understanding parsing tables, operator precedence, and LR methods, you can easily analyze and design efficient parsers.

No comments:

Post a Comment