UNIT III: Grammars and Parsing
3.1 What is a Grammar?
A grammar is a set of rules that defines which sentences are valid (correct) in a language.
In NLP, grammars help computers understand the structure of sentences.
3.2 Grammars and Sentence Structure
The most widely used grammar formalism in NLP is the Context-Free Grammar (CFG).
CFG consists of:
- Terminal symbols: Actual words — "dog", "runs", "the"
- Non-terminal symbols: Grammatical categories — S, NP, VP, N, V, Det
- Production rules: Rules that expand non-terminals
- Start symbol: Usually S (Sentence)
Example CFG:
- S → NP VP
- NP → Det N
- VP → V NP
- Det → "the" | "a"
- N → "dog" | "cat" | "bone"
- V → "chased" | "ate"
Parsing Example: "The dog chased a cat"
- S
- NP: [Det: the] [N: dog]
- VP: [V: chased] [NP: [Det: a] [N: cat]]
This tree is called a Parse Tree.
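The grammar and parse tree above can be written out as plain Python data, which makes the structure concrete. This is only an illustrative sketch; real NLP toolkits use richer grammar objects.

```python
# The example CFG: non-terminal -> list of right-hand-side alternatives.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["dog"], ["cat"], ["bone"]],
    "V":   [["chased"], ["ate"]],
}

# Parse tree for "the dog chased a cat" as nested tuples: (label, children...).
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "a"), ("N", "cat"))))

def leaves(t):
    """Read the sentence back off the parse tree."""
    if isinstance(t, str):          # a bare string is a word (terminal)
        return [t]
    words = []
    for child in t[1:]:             # t[0] is the label; the rest are children
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))       # the dog chased a cat
```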
3.3 Top-Down Parsing
In Top-Down Parsing, we start from the Start Symbol (S) and try to expand rules until we match the input sentence.
Process:
- Start with S.
- Apply grammar rules to expand S.
- Keep expanding until you reach terminal words.
- Check if they match the input.
Example: Input: "The dog runs" (for this example, assume the extra rules VP → V and V → "runs" alongside the grammar above)
- Start: S
- Expand: S → NP VP
- Expand: NP → Det N → "The" "dog"
- Expand: VP → V → "runs"
- Match! ✓
Advantage: Simple to understand and implement.
Disadvantage:
- Can get into infinite loops if grammar has left-recursive rules.
- May try many wrong paths before finding the right one (backtracking).
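The expand-and-match process above can be sketched as a small recursive-descent recogniser. This is a sketch, not a production parser: it backtracks over alternatives by trying every expansion, and it would loop forever on a left-recursive rule such as NP → NP PP, which is exactly the disadvantage noted above.

```python
# Minimal top-down (recursive-descent) recogniser for the example CFG.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["dog"], ["cat"], ["bone"]],
    "V":   [["chased"], ["ate"]],
}

def parse(symbol, words, pos):
    """Try to expand `symbol` starting at words[pos].
    Yields every position where the match can end (this is the backtracking)."""
    if symbol not in grammar:                       # terminal: match the word itself
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for production in grammar[symbol]:              # try each alternative expansion
        ends = [pos]
        for part in production:                     # match the RHS left to right
            ends = [e2 for e in ends for e2 in parse(part, words, e)]
        yield from ends

def accepts(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)

print(accepts("the dog chased a cat"))   # True
print(accepts("the dog the"))            # False
```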
3.4 Bottom-Up Parsing
In Bottom-Up Parsing, we start from the input words and try to combine them using grammar rules until we reach S.
Process:
- Start with input words (terminals).
- Apply grammar rules in reverse to combine them into larger structures.
- Continue until you get S.
Example: Input: "The dog runs" (assuming the extra rules VP → V and V → "runs")
- "The" → Det
- "dog" → N
- Det + N → NP
- "runs" → V → VP
- NP + VP → S ✓
Advantage:
- Handles left-recursive rules that cause top-down parsers to loop.
- Builds only constituents supported by the actual input words, which is often more efficient.
Disadvantage:
- Can generate many irrelevant structures before finding the correct one.
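The steps above can be sketched as a shift-reduce recogniser: shift words onto a stack, and reduce whenever a rule's right-hand side appears on top of it. This greedy sketch works for the example grammar but can fail on grammars that need smarter decisions about when to reduce.

```python
# Minimal bottom-up (shift-reduce) recogniser for the example CFG.
rules = [
    ("S",   ["NP", "VP"]),
    ("NP",  ["Det", "N"]),
    ("VP",  ["V", "NP"]),
    ("Det", ["the"]), ("Det", ["a"]),
    ("N",   ["dog"]), ("N", ["cat"]), ("N", ["bone"]),
    ("V",   ["chased"]), ("V", ["ate"]),
]

def shift_reduce(sentence):
    stack, words = [], sentence.split()
    while True:
        for lhs, rhs in rules:      # REDUCE if a rule's RHS tops the stack
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if words:               # otherwise SHIFT the next input word
                stack.append(words.pop(0))
            else:                   # nothing left to shift or reduce
                return stack == ["S"]

print(shift_reduce("the dog chased a cat"))  # True
print(shift_reduce("chased the dog"))        # False
```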
3.5 Transition Network Grammars (TNG)
A Transition Network is a type of finite state machine (like a flowchart) used to represent grammar rules.
How it works:
- The network has states (nodes) and transitions (arcs).
- Each arc is labeled with a word category (N, V, Det, etc.).
- To parse a sentence, you follow the arcs matching the input words.
- If you reach a final state, the sentence is valid.
Example (simple sentence):
- START → [Det] → [N] → [V] → END
- For "The dog runs": Start → "The" matches Det → "dog" matches N → "runs" matches V → END ✓
Limitation: Simple transition networks cannot handle complex sentences with embedded clauses.
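The START → [Det] → [N] → [V] → END network above is just a finite state machine, so it fits in a few lines of Python. The word-category lexicon here is a made-up minimal one for the example.

```python
# A simple transition network: states and category-labelled arcs.
lexicon = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "runs": "V"}

# state -> list of (category, next_state); "END" is the final state.
network = {
    "START": [("Det", "S1")],
    "S1":    [("N", "S2")],
    "S2":    [("V", "END")],
}

def accepts(sentence):
    state = "START"
    for word in sentence.lower().split():
        for category, nxt in network.get(state, []):
            if lexicon.get(word) == category:   # follow the matching arc
                state = nxt
                break
        else:
            return False                        # no arc matches this word
    return state == "END"                       # valid only in a final state

print(accepts("The dog runs"))    # True
print(accepts("The runs dog"))    # False
```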
3.6 Recursive Transition Networks (RTN)
RTN extends TNG by allowing recursive calls — one network can call another network.
This allows handling of complex, nested structures like:
- "The dog that chased the cat ran away."
Here, "that chased the cat" is a nested clause. RTN can handle this by recursively calling the sentence network inside the noun phrase network.
3.7 Top-Down Chart Parsing
Chart Parsing is a more efficient parsing technique that avoids redoing the same work by storing intermediate results.
Key Concept — The Chart:
- A chart is a data structure that records all partial parse results.
- Each entry (called an edge) records: which rule is being applied, how much has been matched, and the position in the sentence.
Two types of edges:
- Active Edge: A rule that has been partially matched (waiting for more).
- Inactive Edge: A rule that has been fully matched.
Algorithm:
- Initialize chart with input words.
- Apply grammar rules.
- When a rule is complete, add it as an inactive edge.
- Use inactive edges to complete active edges.
- Continue until S is found spanning the whole sentence.
Advantage: Very efficient. Because the chart stores every partial result, the same computation is never repeated; this is dynamic programming (memoization).
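The core chart-parsing idea can be sketched by adding a memo table to a top-down recogniser: each (symbol, position) question is answered once and cached. This illustrates the dynamic-programming principle, not the full active/inactive edge bookkeeping.

```python
from functools import lru_cache

grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["dog"], ["cat"], ["bone"]],
    "V":   [["chased"], ["ate"]],
}

def recognise(sentence):
    words = sentence.split()

    @lru_cache(maxsize=None)     # the "chart": caches (symbol, pos) -> end positions
    def spans(symbol, pos):
        if symbol not in grammar:                    # terminal word
            ok = pos < len(words) and words[pos] == symbol
            return frozenset({pos + 1}) if ok else frozenset()
        ends = set()
        for production in grammar[symbol]:
            positions = {pos}
            for part in production:                  # match RHS left to right
                positions = {e for p in positions for e in spans(part, p)}
            ends |= positions
        return frozenset(ends)

    return len(words) in spans("S", 0)

print(recognise("the dog chased a cat"))  # True
```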
3.8 Feature Systems and Augmented Grammars
Simple CFGs cannot handle all aspects of language, like agreement between subject and verb.
Problem: CFG cannot easily prevent "The dog run" (should be "runs").
Solution: Use Feature Systems — attach features (like number, person, tense) to grammar symbols.
Example Features:
- Number: singular / plural
- Person: 1st / 2nd / 3rd
- Tense: past / present / future
- Gender: masculine / feminine
Feature Agreement:
- Subject and verb must agree in number and person.
- "The dog runs" (singular) ✓
- "The dogs run" (plural) ✓
- "The dog run" ✗ (mismatch — caught by feature checking)
Augmented Grammar:
- A grammar where each rule has additional feature constraints.
- Only rules where features match are applied.
3.9 Basic Feature System for English
For English, the basic features needed are:
| Feature | Values |
|---|---|
| Number (NUM) | singular (sg), plural (pl) |
| Person (PER) | 1st, 2nd, 3rd |
| Tense (TNS) | past, present, future |
| Case | nominative, accusative |
Example rule with features:
- S → NP[NUM=?n] VP[NUM=?n]
- This means: NP and VP must have the same number (?n is a variable).
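In code, the shared variable ?n simply means the two feature structures must carry equal values for NUM. A minimal sketch, with feature structures as plain dicts (the entries are illustrative):

```python
def agree(np, vp, shared=("NUM",)):
    """S -> NP[NUM=?n] VP[NUM=?n]: allowed only if every shared feature matches."""
    return all(np.get(feat) == vp.get(feat) for feat in shared)

np_sg = {"CAT": "NP", "NUM": "sg"}   # "the dog"
vp_sg = {"CAT": "VP", "NUM": "sg"}   # "runs"
vp_pl = {"CAT": "VP", "NUM": "pl"}   # "run"

print(agree(np_sg, vp_sg))  # True  -> "The dog runs" is accepted
print(agree(np_sg, vp_pl))  # False -> "The dog run" is rejected
```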
3.10 Morphological Analysis and the Lexicon
Morphology is the study of word structure — how words are formed from smaller units called morphemes.
Types of Morphemes:
- Root/Stem: The base meaning — "play", "run", "happy"
- Prefix: Added before root — "un" in "unhappy"
- Suffix: Added after root — "ness" in "happiness", "ed" in "played"
- Infix: Added inside a word (rare in English)
Morphological Analysis in NLP involves:
- Breaking a word into its morphemes.
- Identifying root form (called lemmatization).
- Identifying grammatical features.
Examples:
- "running" → run + ing (verb, present participle)
- "dogs" → dog + s (noun, plural)
- "unhappiness" → un + happy + ness (adjective with prefix and suffix)
The Lexicon:
- A lexicon is essentially a dictionary used by an NLP system.
- It stores: word forms, root form, part of speech, features, meaning.
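A toy suffix-stripping analyser backed by a small lexicon shows how the two fit together. This is a sketch only: real analysers handle spelling changes (e.g. the doubled consonant in "running"), and the lexicon entries and suffix table here are made up.

```python
lexicon = {
    "dog":   {"pos": "N", "NUM": "sg"},
    "run":   {"pos": "V"},
    "happy": {"pos": "Adj"},
}

# suffix -> features it contributes when stripped off a known root.
suffixes = [("s", {"NUM": "pl"}), ("ed", {"TNS": "past"}),
            ("ing", {"FORM": "prog"}), ("ness", {"pos": "N"})]

def analyse(word):
    """Return (root, features) if the word or word-minus-suffix is in the lexicon."""
    if word in lexicon:
        return word, dict(lexicon[word])
    for suffix, feats in suffixes:
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in lexicon:
                entry = dict(lexicon[root])
                entry.update(feats)          # merge suffix features onto the root's
                return root, entry
    return None

print(analyse("dogs"))   # ('dog', {'pos': 'N', 'NUM': 'pl'})
```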
3.11 Parsing with Features
When parsing with features, the parser:
- Looks up each word in the lexicon to get its features.
- Applies grammar rules only when features match.
- Builds a parse tree that respects both structure and feature constraints.
Example: Input: "She runs fast."
- "She" → Pronoun [NUM=sg, PER=3rd, CASE=nominative]
- "runs" → Verb [NUM=sg, PER=3rd, TNS=present]
- NUM and PER match → Valid! ✓
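The "She runs" check above, sketched in code: look each word up in a lexicon to get its features, then verify subject-verb agreement on NUM and PER. The lexicon entries are illustrative, not from a real resource.

```python
lexicon = {
    "she":  {"cat": "Pronoun", "NUM": "sg", "PER": "3rd", "CASE": "nominative"},
    "they": {"cat": "Pronoun", "NUM": "pl", "PER": "3rd", "CASE": "nominative"},
    "runs": {"cat": "Verb", "NUM": "sg", "PER": "3rd", "TNS": "present"},
    "run":  {"cat": "Verb", "NUM": "pl", "PER": "3rd", "TNS": "present"},
}

def agreement_ok(subject, verb):
    """Apply the rule only when the looked-up features match."""
    s, v = lexicon[subject], lexicon[verb]
    return all(s[f] == v[f] for f in ("NUM", "PER"))

print(agreement_ok("she", "runs"))   # True  -> "She runs" is valid
print(agreement_ok("she", "run"))    # False -> feature mismatch
```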
3.12 Augmented Transition Networks (ATN)
An ATN is an advanced version of RTN that includes feature checks and actions on each arc.
Features of ATN:
- Each arc can test conditions (features) before being followed.
- Arcs can have actions — storing information in registers (like subject, object, verb).
- This allows complex grammatical structures to be parsed.
Example registers: SUBJECT, VERB, OBJECT, MODIFIER
When parsing "The cat chased the dog":
- SUBJECT = "The cat"
- VERB = "chased"
- OBJECT = "the dog"
ATNs are powerful enough to handle most English sentence patterns.
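A sketch of an ATN-style pass that fills the registers while walking a simple declarative sentence. Arc "actions" here are just assignments into a registers dict; a real ATN would also test features on each arc and cover many more sentence patterns. The lexicon is a made-up minimal one.

```python
lexicon = {"the": "Det", "cat": "N", "dog": "N", "chased": "V"}

def atn_parse(sentence):
    words = sentence.lower().split()
    registers = {"SUBJECT": None, "VERB": None, "OBJECT": None}

    def read_np(i):                 # NP sub-network: Det N
        if (i + 1 < len(words) and lexicon.get(words[i]) == "Det"
                and lexicon.get(words[i + 1]) == "N"):
            return " ".join(words[i:i + 2]), i + 2
        return None, i

    np, i = read_np(0)
    if np is None:
        return None
    registers["SUBJECT"] = np                     # action on the subject-NP arc
    if i < len(words) and lexicon.get(words[i]) == "V":
        registers["VERB"] = words[i]              # action on the verb arc
        np, i = read_np(i + 1)
        if np is not None:
            registers["OBJECT"] = np              # action on the object-NP arc
    return registers if i == len(words) else None

print(atn_parse("The cat chased the dog"))
# {'SUBJECT': 'the cat', 'VERB': 'chased', 'OBJECT': 'the dog'}
```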