UNIT IV: Grammars for Natural Language
4.1 Introduction
This unit looks more closely at specific grammatical phenomena in natural language that pose challenges for NLP systems, such as auxiliary verbs, movement, questions, and uncertainty.
4.2 Auxiliary Verbs
Auxiliary verbs (also called helping verbs) are verbs that are used alongside a main verb to express tense, mood, voice, or aspect.
Common English Auxiliaries:
- Be: is, am, are, was, were, been, being
- Have: have, has, had
- Do: do, does, did
- Modal auxiliaries: can, could, will, would, shall, should, may, might, must
Examples:
- "She is running." (be + present participle = present continuous)
- "They have finished." (have + past participle = present perfect)
- "He can swim." (modal = ability)
- "You must leave." (modal = obligation)
Importance in NLP:
- Auxiliaries signal tense, mood, and voice.
- They are crucial for understanding the time and certainty of an action.
- Grammars must correctly handle sequences like: "She might have been sleeping." (modal + have + been + verb-ing)
Verb Phrases with Auxiliaries:
- VP → Aux VP (general schema: each auxiliary takes a smaller VP as its complement)
- VP → Modal VP[base form] ("can swim")
- VP → have VP[past participle] ("have finished")
- VP → be VP[present participle] ("is running")
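These rules can be prototyped directly as a small context-free grammar. The sketch below uses NLTK; the category names (VP_base, VP_pastpart, VP_prespart) are illustrative choices, not standard labels:

```python
# A minimal sketch of auxiliary sequencing as a CFG, using NLTK.
# Category names (VP_base, VP_pastpart, VP_prespart) are illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'she'
VP -> Modal VP_base | VP_base
VP_base -> 'have' VP_pastpart | V_base
VP_pastpart -> 'been' VP_prespart | V_pastpart
VP_prespart -> V_prespart
Modal -> 'might'
V_base -> 'sleep'
V_pastpart -> 'slept'
V_prespart -> 'sleeping'
""")

parser = nltk.ChartParser(grammar)
# Parses "she might have been sleeping" (modal + have + been + verb-ing)
for tree in parser.parse("she might have been sleeping".split()):
    tree.pretty_print()
```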
4.3 Verb Phrases
A Verb Phrase (VP) is the part of a sentence that contains the main verb together with its complements and modifiers (objects, prepositional phrases, embedded clauses, and so on).
Structure of VP:
- VP → V (intransitive): "The dog runs."
- VP → V + NP (transitive): "She ate the apple."
- VP → V + PP: "He sat on the chair."
- VP → V + NP + PP: "She gave the book to him."
- VP → V + S (sentential complement): "He said that she left."
Verb Types:
- Intransitive: Does not take an object. "She sleeps."
- Transitive: Takes a direct object. "She reads a book."
- Ditransitive: Takes two objects. "He gave her a gift."
- Linking: Connects subject to description. "She is happy."
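One way to enforce these distinctions in a grammar is to give each verb type its own preterminal and its own VP rule. A minimal NLTK sketch, where IV, TV, DV, and LV are shorthand for the four types above:

```python
# Sketch: one VP rule per verb subcategorization type, using NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> IV | TV NP | DV NP NP | LV Adj
NP -> Pro | Det N
Det -> 'a' | 'the'
N -> 'book' | 'gift'
Pro -> 'she' | 'he' | 'her'
Adj -> 'happy'
IV -> 'sleeps'
TV -> 'reads'
DV -> 'gave'
LV -> 'is'
""")

parser = nltk.ChartParser(grammar)
for sent in ["she sleeps", "she reads a book", "he gave her a gift", "she is happy"]:
    for tree in parser.parse(sent.split()):
        print(tree)
```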
4.4 Movement Phenomena in Language
Movement refers to the phenomenon where elements of a sentence appear to have "moved" from their original position.
This is an important concept from transformational grammar (Chomsky).
Types of Movement:
1. Wh-Movement (Question Formation):
- In a statement: "She bought a book."
- In a question: "What did she buy __?"
- "What" has moved to the front, leaving a gap (__) in the original position.
2. Passive Movement:
- Active: "The dog bit the man."
- Passive: "The man was bitten by the dog."
- The object "the man" has moved to the subject position.
3. Topicalization:
- "This book, I really like." (object moved to front for emphasis)
Why it matters in NLP:
- To understand the meaning of a question or passive sentence, the parser must recognize movement and reconstruct the original relationship.
- Gap-filling is needed: the parser must recognize that in "What did she buy __?" the fronted "what" is the object of "buy" (she bought what), and use that relationship to find the answer. The slash-category machinery in Section 4.5 is one way to implement this.
4.5 Handling Questions in Context-Free Grammars
Questions in English are of several types:
1. Yes/No Questions:
- Formed by inverting subject and auxiliary.
- "She is coming." → "Is she coming?"
- Grammar rule: Q → Aux NP VP
2. Wh-Questions:
- Use question words: what, where, when, who, why, how.
- "What did she buy?"
- Grammar rule: Q → Wh-word Aux NP VP
3. Tag Questions:
- "She is coming, isn't she?"
Challenges for CFG:
- Standard CFG cannot easily handle the "gap" left by moved elements in wh-questions.
- Solution: use Gap Threading or Slash Categories in the grammar (techniques developed in formalisms such as GPSG).
Slash Category:
- A special notation: X/Y means "an X with a Y missing inside it." For example, S/NP is a sentence missing an NP.
- Allows tracking where the gap is as the parse proceeds.
- Example: "What did she buy __?" is parsed as S → Wh-word S/NP, where S/NP covers "did she buy __" and the fronted "what" fills the missing-NP slot.
4.6 Human Preferences in Parsing
When a sentence is ambiguous, humans tend to prefer certain interpretations over others. NLP systems should model these preferences.
Garden Path Sentences:
- Sentences that mislead the reader into one interpretation before forcing a correction.
- Example: "The horse raced past the barn fell."
- First reading: "The horse raced past the barn" (seems complete).
- Correct reading: "The horse [that was] raced past the barn fell."
Human Parsing Preferences:
1. Minimal Attachment:
- Humans prefer the parse with fewer nodes in the parse tree (simpler structure).
- Example: "I saw the man with a telescope."
- Preferred: I saw [the man] [with a telescope] (PP attaches to VP — simpler)
- Alternative: I saw [the man [with a telescope]] (PP attaches to NP)
2. Late Closure (Right Association):
- Humans prefer to attach new material to the most recent phrase.
- Example: "She said that he left yesterday."
- Preferred: "yesterday" modifies "left" (recent VP), not "said."
Why it matters:
- NLP parsers use these preferences to choose among multiple parse trees.
- Probabilistic parsers assign probabilities based on these preferences.
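A toy demonstration of the probabilistic route: in the PCFG sketch below, the rule probabilities are invented for illustration, but they penalize the NP-attachment rule relative to VP attachment, so NLTK's Viterbi parser reproduces the Minimal Attachment preference for the telescope sentence.

```python
# Sketch: a toy PCFG whose (made-up) rule probabilities encode an
# attachment preference; the Viterbi parser returns the most likely tree.
import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
VP -> V NP [0.4] | V NP PP [0.4] | V PP [0.2]
NP -> Det N [0.6] | NP PP [0.2] | 'I' [0.2]
PP -> P NP [1.0]
Det -> 'the' [0.5] | 'a' [0.5]
N -> 'man' [0.5] | 'telescope' [0.5]
V -> 'saw' [1.0]
P -> 'with' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
# The VP-attachment parse wins: NP -> NP PP (0.2) is penalized
# relative to VP -> V NP PP (0.4), mirroring Minimal Attachment.
for tree in parser.parse("I saw the man with a telescope".split()):
    print(tree)
```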
4.7 Encoding Uncertainty
Language is full of uncertainty. A sentence can have multiple valid interpretations. NLP systems need to handle this uncertainty.
Sources of Uncertainty:
- Lexical ambiguity: "I went to the bank." (river bank or financial bank?)
- Structural ambiguity: "I saw the man with binoculars." (who has the binoculars?)
- Referential ambiguity: "John told Peter that he was late." (who is "he"?)
Ways to Encode Uncertainty:
1. Multiple Parse Trees:
- Store all possible parse trees and let later processing choose.
- Problem: the number of parse trees can grow exponentially with sentence length, so storing them all is expensive.
2. Probabilistic Methods:
- Assign a probability to each possible interpretation.
- Choose the highest probability interpretation.
3. Packed Representations:
- Efficiently store multiple interpretations in a compact structure.
- Example: Shared forests where common subtrees are not repeated.
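Options 1 and 3 can both be seen with an ordinary chart parser: the chart it builds is itself a packed structure in which shared sub-parses are stored once, and enumerating trees from it yields every interpretation. A minimal NLTK sketch:

```python
# Sketch: an ambiguous sentence yields multiple parse trees; the chart
# that produces them stores shared subtrees only once (a packed forest).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | VP PP
NP -> Pro | Det N | NP PP | N
PP -> P NP
Pro -> 'I'
Det -> 'the'
N -> 'man' | 'binoculars'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("I saw the man with binoculars".split()))
print(len(trees))  # 2: the PP attaches to the VP or to the NP
for tree in trees:
    print(tree)
```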
4.8 Deterministic Parsing
A Deterministic Parser makes only one decision at each step — it never backtracks.
Key Idea:
- Instead of trying all possibilities and backtracking, the parser uses limited look-ahead to commit to the correct choice at each step.
Marcus Parser (1980):
- A well-known deterministic parser for English.
- Uses a small buffer (three cells of words or parsed constituents) to look ahead and make correct decisions.
- Based on the observation that most English sentences can be parsed deterministically with limited look-ahead.
Advantages:
- Very fast — no backtracking.
- Psychologically plausible — humans also seem to parse deterministically most of the time.
Disadvantages:
- Cannot handle all ambiguous sentences.
- Breaks down on garden path sentences (as human readers do, which Marcus took as evidence for the model).
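The control loop itself is easy to sketch. The toy shift-reduce parser below commits to each decision exactly once and never backtracks; it is not Marcus's PARSIFAL (in particular, the toy grammar is unambiguous, so the look-ahead buffer that does the real work in PARSIFAL is never consulted here):

```python
# Toy illustration of deterministic (no-backtracking) parsing: a
# shift-reduce loop that commits to each decision exactly once.
# Grammar: S -> NP VP, NP -> Det N, VP -> V NP. Not Marcus's PARSIFAL.
from collections import deque

POS = {"the": "Det", "a": "Det", "dog": "N", "man": "N", "bit": "V"}

def parse(tokens):
    buffer = deque(tokens)  # remaining input
    stack = []              # holds (category, children) pairs
    while buffer or len(stack) > 1:
        top = [cat for cat, _ in stack[-2:]]
        if top == ["Det", "N"]:
            n, d = stack.pop(), stack.pop()
            stack.append(("NP", [d, n]))
        elif top == ["V", "NP"]:
            np_, v = stack.pop(), stack.pop()
            stack.append(("VP", [v, np_]))
        elif top == ["NP", "VP"]:
            vp, np_ = stack.pop(), stack.pop()
            stack.append(("S", [np_, vp]))
        elif buffer:
            word = buffer.popleft()
            stack.append((POS[word], word))  # shift the next word
        else:
            raise ValueError("stuck: no rule applies")
    return stack[0]

print(parse("the dog bit the man".split()))
```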
