UNIT IV: Grammars for Natural Language
4.1 Introduction
This unit looks more closely at specific grammatical phenomena in natural language that pose challenges for NLP systems, such as auxiliary verbs, movement, questions, and uncertainty.
4.2 Auxiliary Verbs
Auxiliary verbs (also called helping verbs) are verbs that are used alongside a main verb to express tense, mood, voice, or aspect.
Common English Auxiliaries:
- Be: is, am, are, was, were, been, being
- Have: have, has, had
- Do: do, does, did
- Modal auxiliaries: can, could, will, would, shall, should, may, might, must
Examples:
- "She is running." (be + present participle = present continuous)
- "They have finished." (have + past participle = present perfect)
- "He can swim." (modal = ability)
- "You must leave." (modal = obligation)
Importance in NLP:
- Auxiliaries signal tense, mood, and voice.
- They are crucial for understanding the time and certainty of an action.
- Grammars must correctly handle sequences like: "She might have been sleeping." (modal + have + been + verb-ing)
Verb Phrases with Auxiliaries:
- VP → Aux VP (general schema: each auxiliary takes a smaller VP as its complement)
- VP → Modal VP[base form] ("can swim")
- VP → have VP[past participle] ("have finished")
- VP → be VP[present participle] ("is running")
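These rules can be prototyped directly as a small context-free grammar. The sketch below uses NLTK; the category names (VP_base, VP_pastpart, VP_prespart) are illustrative choices, not standard labels:

```python
# A minimal sketch of auxiliary sequencing as a CFG, using NLTK.
# Category names (VP_base, VP_pastpart, VP_prespart) are illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'she'
VP -> Modal VP_base | VP_base
VP_base -> 'have' VP_pastpart | V_base
VP_pastpart -> 'been' VP_prespart | V_pastpart
VP_prespart -> V_prespart
Modal -> 'might'
V_base -> 'sleep'
V_pastpart -> 'slept'
V_prespart -> 'sleeping'
""")

parser = nltk.ChartParser(grammar)
# Parses "she might have been sleeping" (modal + have + been + verb-ing)
for tree in parser.parse("she might have been sleeping".split()):
    tree.pretty_print()
```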
4.3 Verb Phrases
A Verb Phrase (VP) is the part of a sentence that contains the main verb together with its complements and modifiers (objects, prepositional phrases, embedded clauses, and so on).
Structure of VP:
- VP → V (intransitive): "The dog runs."
- VP → V + NP (transitive): "She ate the apple."
- VP → V + PP: "He sat on the chair."
- VP → V + NP + PP: "She gave the book to him."
- VP → V + S (sentential complement): "He said that she left."
Verb Types:
- Intransitive: Does not take an object. "She sleeps."
- Transitive: Takes a direct object. "She reads a book."
- Ditransitive: Takes two objects. "He gave her a gift."
- Linking: Connects subject to description. "She is happy."
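One way to enforce these distinctions in a grammar is to give each verb type its own preterminal and its own VP rule. A minimal NLTK sketch, where IV, TV, DV, and LV are shorthand for the four types above:

```python
# Sketch: one VP rule per verb subcategorization type, using NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> IV | TV NP | DV NP NP | LV Adj
NP -> Pro | Det N
Det -> 'a' | 'the'
N -> 'book' | 'gift'
Pro -> 'she' | 'he' | 'her'
Adj -> 'happy'
IV -> 'sleeps'
TV -> 'reads'
DV -> 'gave'
LV -> 'is'
""")

parser = nltk.ChartParser(grammar)
for sent in ["she sleeps", "she reads a book", "he gave her a gift", "she is happy"]:
    for tree in parser.parse(sent.split()):
        print(tree)
```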
4.4 Movement Phenomena in Language
Movement refers to the phenomenon where elements of a sentence appear to have "moved" from their original position.
This is an important concept from transformational grammar (Chomsky).
Types of Movement:
1. Wh-Movement (Question Formation):
- In a statement: "She bought a book."
- In a question: "What did she buy __?"
- "What" has moved to the front, leaving a gap (__) in the original position.
2. Passive Movement:
- Active: "The dog bit the man."
- Passive: "The man was bitten by the dog."
- The object "the man" has moved to the subject position.
3. Topicalization:
- "This book, I really like." (object moved to front for emphasis)
Why it matters in NLP:
- To understand the meaning of a question or passive sentence, the parser must recognize movement and reconstruct the original relationship.
- Gap-filling is needed: the parser must recognize that in "What did she buy __?" the fronted "what" is the object of "buy" (she bought what), and use that relationship to find the answer. The slash-category machinery in Section 4.5 is one way to implement this.
4.5 Handling Questions in Context-Free Grammars
Questions in English are of several types:
1. Yes/No Questions:
- Formed by inverting subject and auxiliary.
- "She is coming." → "Is she coming?"
- Grammar rule: Q → Aux NP VP
2. Wh-Questions:
- Use question words: what, where, when, who, why, how.
- "What did she buy?"
- Grammar rule: Q → Wh-word Aux NP VP
3. Tag Questions:
- "She is coming, isn't she?"
Challenges for CFG:
- Standard CFG cannot easily handle the "gap" left by moved elements in wh-questions.
- Solution: use Gap Threading or Slash Categories in the grammar (techniques developed in formalisms such as GPSG).
Slash Category:
- A special notation: X/Y means "an X with a Y missing inside it." For example, S/NP is a sentence missing an NP.
- Allows tracking where the gap is as the parse proceeds.
- Example: "What did she buy __?" is parsed as S → Wh-word S/NP, where S/NP covers "did she buy __" and the fronted "what" fills the missing-NP slot.
4.6 Human Preferences in Parsing
When a sentence is ambiguous, humans tend to prefer certain interpretations over others. NLP systems should model these preferences.
Garden Path Sentences:
- Sentences that mislead the reader into one interpretation before forcing a correction.
- Example: "The horse raced past the barn fell."
- First reading: "The horse raced past the barn" (seems complete).
- Correct reading: "The horse [that was] raced past the barn fell."
Human Parsing Preferences:
1. Minimal Attachment:
- Humans prefer the parse with fewer nodes in the parse tree (simpler structure).
- Example: "I saw the man with a telescope."
- Preferred: I saw [the man] [with a telescope] (PP attaches to VP — simpler)
- Alternative: I saw [the man [with a telescope]] (PP attaches to NP)
2. Late Closure (Right Association):
- Humans prefer to attach new material to the most recent phrase.
- Example: "She said that he left yesterday."
- Preferred: "yesterday" modifies "left" (recent VP), not "said."
Why it matters:
- NLP parsers use these preferences to choose among multiple parse trees.
- Probabilistic parsers assign probabilities based on these preferences.
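A toy demonstration of the probabilistic route: in the PCFG sketch below, the rule probabilities are invented for illustration, but they penalize the NP-attachment rule relative to VP attachment, so NLTK's Viterbi parser reproduces the Minimal Attachment preference for the telescope sentence.

```python
# Sketch: a toy PCFG whose (made-up) rule probabilities encode an
# attachment preference; the Viterbi parser returns the most likely tree.
import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
VP -> V NP [0.4] | V NP PP [0.4] | V PP [0.2]
NP -> Det N [0.6] | NP PP [0.2] | 'I' [0.2]
PP -> P NP [1.0]
Det -> 'the' [0.5] | 'a' [0.5]
N -> 'man' [0.5] | 'telescope' [0.5]
V -> 'saw' [1.0]
P -> 'with' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
# The VP-attachment parse wins: NP -> NP PP (0.2) is penalized
# relative to VP -> V NP PP (0.4), mirroring Minimal Attachment.
for tree in parser.parse("I saw the man with a telescope".split()):
    print(tree)
```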
4.7 Encoding Uncertainty
Language is full of uncertainty. A sentence can have multiple valid interpretations. NLP systems need to handle this uncertainty.
Sources of Uncertainty:
- Lexical ambiguity: "I went to the bank." (river bank or financial bank?)
- Structural ambiguity: "I saw the man with binoculars." (who has the binoculars?)
- Referential ambiguity: "John told Peter that he was late." (who is "he"?)
Ways to Encode Uncertainty:
1. Multiple Parse Trees:
- Store all possible parse trees and let later processing choose.
- Problem: the number of parse trees can grow exponentially with sentence length, so storing them all is expensive.
2. Probabilistic Methods:
- Assign a probability to each possible interpretation.
- Choose the highest probability interpretation.
3. Packed Representations:
- Efficiently store multiple interpretations in a compact structure.
- Example: Shared forests where common subtrees are not repeated.
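Options 1 and 3 can both be seen with an ordinary chart parser: the chart it builds is itself a packed structure in which shared sub-parses are stored once, and enumerating trees from it yields every interpretation. A minimal NLTK sketch:

```python
# Sketch: an ambiguous sentence yields multiple parse trees; the chart
# that produces them stores shared subtrees only once (a packed forest).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | VP PP
NP -> Pro | Det N | NP PP | N
PP -> P NP
Pro -> 'I'
Det -> 'the'
N -> 'man' | 'binoculars'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("I saw the man with binoculars".split()))
print(len(trees))  # 2: the PP attaches to the VP or to the NP
for tree in trees:
    print(tree)
```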
4.8 Deterministic Parsing
A Deterministic Parser makes only one decision at each step — it never backtracks.
Key Idea:
- Instead of trying all possibilities and backtracking, the parser uses limited look-ahead to commit to the correct choice at each step.
Marcus Parser (1980):
- A well-known deterministic parser for English.
- Uses a small buffer (three cells of words or parsed constituents) to look ahead and make correct decisions.
- Based on the observation that most English sentences can be parsed deterministically with limited look-ahead.
Advantages:
- Very fast — no backtracking.
- Psychologically plausible — humans also seem to parse deterministically most of the time.
Disadvantages:
- Cannot handle all ambiguous sentences.
- Breaks down on garden path sentences (as human readers do, which Marcus took as evidence for the model).
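The control loop itself is easy to sketch. The toy shift-reduce parser below commits to each decision exactly once and never backtracks; it is not Marcus's PARSIFAL (in particular, the toy grammar is unambiguous, so the look-ahead buffer that does the real work in PARSIFAL is never consulted here):

```python
# Toy illustration of deterministic (no-backtracking) parsing: a
# shift-reduce loop that commits to each decision exactly once.
# Grammar: S -> NP VP, NP -> Det N, VP -> V NP. Not Marcus's PARSIFAL.
from collections import deque

POS = {"the": "Det", "a": "Det", "dog": "N", "man": "N", "bit": "V"}

def parse(tokens):
    buffer = deque(tokens)  # remaining input
    stack = []              # holds (category, children) pairs
    while buffer or len(stack) > 1:
        top = [cat for cat, _ in stack[-2:]]
        if top == ["Det", "N"]:
            n, d = stack.pop(), stack.pop()
            stack.append(("NP", [d, n]))
        elif top == ["V", "NP"]:
            np_, v = stack.pop(), stack.pop()
            stack.append(("VP", [v, np_]))
        elif top == ["NP", "VP"]:
            vp, np_ = stack.pop(), stack.pop()
            stack.append(("S", [np_, vp]))
        elif buffer:
            word = buffer.popleft()
            stack.append((POS[word], word))  # shift the next word
        else:
            raise ValueError("stuck: no rule applies")
    return stack[0]

print(parse("the dog bit the man".split()))
```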
