    UNIT IV: Grammars for Natural Language

    4.1 Introduction

    This unit examines specific grammatical phenomena in natural language that pose challenges for NLP systems: auxiliary verbs, movement, questions, and the uncertainty created by ambiguity.

    4.2 Auxiliary Verbs

    Auxiliary verbs (also called helping verbs) are verbs that are used alongside a main verb to express tense, mood, voice, or aspect.

    Common English Auxiliaries:

    • Be: is, am, are, was, were, been, being
    • Have: have, has, had
    • Do: do, does, did
    • Modal auxiliaries: can, could, will, would, shall, should, may, might, must

    Examples:

    • "She is running." (be + present participle = present continuous)
    • "They have finished." (have + past participle = present perfect)
    • "He can swim." (modal = ability)
    • "You must leave." (modal = obligation)

    Importance in NLP:

    • Auxiliaries signal tense, mood, and voice.
    • They are crucial for understanding the time and certainty of an action.
    • Grammars must correctly handle sequences like: "She might have been sleeping." (modal + have + been + verb-ing)

    Verb Phrases with Auxiliaries:

    • VP → Aux VP (general schema: each auxiliary selects the form of the VP that follows)
    • VP → Modal VP[base form]
    • VP → have VP[past participle]
    • VP → be VP[present participle]

    Because each rule rewrites VP as an auxiliary followed by another VP, the rules apply recursively, licensing stacked chains such as "might have been sleeping".
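    These rules can be made concrete in a toy grammar. The sketch below (a minimal illustration using Python's NLTK library, with a made-up one-verb lexicon) encodes each required verb form in the nonterminal name, so "she might have been sleeping" parses while ill-formed chains such as "she might sleeping" are rejected:

        import nltk

        # Toy grammar: the verb-form requirements are folded into the
        # nonterminal names (BASE, PASTP, PRESP), mirroring the VP rules above.
        grammar = nltk.CFG.fromstring("""
            S -> NP VP
            VP -> Modal VP_BASE | Have VP_PASTP | Be VP_PRESP | V_FIN
            VP_BASE -> 'have' VP_PASTP | 'be' VP_PRESP | V_BASE
            VP_PASTP -> 'been' VP_PRESP | V_PASTP
            VP_PRESP -> V_PRESP
            NP -> 'she'
            Modal -> 'might' | 'must' | 'can'
            Have -> 'has' | 'have' | 'had'
            Be -> 'is' | 'was'
            V_FIN -> 'sleeps'
            V_BASE -> 'sleep'
            V_PASTP -> 'slept'
            V_PRESP -> 'sleeping'
        """)

        parser = nltk.ChartParser(grammar)
        for tree in parser.parse("she might have been sleeping".split()):
            tree.pretty_print()   # modal + have + been + V-ing chain accepted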

    4.3 Verb Phrases

    A Verb Phrase (VP) is the part of a sentence that contains the main verb together with its objects, complements, and modifiers.

    Structure of VP:

    • VP → V (intransitive): "The dog runs."
    • VP → V + NP (transitive): "She ate the apple."
    • VP → V + PP: "He sat on the chair."
    • VP → V + NP + PP: "She gave the book to him."
    • VP → V + S (sentential complement): "He said that she left."

    Verb Types:

    • Intransitive: Does not take an object. "She sleeps."
    • Transitive: Takes a direct object. "She reads a book."
    • Ditransitive: Takes two objects. "He gave her a gift."
    • Linking: Connects subject to description. "She is happy."
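    Subcategorization can be enforced in a grammar by giving each verb class its own VP rule. Here is a minimal NLTK sketch (toy lexicon, chosen only for illustration; not a complete grammar of English) in which an intransitive verb refuses a direct object:

        import nltk

        # IV = intransitive, TV = transitive, DV = ditransitive verb.
        grammar = nltk.CFG.fromstring("""
            S -> NP VP
            VP -> IV | TV NP | DV NP NP | TV NP PP
            PP -> P NP
            NP -> Det N | PRO
            IV -> 'sleeps'
            TV -> 'reads' | 'gave'
            DV -> 'gave'
            PRO -> 'she' | 'he' | 'her' | 'him'
            Det -> 'a' | 'the'
            N -> 'book' | 'gift'
            P -> 'to'
        """)

        parser = nltk.ChartParser(grammar)
        print(len(list(parser.parse("she sleeps".split()))))                 # 1
        print(len(list(parser.parse("he gave her a gift".split()))))         # 1
        print(len(list(parser.parse("she gave the book to him".split()))))   # 1
        print(len(list(parser.parse("she sleeps a book".split()))))          # 0: rejected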

    4.4 Movement Phenomena in Language

    Movement refers to the phenomenon where elements of a sentence appear to have "moved" from their original position.

    This is an important concept from transformational grammar (Chomsky).

    Types of Movement:

    1. Wh-Movement (Question Formation):

    • In a statement: "She bought a book."
    • In a question: "What did she buy __?"
    • "What" has moved to the front, leaving a gap (__) in the original position.

    2. Passive Movement:

    • Active: "The dog bit the man."
    • Passive: "The man was bitten by the dog."
    • The object "the man" has moved to the subject position.

    3. Topicalization:

    • "This book, I really like." (object moved to front for emphasis)

    Why it matters in NLP:

    • To understand the meaning of a question or passive sentence, the parser must recognize movement and reconstruct the original relationship.
    • Gap-filling is needed: the system must recognize that in "What did she buy?" the word "what" stands for the missing object of "buy", i.e. the question asks she bought what (see the sketch below).
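    As a toy illustration of gap-filling, the following sketch undoes wh-movement for one simple question pattern (a hypothetical pattern-based transform; real systems recover the gap during parsing, as Section 4.5 shows):

        # Sketch: restore the moved wh-word to its original (gap) position
        # for the pattern "Wh-word did Subject Verb?". Toy pattern only.
        def undo_wh_movement(question):
            words = question.rstrip("?").lower().split()
            wh, aux, *rest = words
            assert wh in {"what", "who", "where"} and aux == "did"
            return rest + [wh]   # the gap sits after the verb, in object position

        print(undo_wh_movement("What did she buy?"))   # ['she', 'buy', 'what']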

    4.5 Handling Questions in Context-Free Grammars

    Questions in English are of several types:

    1. Yes/No Questions:

    • Formed by inverting subject and auxiliary.
    • "She is coming." → "Is she coming?"
    • Grammar rule: Q → Aux NP VP

    2. Wh-Questions:

    • Use question words: what, where, when, who, why, how.
    • "What did she buy?"
    • Grammar rule: Q → Wh-word Aux NP VP

    3. Tag Questions:

    • "She is coming, isn't she?"

    Challenges for CFG:

    • A standard CFG cannot easily handle the "gap" left by moved elements in wh-questions.
    • Solution: use Gap Threading or Slash Categories in the grammar.

    Slash Category:

    • A special notation: NP/NP means "an NP with an NP missing inside it."
    • Allows tracking where the gap is.
    • Example: "What did she buy __?" → Parse as: S/NP → Q/NP with "what" filling the NP slot.

    4.6 Human Preferences in Parsing

    When a sentence is ambiguous, humans tend to prefer certain interpretations over others. NLP systems should model these preferences.

    Garden Path Sentences:

    • Sentences that mislead the reader into one interpretation before forcing a correction.
    • Example: "The horse raced past the barn fell."
    • First reading: "The horse raced past the barn" (seems complete).
    • Correct reading: "The horse [that was] raced past the barn fell."

    Human Parsing Preferences:

    1. Minimal Attachment:

    • Humans prefer the parse with fewer nodes in the parse tree (simpler structure).
    • Example: "I saw the man with a telescope."
    • Preferred: I saw [the man] [with a telescope] (PP attaches to VP — simpler)
    • Alternative: I saw [the man [with a telescope]] (PP attaches to NP)

    2. Late Closure (Right Association):

    • Humans prefer to attach new material to the most recent phrase.
    • Example: "She said that he left yesterday."
    • Preferred: "yesterday" modifies "left" (recent VP), not "said."

    Why it matters:

    • NLP parsers use these preferences to choose among multiple parse trees.
    • Probabilistic parsers assign probabilities based on these preferences.
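    A common way to encode such preferences is a probabilistic CFG in which the rules that build the preferred (minimal-attachment) structure carry higher probability. A minimal NLTK sketch, with made-up probabilities chosen purely for illustration:

        import nltk

        # Toy PCFG: VP attachment of the PP comes out more probable than
        # NP attachment because NP -> Det N outweighs NP -> Det N PP.
        pcfg = nltk.PCFG.fromstring("""
            S -> NP VP [1.0]
            VP -> V NP [0.5] | V NP PP [0.5]
            NP -> Det N [0.4] | Det N PP [0.2] | 'I' [0.4]
            PP -> P NP [1.0]
            Det -> 'the' [0.6] | 'a' [0.4]
            N -> 'man' [0.5] | 'telescope' [0.5]
            V -> 'saw' [1.0]
            P -> 'with' [1.0]
        """)

        # ViterbiParser returns the single most probable parse.
        for tree in nltk.ViterbiParser(pcfg).parse(
                "I saw the man with a telescope".split()):
            print(tree)   # PP attached to the VP: the minimal-attachment reading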

    4.7 Encoding Uncertainty

    Language is full of uncertainty. A sentence can have multiple valid interpretations. NLP systems need to handle this uncertainty.

    Sources of Uncertainty:

    • Lexical ambiguity: "I went to the bank." (river bank or financial bank?)
    • Structural ambiguity: "I saw the man with binoculars." (who has the binoculars?)
    • Referential ambiguity: "John told Peter that he was late." (who is "he"?)

    Ways to Encode Uncertainty:

    1. Multiple Parse Trees:

    • Store all possible parse trees and let later processing choose.
    • Problem: Too many trees for complex sentences.

    2. Probabilistic Methods:

    • Assign a probability to each possible interpretation.
    • Choose the highest probability interpretation.

    3. Packed Representations:

    • Efficiently store multiple interpretations in a compact structure.
    • Example: packed parse forests (shared forests), where common subtrees are stored once instead of being repeated in every tree.
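    The chart built by a chart parser is itself a packed representation: each sub-analysis is stored once, and individual trees are materialized only when enumerated. A small NLTK sketch with an ambiguous PP attachment (toy grammar, for illustration):

        import nltk

        grammar = nltk.CFG.fromstring("""
            S -> NP VP
            VP -> V NP | VP PP
            NP -> Det N | NP PP | 'I' | N
            PP -> P NP
            Det -> 'the'
            N -> 'man' | 'binoculars'
            V -> 'saw'
            P -> 'with'
        """)

        trees = list(nltk.ChartParser(grammar).parse(
            "I saw the man with binoculars".split()))
        print(len(trees))   # 2: PP attached to the VP or to the NP
        for t in trees:
            print(t)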

    4.8 Deterministic Parsing

    A Deterministic Parser makes only one decision at each step — it never backtracks.

    Key Idea:

    • Instead of trying all possibilities and backtracking, the parser uses look-ahead and carefully designed rules to commit to the correct choice the first time.

    Marcus Parser (1980):

    • A well-known deterministic parser for English (Mitchell Marcus's PARSIFAL).
    • Uses a small look-ahead buffer (three cells in PARSIFAL) to inspect upcoming constituents before committing to a decision.
    • Based on the observation that most English sentences can be parsed deterministically with limited look-ahead.
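    A miniature version of the idea (a hand-written toy with a hypothetical lexicon; not PARSIFAL itself): the parser shifts and reduces in a single left-to-right pass, consulting the look-ahead buffer before closing a verb phrase, and never undoes a decision:

        # Toy deterministic parser: one left-to-right pass, no backtracking.
        # The look-ahead buffer decides whether a verb is used intransitively.
        LEX = {"the": "Det", "dog": "N", "man": "N",
               "she": "NP", "runs": "V", "bit": "V"}

        def parse(words):
            buf = [LEX[w] for w in words]        # look-ahead buffer (categories)
            stack = []
            while buf or stack != ["S"]:
                if stack[-2:] == ["Det", "N"]:
                    stack[-2:] = ["NP"]          # close a noun phrase
                elif stack[-1:] == ["V"] and (not buf or buf[0] not in ("Det", "NP")):
                    stack[-1:] = ["VP"]          # look-ahead: no object coming
                elif stack[-2:] == ["V", "NP"]:
                    stack[-2:] = ["VP"]          # object just completed
                elif stack[-2:] == ["NP", "VP"]:
                    stack[-2:] = ["S"]
                elif buf:
                    stack.append(buf.pop(0))     # shift the next category
                else:
                    raise ValueError("stuck: cannot parse deterministically")
            return stack[0]

        print(parse("the dog runs".split()))         # S
        print(parse("the dog bit the man".split()))  # S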

    Advantages:

    • Very fast — no backtracking.
    • Psychologically plausible — humans also seem to parse deterministically most of the time.

    Disadvantages:

    • Cannot handle all ambiguous sentences.
    • Breaks down on garden path sentences.
