


    UNIT V: Ambiguity Resolution

    5.1 What is Ambiguity?

    Ambiguity occurs when a word, phrase, or sentence has more than one possible interpretation.

    Ambiguity is one of the biggest challenges in NLP. Human brains resolve it easily using context and world knowledge, but for computers it is very hard.

    Types of Ambiguity:

    • Lexical: Word has multiple meanings.
    • Structural/Syntactic: Sentence can be parsed in multiple ways.
    • Semantic: Sentence meaning is unclear.
    • Referential: It is unclear what a pronoun refers to.

    5.2 Statistical Methods for Ambiguity Resolution

    Statistical NLP uses data (large amounts of text) to resolve ambiguity by finding the most probable interpretation.

    Core Idea:

    • Given an ambiguous input, choose the interpretation that is most likely based on patterns in training data.

    Probability Rule:

    • P(interpretation | input) — Probability of each interpretation given the input.
    • Choose the interpretation with the highest probability.
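
    A minimal sketch of this "choose the most probable interpretation" idea, assuming the candidate interpretations and their conditional probabilities have already been estimated (the numbers below are made up for illustration):

        # Hypothetical probabilities P(interpretation | input) for "I went to the bank"
        candidates = {
            "bank = financial institution": 0.7,
            "bank = river bank": 0.3,
        }

        # Choose the interpretation with the highest conditional probability
        best = max(candidates, key=candidates.get)
        print(best)   # bank = financial institution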

    Training Data:

    • Large collections of text (called corpora) are used.
    • Example: Wikipedia, news articles, books.

    5.3 Probabilistic Language Processing

    Probabilistic language models assign probabilities to sequences of words.

    N-gram Models:

    • An N-gram is a sequence of N words.
    • Unigram: Single word — P("dog")
    • Bigram: Two words — P("dog" | "the") — probability of "dog" after "the"
    • Trigram: Three words — P("chased" | "the", "dog") — probability of "chased" after "the dog"

    Formula (Bigram):

    • P(sentence) = P(w₁) × P(w₂|w₁) × P(w₃|w₂) × ...
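
    A worked sketch of the bigram formula above. The probabilities are made up for illustration; a real system estimates them from a corpus as described in 5.4:

        # Made-up probabilities for illustration
        p_first = {"the": 0.4}                                    # P(w1)
        bigram_p = {("the", "dog"): 0.5, ("dog", "runs"): 0.2}    # P(w_i | w_{i-1})

        def sentence_probability(words):
            """P(sentence) = P(w1) * P(w2|w1) * P(w3|w2) * ..."""
            prob = p_first.get(words[0], 0.0)
            for prev, word in zip(words, words[1:]):
                prob *= bigram_p.get((prev, word), 0.0)   # unseen pair gives 0 (hence smoothing)
            return prob

        print(sentence_probability(["the", "dog", "runs"]))   # 0.4 * 0.5 * 0.2 = 0.04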

    Application:

    • Speech recognition: Which word sequence is most likely?
    • Machine translation: Which translation sounds most natural?
    • Auto-complete: What word is most likely to come next?

    Smoothing:

    • Problem: If a word pair never appeared in training data, P = 0.
    • Solution: Add a small count to all pairs (Laplace smoothing).

    5.4 Estimating Probabilities

    Probabilities are estimated from training data using Maximum Likelihood Estimation (MLE).

    Formula:

    • P(word₂ | word₁) = Count(word₁, word₂) / Count(word₁)

    Example:

    • From training data: "the dog" appears 100 times, "the cat" appears 50 times, "the" appears 200 times.
    • P("dog" | "the") = 100/200 = 0.5
    • P("cat" | "the") = 50/200 = 0.25

    Problem with MLE:

    • If a word pair never appeared, probability = 0.
    • This causes problems (multiplying by 0 kills the whole probability).
    • Solution: Smoothing techniques (Laplace, Kneser-Ney, Good-Turing).
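
    A small sketch combining the MLE estimate above with Laplace (add-one) smoothing. The counts come from the example; the unseen bigram "the zebra" and the vocabulary size V are assumptions added for illustration:

        # Counts taken from the example above
        count_the       = 200
        count_the_dog   = 100
        count_the_zebra = 0          # a bigram never seen in training data (hypothetical)
        V = 1000                     # assumed vocabulary size

        # Plain MLE: P(w2 | w1) = Count(w1, w2) / Count(w1)
        print(count_the_dog / count_the)               # 0.5
        print(count_the_zebra / count_the)             # 0.0  -> the zero problem

        # Laplace (add-one) smoothing: P(w2 | w1) = (Count(w1, w2) + 1) / (Count(w1) + V)
        print((count_the_dog + 1) / (count_the + V))   # ~0.084
        print((count_the_zebra + 1) / (count_the + V)) # ~0.00083, small but no longer zero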

    5.5 Part-of-Speech (POS) Tagging

    POS Tagging is the process of assigning a grammatical category (noun, verb, adjective, etc.) to each word in a sentence.

    Why is it hard?

    • Many words can be multiple parts of speech.
    • "Run" can be a noun ("a run in the park") or a verb ("I run daily").
    • "Back" can be noun, verb, adjective, or adverb.

    Common POS Tags (Penn Treebank):

    Tag    Meaning
    NN     Noun, singular
    NNS    Noun, plural
    VB     Verb, base form
    VBD    Verb, past tense
    JJ     Adjective
    RB     Adverb
    DT     Determiner
    IN     Preposition
    CC     Coordinating conjunction

    Methods for POS Tagging:

    1. Rule-Based:

    • Use hand-written rules to assign tags.
    • Example: "If a word ends in '-ing' and follows 'is', tag it as VBG."
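
    The rule above could be written as a very simplified sketch like this (the fallback tag NN is an assumption, not part of any real rule set):

        def rule_based_tag(prev_word, word):
            """Toy illustration of one hand-written tagging rule (not a full tagger)."""
            if word.endswith("ing") and prev_word == "is":
                return "VBG"          # gerund / present participle
            return "NN"               # naive default tag

        print(rule_based_tag("is", "running"))   # VBG
        print(rule_based_tag("the", "dog"))      # NN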

    2. Statistical (HMM-based):

    • Use Hidden Markov Models (HMM) — probabilistic model.
    • Uses two probabilities:
    • Emission Probability: P(word | tag) — how likely is this word given this tag?
    • Transition Probability: P(tag₂ | tag₁) — how likely is this tag after previous tag?
    • Uses Viterbi Algorithm to find the best sequence of tags.

    3. Machine Learning based:

    • Train a classifier (like Maximum Entropy, SVM, or Neural Network) on tagged data.
    • Modern: Use BERT, transformers for state-of-the-art accuracy.

    Example:

    • Input: "The dog runs fast."
    • Output: "The/DT dog/NN runs/VBZ fast/RB" (VBZ = verb, 3rd-person singular present)
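
    A minimal HMM plus Viterbi sketch (method 2 above) that reproduces the example tagging. The transition and emission probabilities are hand-picked toy values; a real tagger estimates them from a tagged corpus:

        # Toy tag set and hand-picked probabilities (not estimated from real data)
        tags = ["DT", "NN", "VBZ", "RB"]

        # Transition probabilities P(tag | previous tag); "<s>" marks the sentence start
        transition = {
            "<s>": {"DT": 0.7,  "NN": 0.2, "VBZ": 0.05, "RB": 0.05},
            "DT":  {"DT": 0.05, "NN": 0.8, "VBZ": 0.1,  "RB": 0.05},
            "NN":  {"DT": 0.1,  "NN": 0.3, "VBZ": 0.4,  "RB": 0.2},
            "VBZ": {"DT": 0.2,  "NN": 0.3, "VBZ": 0.1,  "RB": 0.4},
            "RB":  {"DT": 0.2,  "NN": 0.3, "VBZ": 0.2,  "RB": 0.3},
        }

        # Emission probabilities P(word | tag), restricted to the words of the example
        emission = {
            "DT":  {"the": 0.9},
            "NN":  {"dog": 0.5, "runs": 0.1, "fast": 0.1},
            "VBZ": {"runs": 0.4},
            "RB":  {"fast": 0.3},
        }

        def viterbi(words):
            # best[i][t] = (probability of the best tag sequence for words[:i+1] ending in t, previous tag)
            best = [{t: (transition["<s>"].get(t, 0) * emission[t].get(words[0], 0), None) for t in tags}]
            for i in range(1, len(words)):
                best.append({})
                for t in tags:
                    prob, prev = max(
                        (best[i - 1][pt][0] * transition[pt].get(t, 0) * emission[t].get(words[i], 0), pt)
                        for pt in tags
                    )
                    best[i][t] = (prob, prev)
            # Trace back from the most probable final tag
            tag = max(tags, key=lambda t: best[-1][t][0])
            sequence = [tag]
            for i in range(len(words) - 1, 0, -1):
                sequence.append(best[i][sequence[-1]][1])
            return list(reversed(sequence))

        print(viterbi(["the", "dog", "runs", "fast"]))   # ['DT', 'NN', 'VBZ', 'RB']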

    5.6 Obtaining Lexical Probabilities

    Lexical probabilities are probabilities associated with words — how likely a word is to appear in a certain context or with a certain meaning.

    Word Sense Disambiguation (WSD):

    • The task of figuring out which meaning of a word is intended in context.
    • Example: "I went to the bank." → financial bank (if context is money-related) or river bank (if context is nature-related)?

    Methods:

    1. Dictionary-based:

    • Use WordNet or a dictionary to find all senses of a word.
    • Choose the sense whose definition overlaps most with surrounding words.
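
    A minimal sketch of this overlap idea (a simplified Lesk-style method). The two glosses for "bank" are hand-written stand-ins for real dictionary or WordNet definitions; real systems also remove stopwords before counting overlaps:

        # Hand-written toy glosses for two senses of "bank"
        senses = {
            "bank (financial)": "an institution that accepts money on deposit and makes loans",
            "bank (river)": "the sloping land alongside a river or stream",
        }

        def disambiguate(context, senses):
            """Pick the sense whose gloss shares the most words with the context."""
            context_words = set(context.lower().split())
            def overlap(gloss):
                return len(context_words & set(gloss.lower().split()))
            return max(senses, key=lambda s: overlap(senses[s]))

        print(disambiguate("I went to the bank to deposit money", senses))   # bank (financial)
        print(disambiguate("we sat on the bank of the river", senses))       # bank (river)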

    2. Supervised ML:

    • Train a classifier on examples where word senses are labeled.
    • Features: surrounding words, POS tags, syntactic structure.

    3. Unsupervised (Clustering):

    • Group similar usages of a word together without labeled data.

    Selectional Restrictions:

    • Words constrain what types of objects they go with.
    • "Eat" requires an edible object → "She ate a sandwich." ✓ / "She ate a rock." ✗ (unusual)
    • These restrictions help resolve ambiguity.

    5.7 Probabilistic Context-Free Grammars (PCFG)

    A PCFG is a CFG where each grammar rule has a probability associated with it.

    Format:

    • Rule: S → NP VP [probability: 1.0]
    • Rule: VP → V NP [probability: 0.6]
    • Rule: VP → V [probability: 0.4]

    Properties:

    • The probabilities of all rules expanding the same non-terminal must sum to 1.
    • Example: P(VP → V NP) + P(VP → V) = 0.6 + 0.4 = 1.0 ✓

    How it works:

    • For an ambiguous sentence with multiple parse trees, compute probability of each tree.
    • The probability of a parse tree = product of probabilities of all rules used.
    • Choose the tree with the highest probability.

    Training PCFGs:

    • Use a Treebank (collection of sentences with annotated parse trees).
    • Count how often each rule is used.
    • Estimate probabilities using MLE.

    Example:

    • Parse 1 probability = 0.6 × 0.8 × 0.7 = 0.336
    • Parse 2 probability = 0.4 × 0.9 × 0.5 = 0.180
    • Choose Parse 1 ✓
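
    A tiny sketch of the "product of rule probabilities" computation, using the three made-up rule probabilities per parse from the example above:

        from math import prod

        # Probabilities of the rules used by each candidate parse (numbers from the example above)
        parse1_rules = [0.6, 0.8, 0.7]
        parse2_rules = [0.4, 0.9, 0.5]

        def tree_probability(rule_probs):
            """P(parse tree) = product of the probabilities of all rules used in it."""
            return prod(rule_probs)

        print(tree_probability(parse1_rules))   # 0.336
        print(tree_probability(parse2_rules))   # 0.18
        print("Choose Parse 1" if tree_probability(parse1_rules) > tree_probability(parse2_rules)
              else "Choose Parse 2")            # Choose Parse 1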

    5.8 Best-First Parsing

    When there are many possible parses, it is inefficient to compute all of them. Best-First Parsing uses heuristics to explore the most promising parses first.

    Idea:

    • Assign a score to each partial parse.
    • Always expand the partial parse with the highest score.
    • Stop when a complete parse is found.

    Analogy: Like A* search algorithm — guided by both actual cost and estimated future cost.
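
    A minimal sketch of the agenda-and-priority-queue control loop. To keep it self-contained, the "partial parses" here are simplified to word-sequence prefixes scored with made-up bigram probabilities (a real best-first parser scores partial constituents instead); the point is only that the highest-scoring item is always expanded first and the first complete item found is the best one:

        import heapq

        # Toy bigram probabilities (made up), used only to give partial items a score
        bigram_p = {("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
                    ("the", "dog"): 0.5, ("the", "cat"): 0.5,
                    ("a", "dog"): 0.3,   ("a", "cat"): 0.7,
                    ("dog", "runs"): 0.8, ("dog", "sleeps"): 0.2,
                    ("cat", "runs"): 0.3, ("cat", "sleeps"): 0.7}
        vocab = ["the", "a", "dog", "cat", "runs", "sleeps"]

        def best_first(target_length):
            """Always expand the highest-scoring partial item; stop at the first complete one."""
            agenda = [(-1.0, ("<s>",))]                  # (negated score, partial sequence)
            while agenda:
                neg_score, seq = heapq.heappop(agenda)   # most promising item on the agenda
                if len(seq) - 1 == target_length:
                    return seq[1:], -neg_score           # first complete item popped is the best
                for w in vocab:
                    p = bigram_p.get((seq[-1], w), 0.0)
                    if p > 0:
                        heapq.heappush(agenda, (neg_score * p, seq + (w,)))
            return None

        print(best_first(3))   # (('the', 'dog', 'runs'), 0.24)  (0.6 * 0.5 * 0.8), found without scoring every sequence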

    Advantages:

    • Much faster than exhaustive parsing.
    • Usually finds the best parse quickly.

    Used in:

    • Large-scale parsing systems.
    • Speech recognition decoding.

    5.9 Semantics and Logical Form

    Logical Form (LF) is a formal representation of the meaning of a sentence, usually in predicate logic or lambda calculus.

    Goal: Convert English sentences to logical expressions that can be reasoned about.

    Example:

    • Sentence: "Every student passed the exam."
    • Logical Form: ∀x: student(x) → passed(x, exam)

    Example 2:

    • Sentence: "John loves Mary."
    • Logical Form: loves(John, Mary)

    Compositional Semantics:

    • Meaning is built up step by step, mirroring the parse tree.
    • Each rule in the grammar has a corresponding semantic rule.

    Lambda Calculus:

    • Used to represent meanings of phrases that are not yet complete.
    • Example: "loves Mary" = λx. loves(x, Mary) — "something that loves Mary"
    • When applied to "John": loves(John, Mary) ✓
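
    In Python, the same composition can be sketched with nested functions (currying), where the verb meaning waits for its arguments:

        # λy. λx. loves(x, y) : the meaning of the transitive verb "loves"
        loves = lambda y: lambda x: f"loves({x}, {y})"

        loves_mary = loves("Mary")    # λx. loves(x, Mary), the meaning of "loves Mary"
        print(loves_mary("John"))     # loves(John, Mary)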

    5.10 Word Senses and Ambiguity

    Word sense = one specific meaning of a word.

    Most content words in English have multiple senses:

    • "Run": to move fast / to manage / a score in cricket / a run of bad luck
    • "Light": not heavy / a source of illumination / a light color

    WordNet:

    • A large electronic dictionary organized by meaning (not alphabetically).
    • Groups words into synsets (sets of synonyms representing one concept).
    • Connects synsets with relationships: hypernym (is-a), hyponym, meronym (part-of), antonym.

    Example WordNet structure:

    • dog → canine → mammal → animal → living thing
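
    With NLTK installed and the WordNet data downloaded (nltk.download('wordnet')), this structure can be explored programmatically; the exact synsets and their order may differ between WordNet versions:

        from nltk.corpus import wordnet as wn   # requires: pip install nltk; nltk.download('wordnet')

        # All senses (synsets) of the word "dog"
        for synset in wn.synsets("dog"):
            print(synset.name(), "-", synset.definition())

        # Walk up the hypernym (is-a) chain from the first "dog" sense
        synset = wn.synsets("dog")[0]
        while synset.hypernyms():
            synset = synset.hypernyms()[0]
            print(synset.name())   # e.g. canine.n.02, carnivore.n.01, ..., entity.n.01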

    5.11 Encoding Ambiguity in Logical Form

    When a sentence is ambiguous at the semantic level, the logical form must capture this ambiguity.

    Scope Ambiguity:

    • Sentence: "Every teacher loves some student."
    • Reading 1: For every teacher, there is some student they love. (∀t ∃s: loves(t,s))
    • Reading 2: There is some student that every teacher loves. (∃s ∀t: loves(t,s))

    Methods to handle:

    1. Underspecification:

    • Instead of choosing one interpretation, represent the ambiguity formally without resolving it.
    • Use constraints to represent all possible readings at once.
    • Resolution happens only when more context is available.

    2. Multiple Representations:

    • Generate all possible logical forms.
    • Use context or world knowledge to select the right one.

    3. Quasi-Logical Form:

    • A partially interpreted representation that deliberately leaves scope ambiguities unresolved.
    • Later modules resolve them using pragmatic information.

    Summary Table — All 5 Units

    Unit | Topic | Key Takeaway
    I | Introduction to NLU | NLP teaches computers to understand human language; language has multiple levels from phonology to discourse
    II | Semantics & Knowledge Representation | Meaning is represented using logic, semantic networks, frames; applied in machine translation and database interfaces
    III | Grammars & Parsing | CFG defines sentence structure; top-down and bottom-up parsers; features handle agreement; ATNs are powerful parsers
    IV | Grammars for Natural Language | Auxiliaries, movement, questions, and uncertainty are key challenges; deterministic parsers are fast but limited
    V | Ambiguity Resolution | Statistical methods, POS tagging, PCFGs, and logical forms are used to resolve ambiguity in language
