UNIT V: Ambiguity Resolution
5.1 What is Ambiguity?
Ambiguity occurs when a word, phrase, or sentence has more than one possible interpretation.
Ambiguity is one of the biggest challenges in NLP. Humans resolve it effortlessly using context and world knowledge, but it is very hard for computers.
Types of Ambiguity:
- Lexical: Word has multiple meanings.
- Structural/Syntactic: Sentence can be parsed in multiple ways.
- Semantic: Sentence meaning is unclear.
- Referential: It is unclear what a pronoun refers to.
5.2 Statistical Methods for Ambiguity Resolution
Statistical NLP uses data (large amounts of text) to resolve ambiguity by finding the most probable interpretation.
Core Idea:
- Given an ambiguous input, choose the interpretation that is most likely based on patterns in training data.
Probability Rule:
- P(interpretation | input) — Probability of each interpretation given the input.
- Choose the interpretation with the highest probability.
Training Data:
- Large collections of text (called corpora) are used.
- Example: Wikipedia, news articles, books.
5.3 Probabilistic Language Processing
Probabilistic language models assign probabilities to sequences of words.
N-gram Models:
- An N-gram is a sequence of N words.
- Unigram: Single word — P("dog")
- Bigram: Two words — P("dog" | "the") — probability of "dog" after "the"
- Trigram: Three words — P("chased" | "the", "dog") — probability of "chased" after "the dog"
Formula (Bigram):
- P(sentence) = P(w₁) × P(w₂|w₁) × P(w₃|w₂) × ...
Application:
- Speech recognition: Which word sequence is most likely?
- Machine translation: Which translation sounds most natural?
- Auto-complete: What word is most likely to come next?
Smoothing:
- Problem: If a word pair never appeared in training data, P = 0.
- Solution: Add a small count to all pairs (Laplace smoothing).
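The bigram formula and Laplace smoothing above can be sketched together in a few lines of Python. The toy corpus and counts are illustrative assumptions, not real data:

```python
from collections import Counter

# Toy corpus; tokens and counts are illustrative assumptions.
corpus = "the dog runs . the cat runs . the dog sleeps .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size, used by Laplace smoothing

def bigram_prob(w2, w1):
    """P(w2 | w1) with add-one (Laplace) smoothing."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def sentence_prob(words):
    """P(sentence) = P(w1) x P(w2|w1) x P(w3|w2) x ... under the bigram model."""
    p = unigrams[words[0]] / len(corpus)
    for w1, w2 in zip(words, words[1:]):
        p *= bigram_prob(w2, w1)
    return p

# An unseen pair like ("cat", "sleeps") still gets a small nonzero
# probability thanks to smoothing.
print(sentence_prob("the dog runs .".split()))
print(sentence_prob("the cat sleeps .".split()))
```

Without the `+1` and `+V` terms, the second sentence would score exactly 0, since "cat sleeps" never occurs in the corpus.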
5.4 Estimating Probabilities
Probabilities are estimated from training data using Maximum Likelihood Estimation (MLE).
Formula:
- P(word₂ | word₁) = Count(word₁, word₂) / Count(word₁)
Example:
- From training data: "the dog" appears 100 times, "the cat" appears 50 times, "the" appears 200 times.
- P("dog" | "the") = 100/200 = 0.5
- P("cat" | "the") = 50/200 = 0.25
Problem with MLE:
- If a word pair never appeared, probability = 0.
- This causes problems (multiplying by 0 kills the whole probability).
- Solution: Smoothing techniques (Laplace, Kneser-Ney, Good-Turing).
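The MLE formula and its zero-probability problem can be seen directly with the counts from the worked example above (the vocabulary size is an assumed value for illustration):

```python
# Counts from the worked example: "the dog" = 100, "the cat" = 50, "the" = 200.
counts = {("the", "dog"): 100, ("the", "cat"): 50}
count_the = 200
V = 1000  # assumed vocabulary size for Laplace smoothing

def mle(w1, w2, unigram_count):
    """MLE: Count(w1, w2) / Count(w1) -- zero for unseen pairs."""
    return counts.get((w1, w2), 0) / unigram_count

def laplace(w1, w2, unigram_count):
    """Add-one smoothing: (Count(w1, w2) + 1) / (Count(w1) + V)."""
    return (counts.get((w1, w2), 0) + 1) / (unigram_count + V)

print(mle("the", "dog", count_the))        # 0.5, as in the example
print(mle("the", "zebra", count_the))      # 0.0 -- the MLE problem
print(laplace("the", "zebra", count_the))  # small but nonzero
```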
5.5 Part-of-Speech (POS) Tagging
POS Tagging is the process of assigning a grammatical category (noun, verb, adjective, etc.) to each word in a sentence.
Why is it hard?
- Many words can be multiple parts of speech.
- "Run" can be a noun ("a run in the park") or a verb ("I run daily").
- "Back" can be noun, verb, adjective, or adverb.
Common POS Tags (Penn Treebank):
| Tag | Meaning |
|---|---|
| NN | Noun, singular |
| NNS | Noun, plural |
| VB | Verb, base form |
| VBD | Verb, past tense |
| JJ | Adjective |
| RB | Adverb |
| DT | Determiner |
| IN | Preposition |
| CC | Coordinating conjunction |
Methods for POS Tagging:
1. Rule-Based:
- Use hand-written rules to assign tags.
- Example: "If word ends in '-ing' and is after 'is', tag as VBG."
2. Statistical (HMM-based):
- Use Hidden Markov Models (HMM) — probabilistic model.
- Uses two probabilities:
- Emission Probability: P(word | tag) — how likely is this word given this tag?
- Transition Probability: P(tag₂ | tag₁) — how likely is this tag after previous tag?
- Uses Viterbi Algorithm to find the best sequence of tags.
3. Machine Learning based:
- Train a classifier (like Maximum Entropy, SVM, or Neural Network) on tagged data.
- Modern approaches use transformer models such as BERT for state-of-the-art accuracy.
Example:
- Input: "The dog runs fast."
- Output: "The/DT dog/NN runs/VBZ fast/RB"
5.6 Obtaining Lexical Probabilities
Lexical probabilities are probabilities associated with words — how likely a word is to appear in a certain context or with a certain meaning.
Word Sense Disambiguation (WSD):
- The task of figuring out which meaning of a word is intended in context.
- Example: "I went to the bank." → financial bank (if context is money-related) or river bank (if context is nature-related)?
Methods:
1. Dictionary-based:
- Use WordNet or a dictionary to find all senses of a word.
- Choose the sense whose definition overlaps most with surrounding words.
2. Supervised ML:
- Train a classifier on examples where word senses are labeled.
- Features: surrounding words, POS tags, syntactic structure.
3. Unsupervised (Clustering):
- Group similar usages of a word together without labeled data.
Selectional Restrictions:
- Words constrain what types of objects they go with.
- "Eat" requires an edible object → "She ate a sandwich." ✓ / "She ate a rock." ✗ (unusual)
- These restrictions help resolve ambiguity.
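The dictionary-based method can be sketched as a simplified Lesk algorithm: pick the sense whose gloss shares the most words with the sentence. The glosses and sense names below are invented for illustration:

```python
# Simplified Lesk word sense disambiguation; glosses are illustrative.
senses = {
    "bank_finance": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a river",
}

def lesk(context_sentence):
    """Return the sense whose gloss overlaps most with the context."""
    context = set(context_sentence.lower().split())
    def overlap(sense):
        return len(context & set(senses[sense].split()))
    return max(senses, key=overlap)

print(lesk("I deposited money at the bank"))  # bank_finance
print(lesk("We fished from the river bank"))  # bank_river
```

Real systems would use WordNet glosses and ignore stopwords, but the core idea, counting word overlap between definition and context, is the same.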
5.7 Probabilistic Context-Free Grammars (PCFG)
A PCFG is a CFG where each grammar rule has a probability associated with it.
Format:
- Rule: S → NP VP [probability: 1.0]
- Rule: VP → V NP [probability: 0.6]
- Rule: VP → V [probability: 0.4]
Properties:
- The probabilities of all rules expanding the same non-terminal must sum to 1.
- Example: P(VP → V NP) + P(VP → V) = 0.6 + 0.4 = 1.0 ✓
How it works:
- For an ambiguous sentence with multiple parse trees, compute probability of each tree.
- The probability of a parse tree = product of probabilities of all rules used.
- Choose the tree with the highest probability.
Training PCFGs:
- Use a Treebank (collection of sentences with annotated parse trees).
- Count how often each rule is used.
- Estimate probabilities using MLE.
Example:
- Parse 1 probability = 0.6 × 0.8 × 0.7 = 0.336
- Parse 2 probability = 0.4 × 0.9 × 0.5 = 0.180
- Choose Parse 1 ✓
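The two PCFG computations above, checking that rule probabilities are normalized and scoring a parse tree as a product of rule probabilities, can be sketched as follows (using the rule probabilities from this section):

```python
# PCFG rules from the section above, as (LHS, RHS) -> probability.
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V",)): 0.4,
}

def check_normalized(grammar):
    """All rules expanding the same non-terminal must sum to 1."""
    totals = {}
    for (lhs, _), p in grammar.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

def tree_prob(rules_used):
    """P(tree) = product of the probabilities of all rules used."""
    p = 1.0
    for r in rules_used:
        p *= pcfg[r]
    return p

print(check_normalized(pcfg))  # True
# A parse using S -> NP VP and VP -> V NP:
print(tree_prob([("S", ("NP", "VP")), ("VP", ("V", "NP"))]))  # 0.6
```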
5.8 Best-First Parsing
When there are many possible parses, it is inefficient to compute all of them. Best-First Parsing uses heuristics to explore the most promising parses first.
Idea:
- Assign a score to each partial parse.
- Always expand the partial parse with the highest score.
- Stop when a complete parse is found.
Analogy: Like A* search algorithm — guided by both actual cost and estimated future cost.
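The idea of always expanding the highest-scoring partial analysis can be sketched with a priority queue. This is a schematic best-first search, not a real parser; the toy "states" and heuristic are assumptions for illustration:

```python
import heapq

def best_first(start, expand, score, is_complete):
    """Always expand the highest-scoring partial analysis first."""
    # heapq is a min-heap, so we push negated scores for "best first".
    frontier = [(-score(start), start)]
    while frontier:
        _, state = heapq.heappop(frontier)
        if is_complete(state):
            return state
        for nxt in expand(state):
            heapq.heappush(frontier, (-score(nxt), nxt))
    return None

# Toy usage: reach a goal number by adding 1 or 2, preferring states
# closer to the goal (a stand-in for a parse-scoring heuristic).
goal = 5
result = best_first(
    start=0,
    expand=lambda s: [s + 1, s + 2],
    score=lambda s: -abs(goal - s),   # closer to goal = higher score
    is_complete=lambda s: s == goal,
)
print(result)  # 5
```

In a real parser the states would be partial parse trees and the score would combine the PCFG probability so far with an estimate of the remaining cost, exactly as in A*.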
Advantages:
- Much faster than exhaustive parsing.
- Usually finds the best parse quickly.
Used in:
- Large-scale parsing systems.
- Speech recognition decoding.
5.9 Semantics and Logical Form
Logical Form (LF) is a formal representation of the meaning of a sentence, usually in predicate logic or lambda calculus.
Goal: Convert English sentences to logical expressions that can be reasoned about.
Example:
- Sentence: "Every student passed the exam."
- Logical Form: ∀x: student(x) → passed(x, exam)
Example 2:
- Sentence: "John loves Mary."
- Logical Form: loves(John, Mary)
Compositional Semantics:
- Meaning is built up step by step, mirroring the parse tree.
- Each rule in the grammar has a corresponding semantic rule.
Lambda Calculus:
- Used to represent meanings of phrases that are not yet complete.
- Example: "loves Mary" = λx. loves(x, Mary) — "something that loves Mary"
- When applied to "John": loves(John, Mary) ✓
5.10 Word Senses and Ambiguity
Word sense = one specific meaning of a word.
Most content words in English have multiple senses:
- "Run": to move fast / to manage / a score in cricket / a run of bad luck
- "Light": not heavy / a source of illumination / a light color
WordNet:
- A large electronic dictionary organized by meaning (not alphabetically).
- Groups words into synsets (sets of synonyms representing one concept).
- Connects synsets with relationships: hypernym (is-a), hyponym, meronym (part-of), antonym.
Example WordNet structure:
- dog → canine → mammal → animal → living thing
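The hypernym chain above can be modeled as a dictionary of is-a links. This is a toy stand-in for WordNet's synset graph, with only the one chain shown:

```python
# Tiny WordNet-style is-a hierarchy; entries mirror the chain above.
hypernym = {
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
    "animal": "living thing",
}

def hypernym_chain(word):
    """Follow is-a links up to the most general concept."""
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain

print(hypernym_chain("dog"))
# ['dog', 'canine', 'mammal', 'animal', 'living thing']
```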
5.11 Encoding Ambiguity in Logical Form
When a sentence is ambiguous at the semantic level, the logical form must capture this ambiguity.
Scope Ambiguity:
- Sentence: "Every teacher loves some student."
- Reading 1: For every teacher, there is some student they love. (∀t ∃s: loves(t,s))
- Reading 2: There is some student that every teacher loves. (∃s ∀t: loves(t,s))
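The two readings really are different claims, which can be checked by evaluating both formulas over a small model. The teachers, students, and loves relation below are an invented example:

```python
# Evaluating both scope readings of "Every teacher loves some student"
# over a toy model; the loves relation is invented for illustration.
teachers = {"t1", "t2"}
students = {"s1", "s2"}
loves = {("t1", "s1"), ("t2", "s2")}  # each teacher loves a *different* student

# Reading 1: forall t exists s: loves(t, s)
# -- each teacher loves some (possibly different) student.
reading1 = all(any((t, s) in loves for s in students) for t in teachers)

# Reading 2: exists s forall t: loves(t, s)
# -- one particular student is loved by every teacher.
reading2 = any(all((t, s) in loves for t in teachers) for s in students)

print(reading1, reading2)  # True False -- the readings come apart
```

In this model Reading 1 is true but Reading 2 is false, which is exactly why the logical form must record which quantifier takes wide scope.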
Methods to handle:
1. Underspecification:
- Instead of choosing one interpretation, represent the ambiguity formally without resolving it.
- Use constraints to represent all possible readings at once.
- Resolution happens only when more context is available.
2. Multiple Representations:
- Generate all possible logical forms.
- Use context or world knowledge to select the right one.
3. Quasi-Logical Form:
- A partially interpreted representation that deliberately leaves scope ambiguities unresolved.
- Later modules resolve them using pragmatic information.
Summary Table — All 5 Units
| Unit | Topic | Key Takeaway |
|---|---|---|
| I | Introduction to NLU | NLP teaches computers to understand human language; language has multiple levels from phonology to discourse |
| II | Semantics & Knowledge Representation | Meaning is represented using logic, semantic networks, frames; applied in machine translation and database interfaces |
| III | Grammars & Parsing | CFG defines sentence structure; Top-down and bottom-up parsers; features handle agreement; ATNs are powerful parsers |
| IV | Grammars for Natural Language | Auxiliaries, movement, questions, and uncertainty are key challenges; deterministic parsers are fast but limited |
| V | Ambiguity Resolution | Statistical methods, POS tagging, PCFGs, and logical forms are used to resolve ambiguity in language |