


    UNIT V: Ambiguity Resolution

    5.1 What is Ambiguity?

    Ambiguity occurs when a word, phrase, or sentence has more than one possible interpretation.

    Ambiguity is one of the biggest challenges in NLP. Human brains resolve it easily using context and world knowledge, but for computers it is very hard.

    Types of Ambiguity:

    • Lexical: Word has multiple meanings.
    • Structural/Syntactic: Sentence can be parsed in multiple ways.
    • Semantic: Sentence meaning is unclear.
    • Referential: It is unclear what a pronoun refers to.

    5.2 Statistical Methods for Ambiguity Resolution

    Statistical NLP uses data (large amounts of text) to resolve ambiguity by finding the most probable interpretation.

    Core Idea:

    • Given an ambiguous input, choose the interpretation that is most likely based on patterns in training data.

    Probability Rule:

    • P(interpretation | input) — Probability of each interpretation given the input.
    • Choose the interpretation with the highest probability.
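
    A minimal sketch of this "choose the most probable interpretation" idea, assuming the candidate interpretations and their conditional probabilities have already been estimated (the numbers below are made up for illustration):

        # Hypothetical probabilities P(interpretation | input) for "I went to the bank"
        candidates = {
            "bank = financial institution": 0.7,
            "bank = river bank": 0.3,
        }

        # Choose the interpretation with the highest conditional probability
        best = max(candidates, key=candidates.get)
        print(best)   # bank = financial institution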

    Training Data:

    • Large collections of text (called corpora) are used.
    • Example: Wikipedia, news articles, books.

    5.3 Probabilistic Language Processing

    Probabilistic language models assign probabilities to sequences of words.

    N-gram Models:

    • An N-gram is a sequence of N words.
    • Unigram: Single word — P("dog")
    • Bigram: Two words — P("dog" | "the") — probability of "dog" after "the"
    • Trigram: Three words — P("chased" | "the", "dog") — probability of "chased" after "the dog"

    Formula (Bigram):

    • P(sentence) = P(w₁) × P(w₂|w₁) × P(w₃|w₂) × ...
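
    A worked sketch of the bigram formula above. The probabilities are made up for illustration; a real system estimates them from a corpus as described in 5.4:

        # Made-up probabilities for illustration
        p_first = {"the": 0.4}                                    # P(w1)
        bigram_p = {("the", "dog"): 0.5, ("dog", "runs"): 0.2}    # P(w_i | w_{i-1})

        def sentence_probability(words):
            """P(sentence) = P(w1) * P(w2|w1) * P(w3|w2) * ..."""
            prob = p_first.get(words[0], 0.0)
            for prev, word in zip(words, words[1:]):
                prob *= bigram_p.get((prev, word), 0.0)   # unseen pair gives 0 (hence smoothing)
            return prob

        print(sentence_probability(["the", "dog", "runs"]))   # 0.4 * 0.5 * 0.2 = 0.04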

    Application:

    • Speech recognition: Which word sequence is most likely?
    • Machine translation: Which translation sounds most natural?
    • Auto-complete: What word is most likely to come next?

    Smoothing:

    • Problem: If a word pair never appeared in training data, P = 0.
    • Solution: Add a small count to all pairs (Laplace smoothing).

    5.4 Estimating Probabilities

    Probabilities are estimated from training data using Maximum Likelihood Estimation (MLE).

    Formula:

    • P(word₂ | word₁) = Count(word₁, word₂) / Count(word₁)

    Example:

    • From training data: "the dog" appears 100 times, "the cat" appears 50 times, "the" appears 200 times.
    • P("dog" | "the") = 100/200 = 0.5
    • P("cat" | "the") = 50/200 = 0.25

    Problem with MLE:

    • If a word pair never appeared, probability = 0.
    • This causes problems (multiplying by 0 kills the whole probability).
    • Solution: Smoothing techniques (Laplace, Kneser-Ney, Good-Turing).
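
    A small sketch combining the MLE estimate above with Laplace (add-one) smoothing. The counts come from the example; the unseen bigram "the zebra" and the vocabulary size V are assumptions added for illustration:

        # Counts taken from the example above
        count_the       = 200
        count_the_dog   = 100
        count_the_zebra = 0          # a bigram never seen in training data (hypothetical)
        V = 1000                     # assumed vocabulary size

        # Plain MLE: P(w2 | w1) = Count(w1, w2) / Count(w1)
        print(count_the_dog / count_the)               # 0.5
        print(count_the_zebra / count_the)             # 0.0  -> the zero problem

        # Laplace (add-one) smoothing: P(w2 | w1) = (Count(w1, w2) + 1) / (Count(w1) + V)
        print((count_the_dog + 1) / (count_the + V))   # ~0.084
        print((count_the_zebra + 1) / (count_the + V)) # ~0.00083, small but no longer zero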

    5.5 Part-of-Speech (POS) Tagging

    POS Tagging is the process of assigning a grammatical category (noun, verb, adjective, etc.) to each word in a sentence.

    Why is it hard?

    • Many words can be multiple parts of speech.
    • "Run" can be a noun ("a run in the park") or a verb ("I run daily").
    • "Back" can be noun, verb, adjective, or adverb.

    Common POS Tags (Penn Treebank):

    Tag    Meaning
    NN     Noun, singular
    NNS    Noun, plural
    VB     Verb, base form
    VBD    Verb, past tense
    JJ     Adjective
    RB     Adverb
    DT     Determiner
    IN     Preposition
    CC     Coordinating conjunction

    Methods for POS Tagging:

    1. Rule-Based:

    • Use hand-written rules to assign tags.
    • Example: "If a word ends in '-ing' and follows 'is', tag it as VBG."
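
    The rule above could be written as a very simplified sketch like this (the fallback tag NN is an assumption, not part of any real rule set):

        def rule_based_tag(prev_word, word):
            """Toy illustration of one hand-written tagging rule (not a full tagger)."""
            if word.endswith("ing") and prev_word == "is":
                return "VBG"          # gerund / present participle
            return "NN"               # naive default tag

        print(rule_based_tag("is", "running"))   # VBG
        print(rule_based_tag("the", "dog"))      # NN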

    2. Statistical (HMM-based):

    • Use Hidden Markov Models (HMM) — probabilistic model.
    • Uses two probabilities:
    • Emission Probability: P(word | tag) — how likely is this word given this tag?
    • Transition Probability: P(tag₂ | tag₁) — how likely is this tag after previous tag?
    • Uses Viterbi Algorithm to find the best sequence of tags.

    3. Machine Learning based:

    • Train a classifier (like Maximum Entropy, SVM, or Neural Network) on tagged data.
    • Modern: Use BERT, transformers for state-of-the-art accuracy.

    Example:

    • Input: "The dog runs fast."
    • Output: "The/DT dog/NN runs/VBZ fast/RB" (VBZ = verb, 3rd-person singular present)
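
    A minimal HMM plus Viterbi sketch (method 2 above) that reproduces the example tagging. The transition and emission probabilities are hand-picked toy values; a real tagger estimates them from a tagged corpus:

        # Toy tag set and hand-picked probabilities (not estimated from real data)
        tags = ["DT", "NN", "VBZ", "RB"]

        # Transition probabilities P(tag | previous tag); "<s>" marks the sentence start
        transition = {
            "<s>": {"DT": 0.7,  "NN": 0.2, "VBZ": 0.05, "RB": 0.05},
            "DT":  {"DT": 0.05, "NN": 0.8, "VBZ": 0.1,  "RB": 0.05},
            "NN":  {"DT": 0.1,  "NN": 0.3, "VBZ": 0.4,  "RB": 0.2},
            "VBZ": {"DT": 0.2,  "NN": 0.3, "VBZ": 0.1,  "RB": 0.4},
            "RB":  {"DT": 0.2,  "NN": 0.3, "VBZ": 0.2,  "RB": 0.3},
        }

        # Emission probabilities P(word | tag), restricted to the words of the example
        emission = {
            "DT":  {"the": 0.9},
            "NN":  {"dog": 0.5, "runs": 0.1, "fast": 0.1},
            "VBZ": {"runs": 0.4},
            "RB":  {"fast": 0.3},
        }

        def viterbi(words):
            # best[i][t] = (probability of the best tag sequence for words[:i+1] ending in t, previous tag)
            best = [{t: (transition["<s>"].get(t, 0) * emission[t].get(words[0], 0), None) for t in tags}]
            for i in range(1, len(words)):
                best.append({})
                for t in tags:
                    prob, prev = max(
                        (best[i - 1][pt][0] * transition[pt].get(t, 0) * emission[t].get(words[i], 0), pt)
                        for pt in tags
                    )
                    best[i][t] = (prob, prev)
            # Trace back from the most probable final tag
            tag = max(tags, key=lambda t: best[-1][t][0])
            sequence = [tag]
            for i in range(len(words) - 1, 0, -1):
                sequence.append(best[i][sequence[-1]][1])
            return list(reversed(sequence))

        print(viterbi(["the", "dog", "runs", "fast"]))   # ['DT', 'NN', 'VBZ', 'RB']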

    5.6 Obtaining Lexical Probabilities

    Lexical probabilities are probabilities associated with words — how likely a word is to appear in a certain context or with a certain meaning.

    Word Sense Disambiguation (WSD):

    • The task of figuring out which meaning of a word is intended in context.
    • Example: "I went to the bank." → financial bank (if context is money-related) or river bank (if context is nature-related)?

    Methods:

    1. Dictionary-based:

    • Use WordNet or a dictionary to find all senses of a word.
    • Choose the sense whose definition overlaps most with surrounding words.
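
    A minimal sketch of this overlap idea (a simplified Lesk-style method). The two glosses for "bank" are hand-written stand-ins for real dictionary or WordNet definitions; real systems also remove stopwords before counting overlaps:

        # Hand-written toy glosses for two senses of "bank"
        senses = {
            "bank (financial)": "an institution that accepts money on deposit and makes loans",
            "bank (river)": "the sloping land alongside a river or stream",
        }

        def disambiguate(context, senses):
            """Pick the sense whose gloss shares the most words with the context."""
            context_words = set(context.lower().split())
            def overlap(gloss):
                return len(context_words & set(gloss.lower().split()))
            return max(senses, key=lambda s: overlap(senses[s]))

        print(disambiguate("I went to the bank to deposit money", senses))   # bank (financial)
        print(disambiguate("we sat on the bank of the river", senses))       # bank (river)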

    2. Supervised ML:

    • Train a classifier on examples where word senses are labeled.
    • Features: surrounding words, POS tags, syntactic structure.

    3. Unsupervised (Clustering):

    • Group similar usages of a word together without labeled data.

    Selectional Restrictions:

    • Words constrain what types of objects they go with.
    • "Eat" requires an edible object → "She ate a sandwich." ✓ / "She ate a rock." ✗ (unusual)
    • These restrictions help resolve ambiguity.

    5.7 Probabilistic Context-Free Grammars (PCFG)

    A PCFG is a CFG where each grammar rule has a probability associated with it.

    Format:

    • Rule: S → NP VP [probability: 1.0]
    • Rule: VP → V NP [probability: 0.6]
    • Rule: VP → V [probability: 0.4]

    Properties:

    • The probabilities of all rules expanding the same non-terminal must sum to 1.
    • Example: P(VP → V NP) + P(VP → V) = 0.6 + 0.4 = 1.0 ✓

    How it works:

    • For an ambiguous sentence with multiple parse trees, compute probability of each tree.
    • The probability of a parse tree = product of probabilities of all rules used.
    • Choose the tree with the highest probability.

    Training PCFGs:

    • Use a Treebank (collection of sentences with annotated parse trees).
    • Count how often each rule is used.
    • Estimate probabilities using MLE.

    Example:

    • Parse 1 probability = 0.6 × 0.8 × 0.7 = 0.336
    • Parse 2 probability = 0.4 × 0.9 × 0.5 = 0.180
    • Choose Parse 1 ✓
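
    A tiny sketch of the "product of rule probabilities" computation, using the three made-up rule probabilities per parse from the example above:

        from math import prod

        # Probabilities of the rules used by each candidate parse (numbers from the example above)
        parse1_rules = [0.6, 0.8, 0.7]
        parse2_rules = [0.4, 0.9, 0.5]

        def tree_probability(rule_probs):
            """P(parse tree) = product of the probabilities of all rules used in it."""
            return prod(rule_probs)

        print(tree_probability(parse1_rules))   # 0.336
        print(tree_probability(parse2_rules))   # 0.18
        print("Choose Parse 1" if tree_probability(parse1_rules) > tree_probability(parse2_rules)
              else "Choose Parse 2")            # Choose Parse 1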

    5.8 Best-First Parsing

    When there are many possible parses, it is inefficient to compute all of them. Best-First Parsing uses heuristics to explore the most promising parses first.

    Idea:

    • Assign a score to each partial parse.
    • Always expand the partial parse with the highest score.
    • Stop when a complete parse is found.

    Analogy: Like A* search algorithm — guided by both actual cost and estimated future cost.
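
    A minimal sketch of the agenda-and-priority-queue control loop. To keep it self-contained, the "partial parses" here are simplified to word-sequence prefixes scored with made-up bigram probabilities (a real best-first parser scores partial constituents instead); the point is only that the highest-scoring item is always expanded first and the first complete item found is the best one:

        import heapq

        # Toy bigram probabilities (made up), used only to give partial items a score
        bigram_p = {("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
                    ("the", "dog"): 0.5, ("the", "cat"): 0.5,
                    ("a", "dog"): 0.3,   ("a", "cat"): 0.7,
                    ("dog", "runs"): 0.8, ("dog", "sleeps"): 0.2,
                    ("cat", "runs"): 0.3, ("cat", "sleeps"): 0.7}
        vocab = ["the", "a", "dog", "cat", "runs", "sleeps"]

        def best_first(target_length):
            """Always expand the highest-scoring partial item; stop at the first complete one."""
            agenda = [(-1.0, ("<s>",))]                  # (negated score, partial sequence)
            while agenda:
                neg_score, seq = heapq.heappop(agenda)   # most promising item on the agenda
                if len(seq) - 1 == target_length:
                    return seq[1:], -neg_score           # first complete item popped is the best
                for w in vocab:
                    p = bigram_p.get((seq[-1], w), 0.0)
                    if p > 0:
                        heapq.heappush(agenda, (neg_score * p, seq + (w,)))
            return None

        print(best_first(3))   # (('the', 'dog', 'runs'), 0.24)  (0.6 * 0.5 * 0.8), found without scoring every sequence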

    Advantages:

    • Much faster than exhaustive parsing.
    • Usually finds the best parse quickly.

    Used in:

    • Large-scale parsing systems.
    • Speech recognition decoding.

    5.9 Semantics and Logical Form

    Logical Form (LF) is a formal representation of the meaning of a sentence, usually in predicate logic or lambda calculus.

    Goal: Convert English sentences to logical expressions that can be reasoned about.

    Example:

    • Sentence: "Every student passed the exam."
    • Logical Form: ∀x: student(x) → passed(x, exam)

    Example 2:

    • Sentence: "John loves Mary."
    • Logical Form: loves(John, Mary)

    Compositional Semantics:

    • Meaning is built up step by step, mirroring the parse tree.
    • Each rule in the grammar has a corresponding semantic rule.

    Lambda Calculus:

    • Used to represent meanings of phrases that are not yet complete.
    • Example: "loves Mary" = λx. loves(x, Mary) — "something that loves Mary"
    • When applied to "John": loves(John, Mary) ✓
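
    In Python, the same composition can be sketched with nested functions (currying), where the verb meaning waits for its arguments:

        # λy. λx. loves(x, y) : the meaning of the transitive verb "loves"
        loves = lambda y: lambda x: f"loves({x}, {y})"

        loves_mary = loves("Mary")    # λx. loves(x, Mary), the meaning of "loves Mary"
        print(loves_mary("John"))     # loves(John, Mary)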

    5.10 Word Senses and Ambiguity

    Word sense = one specific meaning of a word.

    Most content words in English have multiple senses:

    • "Run": to move fast / to manage / a score in cricket / a run of bad luck
    • "Light": not heavy / a source of illumination / a light color

    WordNet:

    • A large electronic dictionary organized by meaning (not alphabetically).
    • Groups words into synsets (sets of synonyms representing one concept).
    • Connects synsets with relationships: hypernym (is-a), hyponym, meronym (part-of), antonym.

    Example WordNet structure:

    • dog → canine → mammal → animal → living thing
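
    With NLTK installed and the WordNet data downloaded (nltk.download('wordnet')), this structure can be explored programmatically; the exact synsets and their order may differ between WordNet versions:

        from nltk.corpus import wordnet as wn   # requires: pip install nltk; nltk.download('wordnet')

        # All senses (synsets) of the word "dog"
        for synset in wn.synsets("dog"):
            print(synset.name(), "-", synset.definition())

        # Walk up the hypernym (is-a) chain from the first "dog" sense
        synset = wn.synsets("dog")[0]
        while synset.hypernyms():
            synset = synset.hypernyms()[0]
            print(synset.name())   # e.g. canine.n.02, carnivore.n.01, ..., entity.n.01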

    5.11 Encoding Ambiguity in Logical Form

    When a sentence is ambiguous at the semantic level, the logical form must capture this ambiguity.

    Scope Ambiguity:

    • Sentence: "Every teacher loves some student."
    • Reading 1: For every teacher, there is some student they love. (∀t ∃s: loves(t,s))
    • Reading 2: There is some student that every teacher loves. (∃s ∀t: loves(t,s))

    Methods to handle:

    1. Underspecification:

    • Instead of choosing one interpretation, represent the ambiguity formally without resolving it.
    • Use constraints to represent all possible readings at once.
    • Resolution happens only when more context is available.

    2. Multiple Representations:

    • Generate all possible logical forms.
    • Use context or world knowledge to select the right one.

    3. Quasi-Logical Form:

    • A partially interpreted representation that deliberately leaves scope ambiguities unresolved.
    • Later modules resolve them using pragmatic information.

    Summary Table — All 5 Units

    Unit | Topic | Key Takeaway
    I | Introduction to NLU | NLP teaches computers to understand human language; language has multiple levels from phonology to discourse
    II | Semantics & Knowledge Representation | Meaning is represented using logic, semantic networks, frames; applied in machine translation and database interfaces
    III | Grammars & Parsing | CFG defines sentence structure; top-down and bottom-up parsers; features handle agreement; ATNs are powerful parsers
    IV | Grammars for Natural Language | Auxiliaries, movement, questions, and uncertainty are key challenges; deterministic parsers are fast but limited
    V | Ambiguity Resolution | Statistical methods, POS tagging, PCFGs, and logical forms are used to resolve ambiguity in language
