UNIT I: Introduction to Natural Language Understanding
1.1 What is Natural Language?
Natural language is the language that humans use to communicate with each other in everyday life, such as English, Hindi, or French. It differs from programming languages (such as Python or C++), which are formally defined and have strict, unambiguous rules.
Examples of natural language:
- "Please book me a flight to Delhi tomorrow."
- "What is the weather like today?"
- "Show me the nearest restaurant."
1.2 What is Natural Language Processing (NLP)?
NLP is a branch of Artificial Intelligence (AI) that deals with teaching computers to understand, interpret, and generate human language.
In simple words: NLP = Making computers read, understand, and respond to human language.
Why is NLP hard?
- Human language is ambiguous (one word can have many meanings).
- Language has sarcasm, idioms, slang, and context.
- Different people speak the same language differently.
1.3 The Study of Language
To understand NLP, we first need to understand how language works. Language has several levels:
1. Phonology – Study of sounds in a language.
- Example: The word "cat" has 3 sounds — /k/, /æ/, /t/
2. Morphology – Study of the structure of words.
- Example: "unhappiness" = un + happy + ness (prefix + root + suffix)
3. Syntax – Study of grammar and sentence structure.
- Example: "The dog bit the man" is grammatical. "Bit dog the man the" is not, even though it uses the same words.
4. Semantics – Study of meaning of words and sentences.
- Example: "Bank" can mean a river bank or a financial bank.
5. Pragmatics – Study of how context affects meaning.
- Example: "Can you pass the salt?" — This is a request, not a yes/no question.
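6. Discourse – Study of how sentences connect to form meaningful paragraphs or conversations.
- Example: In "Ravi bought a phone. It was expensive.", the word "It" refers back to the phone in the previous sentence.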
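The morphology example above (un + happy + ness) can be sketched in a few lines of Python. This is only a toy illustration: the affix lists `PREFIXES` and `SUFFIXES` and the function `segment` are hypothetical names invented here, and the single spelling rule (i back to y) is an assumption that covers this example rather than English in general.

```python
# Toy affix lists (assumed for illustration); a real analyzer needs a full lexicon.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ly", "ing")

def segment(word):
    """Split a word into (prefix, root, suffix) using the toy affix lists."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    root = rest[:len(rest) - len(suffix)]
    # Undo the spelling change y -> i before a suffix (happy + ness -> happiness).
    if suffix and root.endswith("i"):
        root = root[:-1] + "y"
    return prefix, root, suffix

print(segment("unhappiness"))  # prints ('un', 'happy', 'ness')
```

Because it matches affixes blindly, this sketch would also split a word like "under" into "un" + "der", which is one reason real systems check candidate roots against a lexicon.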
1.4 Applications of NLP
NLP is used in many real-world applications:
| Application | Example |
|---|---|
| Machine Translation | Google Translate |
| Speech Recognition | Siri, Alexa, Google Assistant |
| Chatbots | Customer service bots |
| Sentiment Analysis | Analyzing product reviews |
| Text Summarization | News summarizers |
| Spell/Grammar Check | MS Word, Grammarly |
| Information Retrieval | Google Search |
| Question Answering | IBM Watson |
| Named Entity Recognition | Finding names, dates in text |
1.5 Evaluating Language Understanding Systems
How do we know if an NLP system is good? We evaluate it using different methods:
1. Turing Test:
- A machine passes if a human judge cannot reliably tell the machine's responses from a human's responses in a conversation.
2. Task-based Evaluation:
- Check how well the system performs a specific task.
- Example: Translation accuracy, question answering accuracy.
3. Metrics used:
- Precision – Out of all answers given, how many were correct?
- Recall – Out of all correct answers, how many did the system find?
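- F1-Score – The harmonic mean of precision and recall. It is high only when both are high.
- BLEU Score – Evaluates machine translation quality by measuring n-gram overlap between the system output and one or more human reference translations.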
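Precision, recall, and F1 can be computed directly from the set of answers a system returns and the set of correct answers. The sketch below assumes both are available as Python sets. The function name `precision_recall_f1` and the entity data are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # answers that are both given and correct
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical entity-extraction result: 4 answers given, 3 of them correct,
# while the reference annotation contains 6 entities.
system_output = {"Delhi", "Monday", "Air India", "flight"}
reference = {"Delhi", "Monday", "Air India", "Mumbai", "Ravi", "9 AM"}

p, r, f1 = precision_recall_f1(system_output, reference)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=0.50 f1=0.60
```

Here the system is fairly precise (3 of its 4 answers are right) but misses half of the correct entities, and F1 summarizes both facts in a single number.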
1.6 Different Levels of Language Analysis
When a computer processes a sentence, the sentence passes through several levels of analysis in order:
- Input Text
- Phonological Analysis (sounds; needed only when the input is speech)
- Morphological Analysis (word structure)
- Syntactic Analysis (grammar/sentence structure)
- Semantic Analysis (meaning)
- Pragmatic Analysis (context)
- Discourse Analysis (full conversation/paragraph)
- Output / Understanding
Each level feeds information to the next. This is called the pipeline approach in NLP.
1.7 Representations and Understanding
For a computer to "understand" language, it needs to represent language in a mathematical or logical form.
Types of representations:
1. String Representation:
- Simple — text is just a sequence of characters.
- Example: "dog" = ['d', 'o', 'g']
2. Bag of Words (BoW):
- Represents a document by counting how many times each word appears.
- Ignores grammar and word order.
- Example: "I love NLP. I love coding." → {I:2, love:2, NLP:1, coding:1}
3. TF-IDF (Term Frequency – Inverse Document Frequency):
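- Measures how important a word is to one document relative to a whole collection of documents.
- Words that appear in almost every document (like "the", "is") get a low score. Words that are frequent in one document but rare in the rest of the collection get a high score.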
4. Word Embeddings (Word2Vec, GloVe):
- Words are represented as vectors (lists of numbers) in a multi-dimensional space.
- Similar words have similar vectors.
- Example: vector("king") − vector("man") + vector("woman") ≈ vector("queen")
5. Logical Form:
- Language is converted to formal logic.
- Example: "Every dog is an animal" → ∀x: dog(x) → animal(x)
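The Bag of Words and TF-IDF representations above can be sketched with the Python standard library alone. The helper names (`tokenize`, `bag_of_words`, `tf_idf`) and the three sample documents are assumptions made for this illustration. The scoring formula used here, tf × log(N / df), is one common textbook variant; libraries often apply smoothed versions that give slightly different numbers.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Count how many times each word appears, ignoring order."""
    return Counter(tokenize(text))

def tf_idf(documents):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for bag in bags for word in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return scores

docs = [
    "I love NLP. I love coding.",
    "I love the weather today.",
    "I saw the dog bite the man.",
]
print(bag_of_words(docs[0]))
for word, score in sorted(tf_idf(docs)[0].items(), key=lambda kv: -kv[1]):
    print(f"{word:8s}{score:.3f}")
```

Because the tokenizer lowercases everything, the counts for the first document use "i" and "nlp" as keys rather than "I" and "NLP". In the TF-IDF output, "i" occurs in all three documents and so scores zero, while "nlp" and "coding" occur only in the first document and score highest there.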
1.8 Organization of Natural Language Understanding Systems
A complete NLU system has the following components working together:
- Lexicon (Dictionary): Stores words along with their meanings, parts of speech, and other properties.
- Grammar Rules: Defines how words combine to form valid sentences.
- Parser: Analyzes sentence structure using grammar rules.
- Semantic Interpreter: Extracts meaning from the parsed structure.
- Discourse Module: Handles references and connections across sentences.
- Pragmatic Module: Interprets meaning in context.
- Knowledge Base: Stores world knowledge needed to understand sentences.
1.9 Linguistic Background: An Outline of English Syntax
Syntax is the set of rules that define how words are arranged to form grammatically correct sentences.
Basic Building Blocks:
Parts of Speech (POS):
- Noun (N): person, place, thing — "dog", "city"
- Verb (V): action — "run", "eat"
- Adjective (Adj): describes noun — "big", "red"
- Adverb (Adv): describes verb — "quickly", "slowly"
- Preposition (P): "in", "on", "at"
- Determiner (Det): "a", "an", "the"
- Pronoun: "he", "she", "it"
- Conjunction: "and", "but", "or"
Phrases:
- Noun Phrase (NP): Det + Adj + N → "the big dog"
- Verb Phrase (VP): V + NP → "ate the food"
- Prepositional Phrase (PP): P + NP → "on the table"
Sentence Structure:
- S → NP + VP
- NP → Det + N | Det + Adj + N
- VP → V + NP | V + PP
- PP → P + NP
Example:
- "The cat sat on the mat."
- S = [NP: The cat] + [VP: sat + [PP: on the mat]]
This kind of analysis is used by parsers in NLP systems.
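As a rough sketch of how a parser applies such rules, the following Python code implements a small top-down (recursive descent) parser with backtracking. The grammar and lexicon are toy assumptions written for this example: the grammar includes a VP → V + PP rule so that "sat on the mat" can be analyzed, and the lexicon covers only the handful of words used in these notes. The names `GRAMMAR`, `LEXICON`, `parse`, `match_sequence`, and `parse_sentence` are hypothetical.

```python
# Toy grammar: each phrase type maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V", "PP"]],
    "PP": [["P", "NP"]],
}
# Toy lexicon: each known word maps to a single part of speech.
LEXICON = {
    "the": "Det", "cat": "N", "mat": "N", "dog": "N", "man": "N",
    "big": "Adj", "sat": "V", "bit": "V", "on": "P",
}

def parse(symbol, tokens, start=0):
    """Yield (tree, next_index) for every way `symbol` matches tokens[start:]."""
    if symbol not in GRAMMAR:  # a part-of-speech tag: match exactly one word
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            yield (symbol, tokens[start]), start + 1
        return
    for rule in GRAMMAR[symbol]:
        for children, end in match_sequence(rule, tokens, start):
            yield (symbol, *children), end

def match_sequence(symbols, tokens, start):
    """Yield (subtrees, next_index) for matching `symbols` one after another."""
    if not symbols:
        yield [], start
        return
    for first, mid in parse(symbols[0], tokens, start):
        for rest, end in match_sequence(symbols[1:], tokens, mid):
            yield [first] + rest, end

def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None if there is none."""
    tokens = sentence.lower().rstrip(".").split()
    for tree, end in parse("S", tokens):
        if end == len(tokens):
            return tree
    return None

print(parse_sentence("The cat sat on the mat."))
print(parse_sentence("Bit dog the man the"))  # prints None: no rule fits
```

The first call returns a nested tuple that mirrors the bracketed analysis above, with an NP for "the cat" and a VP containing the PP "on the mat". This approach works only because the toy grammar has no left-recursive rules; practical parsers use chart-based or statistical methods instead.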
