Unit 1 | Natural Language Processing Notes | AKTU Notes



    UNIT I: Introduction to Natural Language Understanding

    1.1 What is Natural Language?

    Natural language is the language humans use to communicate with each other in everyday life, such as English, Hindi, or French. It differs from programming languages (such as Python or C++), which are formal and follow strict, unambiguous rules.

    Examples of natural language:

    • "Please book me a flight to Delhi tomorrow."
    • "What is the weather like today?"
    • "Show me the nearest restaurant."

    1.2 What is Natural Language Processing (NLP)?

    NLP is a branch of Artificial Intelligence (AI) that deals with teaching computers to understand, interpret, and generate human language.

    In simple words: NLP = Making computers read, understand, and respond to human language.

    Why is NLP hard?

    • Human language is ambiguous (one word can have many meanings).
    • Language has sarcasm, idioms, slang, and context.
    • Different people speak the same language differently.

    1.3 The Study of Language

    To understand NLP, we first need to understand how language works. Language has several levels:

    1. Phonology – Study of sounds in a language.

    • Example: The word "cat" has 3 sounds — /k/, /æ/, /t/

    2. Morphology – Study of the structure of words.

    • Example: "unhappiness" = un + happy + ness (prefix + root + suffix)

    3. Syntax – Study of grammar and sentence structure.

    • Example: "The dog bit the man" is grammatical. "Bit dog the man the" is not, even though it uses the same words.

    4. Semantics – Study of meaning of words and sentences.

    • Example: "Bank" can mean a river bank or a financial bank.

    5. Pragmatics – Study of how context affects meaning.

    • Example: "Can you pass the salt?" — This is a request, not a yes/no question.

    6. Discourse – Study of how sentences connect to form meaningful paragraphs or conversations.

    • Example: In "Ravi bought a phone. It was expensive.", the word "It" refers to the phone from the previous sentence.

    1.4 Applications of NLP

    NLP is used in many real-world applications:

    Application                 Example
    Machine Translation         Google Translate
    Speech Recognition          Siri, Alexa, Google Assistant
    Chatbots                    Customer service bots
    Sentiment Analysis          Analyzing product reviews
    Text Summarization          News summarizers
    Spell/Grammar Check         MS Word, Grammarly
    Information Retrieval       Google Search
    Question Answering          IBM Watson
    Named Entity Recognition    Finding names, dates in text

    1.5 Evaluating Language Understanding Systems

    How do we know if an NLP system is good? We evaluate it using different methods:

    1. Turing Test:

    • A machine passes if a human judge cannot reliably tell its responses apart from a human's responses in a conversation.

    2. Task-based Evaluation:

    • Check how well the system performs a specific task.
    • Example: Translation accuracy, question answering accuracy.

    3. Metrics used:

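    • Precision – Out of all answers the system gave, how many were correct?
    • Recall – Out of all correct answers, how many did the system find?
    • F1-Score – Harmonic mean of precision and recall: F1 = 2 × Precision × Recall / (Precision + Recall).
    • BLEU Score – Scores a machine translation by how much its word sequences (n-grams) overlap with human reference translations.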
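
    The first three metrics can be computed in a few lines. The sketch below is only an illustration: the function name precision_recall_f1 and the two example sets are made up for these notes, and answers are treated as simple sets of items.

```python
def precision_recall_f1(predicted, relevant):
    """Return (precision, recall, f1) for two collections of answers."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)  # answers that are both given and correct

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: entities a system extracted vs. the entities that are actually correct.
system_answers = {"Delhi", "Monday", "Air India", "flight"}
correct_answers = {"Delhi", "Monday", "Air India", "Mumbai", "Ravi"}

p, r, f = precision_recall_f1(system_answers, correct_answers)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.75 recall=0.60 f1=0.67
```

    Here 3 of the 4 answers given are correct (precision 0.75), and 3 of the 5 correct answers were found (recall 0.60).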

    1.6 Different Levels of Language Analysis

    When a computer processes language, the input passes through several levels of analysis in order:

    • Input Text
    • Phonological Analysis (sounds; needed only for spoken input)
    • Morphological Analysis (word structure)
    • Syntactic Analysis (grammar/sentence structure)
    • Semantic Analysis (meaning)
    • Pragmatic Analysis (context)
    • Discourse Analysis (full conversation/paragraph)
    • Output / Understanding

    Each level feeds information to the next. This is called the pipeline approach in NLP.
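
    The idea that each level feeds the next can be sketched as a chain of functions. This is a toy illustration, not a real NLP system: the three stages, the tiny lexicon, and the name run_pipeline are all assumptions made for this example, and only the first few levels are imitated.

```python
from functools import reduce


def tokenize(text):
    """Stage 1: split raw text into lowercase word tokens."""
    return text.lower().rstrip(".?!").split()


def strip_plural(tokens):
    """Stage 2: a very crude morphological step that drops a final 's'."""
    return [(tok, tok[:-1] if tok.endswith("s") and len(tok) > 3 else tok)
            for tok in tokens]


def tag(pairs):
    """Stage 3: look up each stem in a tiny hand-written lexicon."""
    lexicon = {"the": "Det", "dog": "N", "cat": "N", "chase": "V"}
    return [(word, stem, lexicon.get(stem, "UNK")) for word, stem in pairs]


PIPELINE = [tokenize, strip_plural, tag]


def run_pipeline(text, stages=PIPELINE):
    """Pass the output of each stage to the next one, in order."""
    return reduce(lambda data, stage: stage(data), stages, text)


print(run_pipeline("The dogs chase the cat."))
```

    Because every stage only sees the output of the previous one, a mistake made early (for example a wrong stem) is carried through to all later stages. This is a known weakness of the pipeline approach.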

    1.7 Representations and Understanding

    For a computer to "understand" language, it needs to represent language in a mathematical or logical form.

    Types of representations:

    1. String Representation:

    • Simple — text is just a sequence of characters.
    • Example: "dog" = ['d', 'o', 'g']

    2. Bag of Words (BoW):

    • Represents a document by counting how many times each word appears.
    • Ignores grammar and word order.
    • Example: "I love NLP. I love coding." → {I:2, love:2, NLP:1, coding:1}
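
    A minimal sketch of this counting, using Python's standard collections.Counter. The helper name bag_of_words is made up for these notes, and the tokenizer simply keeps runs of letters; case is kept so that the result matches the example above, whereas real systems usually lowercase first.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring order and punctuation."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens)


bow = bag_of_words("I love NLP. I love coding.")
print(dict(bow))
# {'I': 2, 'love': 2, 'NLP': 1, 'coding': 1}
```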

    3. TF-IDF (Term Frequency – Inverse Document Frequency):

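    • Measures how important a word is to one document within a collection of documents.
    • Words that appear in almost every document (like "the", "is") get a low score. Words that are frequent in one document but rare in the others get a high score.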
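
    One common way to compute the score is TF × IDF, with TF = (count of the word in the document) / (number of words in the document) and IDF = log(N / number of documents containing the word). The sketch below assumes this basic variant; libraries often use smoothed versions, and the function name tf_idf and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = ["the cat sat on the mat", "the dog chased the cat", "the bird sang"]
scores = tf_idf(docs)
for word, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

    In this toy collection "the" occurs in every document, so its IDF is log(3/3) = 0 and its score is 0, while "mat" occurs in only one document and scores higher than "cat", which occurs in two.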

    4. Word Embeddings (Word2Vec, GloVe):

    • Words are represented as vectors (lists of numbers) in a multi-dimensional space.
    • Similar words have similar vectors.
    • Example: vector("king") − vector("man") + vector("woman") ≈ vector("queen")
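
    The analogy can be tried out with cosine similarity, the usual measure of how close two vectors are. Note that the 3-dimensional vectors below are hand-made toy values chosen for this illustration, not real Word2Vec or GloVe embeddings (which typically have hundreds of dimensions), and the function names are hypothetical.

```python
import math

# Hand-made toy vectors; the dimensions loosely mean (royalty, male, female).
toy_vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman", toy_vectors))
# queen
```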

    5. Logical Form:

    • Language is converted to formal logic.
    • Example: "Every dog is an animal" → ∀x: dog(x) → animal(x)
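
    Once a sentence is in logical form, a program can check it against a set of known facts. The sketch below evaluates the formula over a tiny made-up world; the names in dogs, animals and domain are assumptions for this example only.

```python
# A hypothetical toy world.
domain = {"rex", "tommy", "kitty", "table"}
dogs = {"rex", "tommy"}
animals = {"rex", "tommy", "kitty"}


def dog(x):
    return x in dogs


def animal(x):
    return x in animals


def implies(p, q):
    """Logical implication: p → q is false only when p is true and q is false."""
    return (not p) or q


# ∀x: dog(x) → animal(x)
every_dog_is_an_animal = all(implies(dog(x), animal(x)) for x in domain)

# ∀x: animal(x) → dog(x), the reverse claim, for comparison
every_animal_is_a_dog = all(implies(animal(x), dog(x)) for x in domain)

print(every_dog_is_an_animal, every_animal_is_a_dog)
# True False
```

    The reverse claim fails because "kitty" is an animal but not a dog in this toy world.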

    1.8 Organization of Natural Language Understanding Systems

    A complete NLU system has the following components working together:

    • Lexicon (Dictionary): Stores words along with their meanings, grammatical categories, and related information.
    • Grammar Rules: Define how words combine to form valid sentences.
    • Parser: Analyzes sentence structure using grammar rules.
    • Semantic Interpreter: Extracts meaning from the parsed structure.
    • Discourse Module: Handles references and connections across sentences.
    • Pragmatic Module: Interprets meaning in context.
    • Knowledge Base: Stores world knowledge needed to understand sentences.

    1.9 Linguistic Background: An Outline of English Syntax

    Syntax is the set of rules that define how words are arranged to form grammatically correct sentences.

    Basic Building Blocks:

    Parts of Speech (POS):

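    • Noun (N): a person, place, or thing, e.g. "dog", "city"
    • Verb (V): an action or state, e.g. "run", "eat"
    • Adjective (Adj): describes a noun, e.g. "big", "red"
    • Adverb (Adv): describes a verb, an adjective, or another adverb, e.g. "quickly", "slowly"
    • Preposition (P): relates a noun phrase to the rest of the sentence, e.g. "in", "on", "at"
    • Determiner (Det): introduces a noun, e.g. "a", "an", "the"
    • Pronoun (Pro): stands in for a noun, e.g. "he", "she", "it"
    • Conjunction (Conj): joins words, phrases, or clauses, e.g. "and", "but", "or"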

    Phrases:

    • Noun Phrase (NP): Det + Adj + N → "the big dog"
    • Verb Phrase (VP): V + NP → "ate the food"
    • Prepositional Phrase (PP): P + NP → "on the table"

    Sentence Structure:

    • S → NP + VP
    • NP → Det + N
    • VP → V + NP
    • VP → V + PP
    • PP → P + NP

    Example:

    • "The cat sat on the mat."
    • S = [NP: The cat] + [VP: sat + [PP: on the mat]]

    This kind of analysis is used by parsers in NLP systems.
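
    A very small parser of this kind can be sketched in plain Python. The code below is a toy top-down parser with backtracking, not how production parsers are built. It assumes the rules S → NP + VP, NP → Det + N, VP → V + NP, plus two rules that the example sentence needs, VP → V + PP and PP → P + NP. The lexicon and all function names are made up for this illustration.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V", "PP"]],
    "PP": [["P", "NP"]],
}

# Tiny hand-written lexicon: word -> part of speech.
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "dog": "N", "man": "N",
           "sat": "V", "bit": "V", "on": "P"}


def parse(symbol, tokens, start):
    """Yield (tree, next_position) for every way symbol can match tokens[start:]."""
    if symbol not in GRAMMAR:
        # A part-of-speech tag must match exactly one word.
        if start < len(tokens) and LEXICON.get(tokens[start]) == symbol:
            yield (symbol, tokens[start]), start + 1
        return
    for rule in GRAMMAR[symbol]:
        for children, end in match_sequence(rule, tokens, start):
            yield (symbol, children), end


def match_sequence(symbols, tokens, start):
    """Yield (children, next_position) for matching all symbols in order."""
    if not symbols:
        yield [], start
        return
    for tree, middle in parse(symbols[0], tokens, start):
        for rest, end in match_sequence(symbols[1:], tokens, middle):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return a parse tree if the whole sentence is an S, otherwise None."""
    tokens = sentence.lower().rstrip(".").split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


print(parse_sentence("The cat sat on the mat."))
print(parse_sentence("Bit dog the man the"))
# None
```

    The first call prints a nested tree with the same shape as the bracketed analysis above. The second sentence is rejected because it cannot start with an NP.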
