Unit 2 | Natural Language Processing Notes | AKTU Notes



    UNIT II: Semantics and Knowledge Representation

    2.1 What is Semantics?

    Semantics is the study of meaning — what words, phrases, and sentences actually mean.

    While syntax deals with structure ("Is the sentence grammatically correct?"), semantics deals with meaning ("Does the sentence make sense? What does it mean?").

    Example:

    • "Colorless green ideas sleep furiously." — Syntactically correct but semantically meaningless.
    • "The dog bit the man." — Both syntactically and semantically correct.

    2.2 Levels of Meaning

    1. Lexical Semantics:

    • Meaning of individual words.
    • Deals with synonyms, antonyms, homonyms, polysemy.
    • Synonyms: big = large
    • Antonyms: hot ↔ cold
    • Homonyms: "bank" (river bank / money bank). Same spelling and pronunciation, unrelated meanings.
    • Polysemy: "head" — head of a person, head of an organization (one word, related meanings)

    2. Compositional Semantics:

    • Meaning of a sentence is built from meanings of its parts.
    • Example: the meaning of "the red ball" combines the meanings of "red" and "ball": an object that is both red and a ball.
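
    One simple way to make composition concrete is to treat each word's meaning as the set of objects it describes, so that an adjective-noun phrase denotes the intersection of the two sets. The sketch below assumes a made-up toy world; the object names and the DENOTATION table are invented for illustration only.

```python
# Toy "world": each word denotes the set of objects it is true of.
# The objects and word meanings are invented for illustration.
DENOTATION = {
    "red": {"ball1", "apple1"},
    "green": {"ball2"},
    "ball": {"ball1", "ball2"},
    "apple": {"apple1"},
}


def phrase_meaning(adjective: str, noun: str) -> set:
    """Compose an adjective-noun phrase by intersecting the two word meanings."""
    return DENOTATION[adjective] & DENOTATION[noun]


red_ball = phrase_meaning("red", "ball")
print(red_ball)  # {'ball1'}
```

    Intersection only works for simple adjectives; words like "fake" or "former" need a richer treatment.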

    3. Pragmatic Meaning:

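    • Meaning depends on context and the speaker's intention.
    • Example: "Can you pass the salt?" is literally a question about ability, but the speaker intends it as a request.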

    2.3 Knowledge Representation

    For a computer to understand language, it needs to store and use knowledge about the world. Knowledge Representation is the task of encoding that knowledge in a form a program can store and reason with.

    What kind of knowledge is needed?

    • Facts: "Paris is the capital of France."
    • Relationships: "A dog is a type of animal."
    • Rules: "If it rains, the ground gets wet."

    2.4 Methods of Knowledge Representation

    1. Semantic Networks:

    • A graph where nodes represent concepts and edges represent relationships.
    • Easy to visualize.
    • Example: [Dog] --is-a--> [Animal], [Dog] --has--> [Tail], [Animal] --can--> [Breathe]
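
    The example network above can be sketched as a small labelled graph. The lookup function also follows is-a links so that Dog inherits what is known about Animal; this inheritance step is a common use of semantic networks but is an assumption here, not something the notes spell out. All function names are illustrative.

```python
from collections import defaultdict

# Semantic network stored as (concept, relation) -> set of related concepts.
edges = defaultdict(set)


def add_edge(subject, relation, obj):
    """Add a labelled edge: subject --relation--> obj."""
    edges[(subject, relation)].add(obj)


add_edge("Dog", "is-a", "Animal")
add_edge("Dog", "has", "Tail")
add_edge("Animal", "can", "Breathe")


def lookup(concept, relation):
    """Collect values for `relation`, inheriting along is-a links."""
    found, seen, stack = set(), set(), [concept]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles in the is-a hierarchy
        seen.add(node)
        found |= edges.get((node, relation), set())
        stack.extend(edges.get((node, "is-a"), set()))
    return found


print(lookup("Dog", "can"))  # {'Breathe'}
```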

    2. Frames:

    • A frame is like a template (similar to a class in programming).
    • It stores a concept along with its properties (called slots).
    • Example Frame for "Car": Color: Red, Brand: Toyota, Speed: 120 km/h, Fuel: Petrol
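
    Since a frame resembles a class, the "Car" frame can be sketched as a Python dataclass whose fields play the role of slots and whose defaults play the role of default fillers. The class and field names are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class CarFrame:
    """Frame for the concept "Car"; each field is a slot with a default filler."""
    color: str = "Red"
    brand: str = "Toyota"
    speed_kmh: int = 120
    fuel: str = "Petrol"


default_car = CarFrame()
blue_car = CarFrame(color="Blue")  # override one slot, keep the other defaults
print(blue_car.color, blue_car.brand)  # Blue Toyota
```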

    3. First-Order Logic (FOL) / Predicate Logic:

    • Uses formal logic to represent knowledge.
    • Very precise and can be used for logical reasoning.
    • "All humans are mortal" → ∀x: human(x) → mortal(x)
    • "Socrates is a human" → human(Socrates)
    • Therefore, substituting Socrates for x and applying modus ponens: "Socrates is mortal" → mortal(Socrates) ✓
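
    The Socrates inference can be reproduced with a very small forward-chaining loop. This is a minimal sketch that handles only rules of the form ∀x: p(x) → q(x) over one-argument predicates, not a general FOL prover; the data layout is an assumption made for illustration.

```python
# Facts are (predicate, argument) pairs.
# A rule (p, q) reads "for all x: p(x) -> q(x)".
facts = {("human", "Socrates")}
rules = [("human", "mortal")]


def forward_chain(facts, rules):
    """Apply every rule to every matching fact until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, arg in list(derived):
                new_fact = (conclusion, arg)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


result = forward_chain(facts, rules)
print(("mortal", "Socrates") in result)  # True
```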

    4. Ontologies:

    • A formal and structured representation of knowledge in a domain.
    • Defines concepts, their properties, and relationships.
    • Example: WordNet, a large lexical database of English in which words are grouped into sets of synonyms (synsets) linked by relations such as "is a kind of".

    5. Production Rules:

    • IF-THEN rules to represent knowledge.
    • Example: IF temperature ≥ 100°C (at standard atmospheric pressure) THEN water boils.
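
    IF-THEN rules map naturally onto (condition, action) pairs checked against a working memory of known facts. The sketch below is a single-pass toy, not a full production system with conflict resolution; the key names are hypothetical, and the boiling rule uses >= because water at standard pressure boils at 100°C.

```python
# Each rule pairs an IF condition on working memory with a THEN fact to add.
# Key names such as "temperature_c" are hypothetical.
RULES = [
    (lambda wm: wm.get("temperature_c", 0) >= 100, ("water_state", "boiling")),
    (lambda wm: wm.get("raining", False), ("ground", "wet")),
]


def run_rules(working_memory):
    """Fire every rule whose IF part holds and record its THEN part."""
    result = dict(working_memory)  # leave the caller's dictionary untouched
    for condition, (key, value) in RULES:
        if condition(result):
            result[key] = value
    return result


print(run_rules({"temperature_c": 120}))
```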

    2.5 Application: Machine Translation

    Machine Translation (MT) is the automatic translation of text from one language to another using a computer.

    Types of Machine Translation:

    1. Rule-Based MT:

    • Uses a large set of grammar rules and bilingual dictionaries.
    • Translates word by word or phrase by phrase using these rules.
    • Problem: Language has too many exceptions and ambiguities.
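
    A rule-based translator can be sketched with a bilingual dictionary plus one reordering rule. The example below assumes English-to-French and a three-word lexicon chosen for illustration; it moves an adjective after its noun, then looks each word up.

```python
# Tiny illustrative English-to-French lexicon and word classes.
LEXICON = {"the": "la", "red": "rouge", "car": "voiture"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def translate(sentence):
    """Reorder adjective-noun pairs, then translate word by word."""
    words = sentence.lower().split()
    i = 0
    while i < len(words) - 1:
        # Grammar rule: English "adjective noun" becomes French "noun adjective".
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Unknown words are passed through unchanged.
    return " ".join(LEXICON.get(word, word) for word in words)


print(translate("The red car"))  # la voiture rouge
```

    Even this toy shows the problem with exceptions: "the" is hard-coded as "la", which is only right for feminine nouns, so every new noun may need another rule.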

    2. Statistical MT (SMT):

    • Learns translation patterns from large amounts of translated texts (parallel corpora).
    • Picks the most statistically likely translation.
    • Example: Google Translate before its 2016 switch to neural MT.
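
    The core idea of picking the most statistically likely translation can be sketched by counting how often each source word is paired with each target word. The aligned pairs below are hypothetical stand-ins for what a real system would extract from a parallel corpus, and real SMT also models phrases and word order.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned pairs, as if extracted from a parallel corpus.
ALIGNED_PAIRS = [
    ("bank", "banque"), ("bank", "banque"), ("bank", "rive"),
    ("house", "maison"), ("house", "maison"),
]

counts = defaultdict(Counter)
for source, target in ALIGNED_PAIRS:
    counts[source][target] += 1


def translation_probability(source, target):
    """Relative-frequency estimate of P(target | source)."""
    options = counts.get(source)
    if not options:
        return 0.0
    return options[target] / sum(options.values())


def best_translation(source):
    """Return the most frequent translation, or the word itself if unseen."""
    options = counts.get(source)
    if not options:
        return source
    return options.most_common(1)[0][0]


print(best_translation("bank"))  # banque
```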

    3. Neural MT (NMT):

    • Uses deep learning (neural networks) for translation.
    • Generally produces more fluent translations than SMT because it uses the whole sentence as context.
    • Example: Google Translate (current version), DeepL

    Challenges in MT:

    • Ambiguity: "I saw her duck" — did she dodge, or did I see her pet duck?
    • Idioms: "It's raining cats and dogs" should not be translated literally.
    • Grammar differences: Word order varies by language. English uses subject-verb-object order, while Hindi uses subject-object-verb.

    2.6 Application: Database Interface (Natural Language Interface)

    A Natural Language Interface to Database (NLIDB) allows users to query a database using plain English instead of SQL.

    Example:

    • User types: "Show me all employees who earn more than 50,000 rupees."
    • System converts to SQL: SELECT * FROM employees WHERE salary > 50000;
    • The result is returned to the user in readable form.

    Steps involved:

    • Parse the English question.
    • Map words to database fields (semantic mapping).
    • Generate equivalent database query (SQL).
    • Execute query and return results.
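
    The four steps above can be sketched end to end with Python's built-in sqlite3 module. This is a minimal illustration that understands only one narrow question pattern; the table contents, the FIELD_MAP and COMPARATORS tables, and the function name are assumptions made for the example.

```python
import re
import sqlite3

# Semantic mapping from question words to (hypothetical) column names.
FIELD_MAP = {"earn": "salary"}
COMPARATORS = {"more than": ">", "less than": "<"}


def question_to_sql(question):
    """Parse one narrow question pattern and build a parameterised query."""
    pattern = r"employees who (\w+) (more than|less than) ([\d,]+)"
    match = re.search(pattern, question.lower())
    if match is None:
        raise ValueError("question not understood")
    verb, comparator, number = match.groups()
    if verb not in FIELD_MAP:
        raise ValueError(f"no database field for '{verb}'")
    column = FIELD_MAP[verb]         # map the word to a database field
    operator = COMPARATORS[comparator]
    value = int(number.replace(",", ""))
    # Column and operator come from fixed tables; the value is bound as a parameter.
    return f"SELECT * FROM employees WHERE {column} {operator} ?", (value,)


# Small in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 72000), ("Ravi", 45000)],
)

sql, params = question_to_sql(
    "Show me all employees who earn more than 50,000 rupees."
)
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('Asha', 72000)]
```

    Binding the number as a query parameter rather than pasting it into the SQL string keeps user input out of the query text.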

    Advantages:

    • Non-technical users can query databases easily.
    • No need to learn SQL.

    Challenges:

    • Handling complex questions.
    • Resolving ambiguous terms.
    • Mapping domain-specific words to database fields.
