(SEM VII) THEORY EXAMINATION 2022-23 NATURAL LANGUAGE PROCESSING
SECTION A – Short Answers (2 Marks Each)
(a) Language Modelling
Language modelling is the process of assigning probabilities to sequences of words. It helps predict the next word in a sentence using statistical or neural methods.
(b) Issues and problems in NLP
Major issues include ambiguity, lack of context understanding, polysemy, synonymy, large vocabulary size, data sparsity, and computational complexity.
(c) Treebank Corpus
A Treebank Corpus is a linguistically annotated text corpus that includes syntactic or semantic sentence structures in the form of parse trees.
(d) Syntax vs Semantics
Syntax deals with sentence structure and grammatical rules, while semantics deals with meaning and interpretation of words and sentences.
(e) Context Free Grammars (CFG)
CFGs define grammatical rules where a single non-terminal symbol is replaced by a sequence of terminals and/or non-terminals.
(f) Spelling correction
Spelling correction works by detecting misspelled words and suggesting correct words using edit distance, probability models, or dictionary matching.
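The edit-distance approach mentioned above can be sketched as follows. This is a minimal illustration, assuming unit costs for insertion, deletion and substitution; the word list `DICTIONARY` and the helper name `suggest` are hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str]) -> str:
    """Return the dictionary word closest to `word` (ties: first in the list)."""
    return min(dictionary, key=lambda cand: edit_distance(word, cand))


DICTIONARY = ["spelling", "speaking", "selling"]  # hypothetical word list
print(suggest("speling", DICTIONARY))
```

A practical corrector would usually also weight candidates by word probability, as the answer notes.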
(g) Frequency and Amplitude
Frequency represents the number of oscillations per second (Hz), while amplitude represents the strength or loudness of a signal.
(h) Transcription
Transcription is the process of converting spoken language into written text using phonetic or orthographic symbols.
(i) Log-Spectral Distance
Log-Spectral Distance measures the difference between two speech spectra in the logarithmic domain and is used in speech quality assessment.
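One common way to write this distance, assuming P(ω) and P̂(ω) denote the power spectra of the two signals (the symbols are notation chosen here, not taken from the question paper), is:

\[ D_{LS} = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi}\left[10\log_{10}\frac{P(\omega)}{\hat{P}(\omega)}\right]^{2} d\omega} \]

With the factor 10 log10, the result is expressed in decibels.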
(j) TF and IDF
TF (Term Frequency) measures how often a term appears in a document, while IDF (Inverse Document Frequency) measures how rare a term is across the document collection, so terms that occur in few documents receive a higher weight.
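A small sketch of the two quantities, assuming one common variant (relative term frequency and idf = log(N / df)); several other weightings exist. The collection `DOCS` is made up for illustration.

```python
import math
from collections import Counter

# Hypothetical three-document collection, already tokenised.
DOCS = [
    ["speech", "recognition", "uses", "hmm"],
    ["hmm", "tagging", "uses", "viterbi"],
    ["speech", "signal", "speech", "features"],
]


def tf(term, doc):
    """Relative frequency of `term` within one document."""
    return Counter(doc)[term] / len(doc)


def idf(term, docs):
    """log(N / df): larger when fewer documents contain the term."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


print(tf_idf("speech", DOCS[2], DOCS))
print(tf_idf("uses", DOCS[0], DOCS))
```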
SECTION B – Long Answers (10 Marks Each)
(a) Bigram language model with Laplace smoothing
Given corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I am Sam </s>
<s> I do not like green eggs and Sam </s>
Count occurrences:
Bigram (am, Sam) = 2
Total occurrences of “am” = 3
Vocabulary size V = 11 (nine distinct words plus <s> and </s>)
Using Laplace smoothing:
\[ P(\text{Sam} \mid \text{am}) = \frac{2 + 1}{3 + 11} = \frac{3}{14} \]
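The counts and the smoothed probability can be checked with a short script. This is a sketch that assumes whitespace tokenisation and that <s> and </s> are counted as vocabulary items, as in the answer; the function name `laplace_bigram` is illustrative.

```python
from collections import Counter
from fractions import Fraction

CORPUS = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I am Sam </s>",
    "<s> I do not like green eggs and Sam </s>",
]

sentences = [line.split() for line in CORPUS]
unigrams = Counter(tok for sent in sentences for tok in sent)
bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
V = len(unigrams)  # includes <s> and </s>


def laplace_bigram(prev, word):
    """Add-one estimate: (C(prev, word) + 1) / (C(prev) + V)."""
    return Fraction(bigrams[(prev, word)] + 1, unigrams[prev] + V)


print(V)                             # 11
print(laplace_bigram("am", "Sam"))   # 3/14
```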
(b) Context Free Grammar and parse tree
A CFG consists of terminals, non-terminals, a start symbol, and production rules.
Sentence:
“I need to fly between New Delhi and Mumbai”
A parse tree is constructed by expanding grammar rules starting from the sentence symbol S, which first breaks into a noun phrase (NP) and a verb phrase (VP).
(c) Nearest neighbour algorithm using contextual embeddings
The nearest-neighbour algorithm finds the words whose embeddings are closest to a query embedding under a similarity or distance measure such as cosine similarity.
Example: Using BERT embeddings, “bank” in “river bank” is closer to “shore” than to “money”.
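The lookup can be sketched with cosine similarity. The 3-dimensional vectors below are made up to stand in for contextual embeddings; they are not real BERT outputs, and the names `cosine` and `nearest` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, candidates):
    """Name of the candidate vector most similar to `query`."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up vectors: the same word gets a different vector in each context.
bank_in_river_bank = [0.9, 0.1, 0.2]
bank_in_bank_loan = [0.1, 0.9, 0.3]
candidates = {"shore": [0.8, 0.2, 0.1], "money": [0.2, 0.8, 0.4]}

print(nearest(bank_in_river_bank, candidates))
print(nearest(bank_in_bank_loan, candidates))
```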
(d) Short notes
(i) Short-Time Fourier Transform (STFT):
STFT converts a time-domain signal into a time-frequency representation by applying the Fourier transform to short, overlapping windowed segments of the signal.
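In discrete form this can be written as below. The notation is an assumption made here: x[n] is the signal, w[n] a window of length N, H the hop size between frames, m the frame index and k the frequency bin.

\[ X(m, k) = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-j 2\pi k n / N} \]

The squared magnitude |X(m, k)|^2 plotted over m and k gives the spectrogram.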
(ii) Linear Predictive Coding (LPC):
LPC represents a speech signal by predicting each sample as a linear combination of past samples; it is widely used in speech compression.
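Written out, with p the prediction order and a_k the predictor coefficients (symbols chosen here for illustration):

\[ \hat{s}[n] = \sum_{k=1}^{p} a_k\, s[n-k], \qquad e[n] = s[n] - \hat{s}[n] \]

The coefficients a_k are chosen to minimise the total squared prediction error \sum_n e[n]^2, so only the coefficients and a compact description of the residual need to be stored or transmitted.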
(e) Viterbi Search Algorithm
The Viterbi algorithm finds the most probable hidden-state sequence in a Hidden Markov Model for a given observation sequence.
It uses dynamic programming to compute the best path to each state step by step and is widely used in speech recognition and POS tagging.
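A compact sketch of the dynamic programme, assuming the HMM is given as nested dictionaries and the observation sequence is non-empty. The two-tag model (N, V) and all its probabilities are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, probability) for a non-empty observation sequence."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical POS-tagging HMM.
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {"N": {"fish": 0.7, "swim": 0.3}, "V": {"fish": 0.2, "swim": 0.8}}

print(viterbi(["fish", "swim"], STATES, START_P, TRANS_P, EMIT_P))
```

Real implementations usually work with log probabilities to avoid underflow on long sequences.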
SECTION C – Long Answers (10 Marks Each)
3(a) Parsing finite-state transducer algorithm
Algorithm steps:
Initialize start state
Read input symbols
Transition between states based on input
Output corresponding symbols
Accept string if final state is reached
Used in morphological analysis and lexical processing.
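The steps above can be sketched for a deterministic transducer. The toy machine below is hypothetical: it analyses only "cat" and "cats", and it assumes "#" marks the end of the word.

```python
# Hypothetical transitions: (state, input symbol) -> (next state, output string).
TRANSITIONS = {
    (0, "c"): (1, "c"),
    (1, "a"): (2, "a"),
    (2, "t"): (3, "t"),
    (3, "#"): (5, "+N+SG"),  # word ends after the stem: singular noun
    (3, "s"): (4, "+N"),     # plural suffix follows the stem
    (4, "#"): (5, "+PL"),
}
START, FINALS = 0, {5}


def transduce(text, transitions=TRANSITIONS, start=START, finals=FINALS):
    """Return the output string, or None if the input is rejected."""
    state, output = start, []           # initialise at the start state
    for symbol in text:                 # read input symbols one by one
        if (state, symbol) not in transitions:
            return None                 # no valid transition: reject
        state, emitted = transitions[(state, symbol)]
        output.append(emitted)          # emit the corresponding output
    return "".join(output) if state in finals else None  # accept in a final state


print(transduce("cats#"))
print(transduce("cat#"))
print(transduce("dog#"))
```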
3(b) Bottom-up vs Top-down parsing
Top-down parsing starts from the start symbol and expands rules downward until the input words are reached (e.g., recursive descent).
Bottom-up parsing starts from the input symbols and builds the parse tree upward towards the start symbol (e.g., shift-reduce parsing).
4(a) Tree structure and CFG
Sentence:
“Rachael Ray finds inspiration in cooking her family and her dog”
CFG defines noun phrases, verb phrases, and prepositional phrases.
Tree structure shows hierarchical syntactic relations among words.
4(b) CKY Algorithm
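CKY is a dynamic programming algorithm for parsing sentences with a CFG in Chomsky Normal Form.
It fills a triangular table bottom-up, where each cell records the non-terminals that can derive the corresponding span of the sentence; the sentence is accepted if the start symbol appears in the cell covering the whole input.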
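A minimal recogniser version of the table filling is sketched below. The grammar (`BINARY`, `LEXICAL`) is a hypothetical toy grammar in Chomsky Normal Form, and the function only reports whether a parse exists rather than building trees.

```python
from itertools import product

# Hypothetical CNF grammar: binary rules A -> B C and lexical rules A -> word.
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
LEXICAL = {"she": {"NP"}, "fish": {"NP", "V"}, "eats": {"V"}}


def cky_recognise(words, start="S"):
    """True if `start` derives the whole word sequence."""
    n = len(words)
    # table[i][j] holds the non-terminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):            # widen the span step by step
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # try every split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]


print(cky_recognise("she eats fish".split()))
print(cky_recognise("eats she fish".split()))
```

Storing back-pointers in each cell instead of bare symbols would turn the recogniser into a parser.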
5(a) Word Sense Disambiguation & Distributional Semantics
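WSD identifies the correct sense of a word based on its context.
Distributional semantics represents the meaning of a word by the words that tend to occur around it.
The path length problem arises because path-based similarity (for example over a WordNet-style hierarchy) treats every link as the same semantic distance, although links near the top of the hierarchy cover much larger differences in meaning than links near the leaves.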
5(b) Information Content-Based Similarity Measures
These measures compute similarity using shared information between concepts.
Issues include dependency on corpus size, domain specificity, and sparse data.
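Two widely used measures of this kind are Resnik and Lin similarity. In the formulas below, P(c) is the corpus-estimated probability of encountering an instance of concept c and LCS(c_1, c_2) is the lowest common subsumer of the two concepts in the hierarchy; the notation is chosen here for illustration.

\[ IC(c) = -\log P(c) \]

\[ \mathrm{sim}_{Resnik}(c_1, c_2) = IC(\mathrm{LCS}(c_1, c_2)) \]

\[ \mathrm{sim}_{Lin}(c_1, c_2) = \frac{2\, IC(\mathrm{LCS}(c_1, c_2))}{IC(c_1) + IC(c_2)} \]

Because P(c) is estimated from corpus counts, the values shift with corpus size and domain, which is the source of the issues listed above.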
6(a) Articulatory Phonetics
Articulatory phonetics studies how speech sounds are produced using vocal organs such as lips, tongue, and vocal cords.
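Sounds are classified as vowels and consonants; consonants are further classified by manner of articulation (plosives, fricatives, nasals, etc.) and by place of articulation.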
6(b) Regular expressions
(i) All alphabetic strings:
[A-Za-z]+
(ii) Lower-case strings ending in b:
[a-z]*b
(iii) a surrounded by b (assuming strings over {a, b} in which each a is immediately preceded and followed by b):
(b+(ab+)*)?
Used in speech and text processing for pattern matching.
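The patterns can be checked with Python's `re` module, using `fullmatch` so that the whole string must belong to the language. For (iii) the sketch assumes the intended language is all strings over {a, b} in which every a is immediately preceded and followed by b, and uses `(b+(ab+)*)?` as one pattern for it; both the reading and the pattern are assumptions.

```python
import re

ALPHABETIC = re.compile(r"[A-Za-z]+")      # (i) all alphabetic strings
ENDS_IN_B = re.compile(r"[a-z]*b")         # (ii) lower-case strings ending in b
A_SURROUNDED = re.compile(r"(b+(ab+)*)?")  # (iii) assumed reading, see above


def accepts(pattern, text):
    """True if the entire string matches the pattern."""
    return pattern.fullmatch(text) is not None


for text in ["bab", "babab", "bbabb", "ab", "baab"]:
    print(text, accepts(A_SURROUNDED, text))
```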
7(a) Feature extraction and pattern comparison
Common features include MFCCs, LPC coefficients, and other spectral features.
Pattern comparison uses Euclidean distance, dynamic time warping (DTW), or likelihood-based methods.
Spectral distortion measures the mismatch between two spectra.
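DTW compares sequences of different lengths by allowing one sequence to stretch or compress in time. A minimal sketch, assuming one-dimensional features and absolute difference as the local distance (real systems would compare feature vectors such as MFCC frames):

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-d feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```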
7(b) Hidden Markov Model with Baum-Welch
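An HMM models a sequence of observable outputs as being generated by a sequence of hidden states.
The Baum-Welch algorithm re-estimates the model parameters (transition and emission probabilities) using expectation-maximization.
Issues include slow convergence, convergence only to a local maximum that depends on the initial parameters, and the large amount of training data required.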
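The re-estimation step can be summarised as follows, using the standard forward and backward variables α and β. The notation is chosen here: a_{ij} are transition probabilities, b_j(o) emission probabilities, O = o_1 ... o_T the observation sequence and λ the current model.

\[ \gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad \xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} \]

\[ \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \hat{b}_j(v_k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \]

The E-step computes γ and ξ with the current parameters, the M-step applies the two updates, and the two steps repeat until the likelihood P(O | λ) stops improving.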