(SEM VII) THEORY EXAMINATION 2022-23 SPEECH PROCESSING

B.Tech Engineering

SECTION A (2 Marks Each)

(a) Speech Signal

A speech signal is a time-varying acoustic signal produced by the human vocal system, used for communication.

(b) Lossless Tube Model of Speech Signal

In the lossless tube model, the vocal tract is modeled as a series of lossless acoustic tubes that shape the speech sound without energy loss.

(c) Speech Spectrogram

A speech spectrogram is a time-frequency representation showing how speech energy varies with time and frequency.

(d) Short-Time Average Zero Crossing Rate

It is the average number of times the speech signal crosses the zero amplitude axis in a short time interval and is used to distinguish voiced and unvoiced sounds.
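A minimal sketch of the zero crossing rate for one frame, assuming a rectangular window and treating zero-valued samples as non-negative. The function name and the two toy frames are illustrative only.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Slowly varying frame (voiced-like): a single sign change.
voiced_like = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]
# Rapidly alternating frame (noise-like): the sign flips at every sample.
noise_like = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

zcr_voiced = zero_crossing_rate(voiced_like)
zcr_noise = zero_crossing_rate(noise_like)
```

The voiced-like frame gives a much lower rate than the noise-like frame, which is the basis of the voiced/unvoiced decision.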

(e) Pitch Detection

Pitch detection is the process of estimating the fundamental frequency of a voiced speech signal.

(f) Correlation Function

Correlation measures similarity between signals. For speech, autocorrelation compares a signal with its delayed version to find periodicity.
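As a sketch of how autocorrelation reveals periodicity, the code below picks the lag with the largest autocorrelation inside a search range and reads it as the pitch period. The sampling rate, test tone, and lag range are assumed values chosen for illustration.

```python
import math


def autocorrelation(x, lag):
    """Sum of products of the signal with a copy of itself delayed by `lag`."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


fs = 8000  # assumed sampling rate in Hz
f0 = 200   # test tone; its period is fs / f0 = 40 samples
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

period = estimate_pitch_period(x, min_lag=20, max_lag=60)
pitch_hz = fs / period
```

Real speech usually needs extra steps such as center clipping or a voicing check, which this sketch omits.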

(g) Filter

A filter is a system that selectively allows or suppresses certain frequency components of a signal.
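A small illustration, assuming a two-point moving-average FIR filter as the example system: it passes a constant (low-frequency) input and suppresses an alternating (high-frequency) input. The helper name is hypothetical.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


low_pass = [0.5, 0.5]  # two-point moving average

constant_out = fir_filter([1, 1, 1, 1], low_pass)        # low frequency passes
alternating_out = fir_filter([1, -1, 1, -1], low_pass)   # high frequency is removed
```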

(h) Principle of Linear Predictive Coding (LPC)

LPC predicts the current speech sample as a linear combination of past samples, modeling the vocal tract efficiently.
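One common way to obtain the predictor coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method and the convention that the prediction is the sum of a[k] times x[n - k]. The test signal is a synthetic decaying exponential, not real speech.

```python
def autocorr(x, max_lag):
    """Autocorrelation values R(0)..R(max_lag) of a finite frame."""
    return [
        sum(x[n] * x[n + k] for n in range(len(x) - k))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a[1..order], prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                 # error shrinks at each order
    return a[1:], err


# x[n] = 0.9 * x[n - 1], so an ideal predictor is a1 = 0.9, a2 = 0.
x = [0.9 ** n for n in range(200)]
coeffs, error = levinson_durbin(autocorr(x, 2), order=2)
```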

(i) Complex Cepstrum of Speech

The complex cepstrum is obtained by taking the inverse Fourier transform of the complex logarithm of the spectrum (log magnitude plus unwrapped phase). It is useful in deconvolution.

(j) Convolution vs Deconvolution of Speech

Convolution combines excitation and vocal tract response, while deconvolution separates excitation from the vocal tract effect.
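A toy sketch of the convolution side: an impulse-train excitation convolved with a short impulse response. The sample values are made up for illustration and do not come from real speech.

```python
def convolve(x, h):
    """Linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


excitation = [1, 0, 0, 0, 1, 0, 0, 0]  # pitch pulses every 4 samples
vocal_tract = [1.0, 0.5, 0.25]         # hypothetical impulse response

speech = convolve(excitation, vocal_tract)
```

Each pitch pulse launches a copy of the vocal tract response. Deconvolution is the reverse task of recovering the two factors from `speech`.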

SECTION B (10 Marks Each – Any Three)

(a) Mechanics of Speech Production and Acoustic Phonetics

Speech production involves air from lungs, vibration of vocal cords, and shaping by the vocal tract. Acoustic phonetics studies speech sounds based on frequency, amplitude, and duration. Voiced sounds result from vocal cord vibration, while unvoiced sounds are produced by airflow turbulence.

(b) Short-Time Energy and Average Magnitude

Short-time energy measures signal strength over short intervals using windowing. Average magnitude computes the mean absolute value of the signal. Both help in speech detection and segmentation.
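A minimal sketch of both measures, assuming a rectangular window, energy as the sum of squares per frame, and average magnitude as the mean absolute value per frame. Frame length and hop are arbitrary here.

```python
def short_time_energy(x, frame_len, hop):
    """Sum of squared samples in each frame."""
    return [
        sum(s * s for s in x[i:i + frame_len])
        for i in range(0, len(x) - frame_len + 1, hop)
    ]


def short_time_avg_magnitude(x, frame_len, hop):
    """Mean absolute value of the samples in each frame."""
    return [
        sum(abs(s) for s in x[i:i + frame_len]) / frame_len
        for i in range(0, len(x) - frame_len + 1, hop)
    ]


# First frame is silent, second frame is active.
x = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
energy = short_time_energy(x, frame_len=4, hop=4)
magnitude = short_time_avg_magnitude(x, frame_len=4, hop=4)
```

Because it does not square the samples, the average magnitude is less sensitive to large amplitudes than the energy.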

(c) Short-Time Fourier Analysis

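The STFT analyzes speech in short windowed segments over which the signal is assumed stationary. Its properties include linearity, a time-frequency resolution trade-off set by the window length (a longer window gives finer frequency resolution but coarser time resolution), and the ability to represent non-stationary signals.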
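The frame-by-frame analysis can be sketched as below, assuming a Hamming window and a direct DFT per frame (slow, but free of dependencies). The frame length, hop size, and test tone are illustrative choices.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def stft_magnitudes(x, frame_len, hop):
    """One list of DFT magnitudes (bins 0..frame_len // 2) per frame."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            bin_sum = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(bin_sum))
        frames.append(mags)
    return frames


# Tone with exactly 4 cycles per 16-sample frame, so energy sits in bin 4.
x = [math.cos(2 * math.pi * 4 * n / 16) for n in range(48)]
spectrogram = stft_magnitudes(x, frame_len=16, hop=8)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
```

Plotting these magnitudes with time on one axis and frequency bin on the other gives a spectrogram.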

(d) Homomorphic System of Convolution

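In homomorphic processing, convolution in the time domain is converted into addition: the Fourier transform turns convolution into multiplication, the logarithm turns multiplication into addition, and the inverse transform gives the cepstrum. The excitation and vocal tract components then add and can be separated more easily.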
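The additive property can be checked numerically. This sketch uses the real cepstrum (log magnitude only, no phase unwrapping) as a simplification, with short made-up sequences whose spectra have no zeros. All names and values are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    n_fft = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def real_cepstrum(x, n_fft):
    """Inverse DFT of the log magnitude of the zero-padded DFT."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    log_mag = [math.log(abs(value)) for value in dft(padded)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


excitation = [1.0, 0.0, 0.0, 0.5]  # hypothetical excitation
vocal_tract = [1.0, 0.6]           # hypothetical vocal tract response
speech = convolve(excitation, vocal_tract)

N_FFT = 16  # long enough to hold the full linear convolution
c_speech = real_cepstrum(speech, N_FFT)
c_sum = [
    a + b
    for a, b in zip(real_cepstrum(excitation, N_FFT), real_cepstrum(vocal_tract, N_FFT))
]
```

The cepstrum of the convolved signal matches the sum of the individual cepstra to within rounding error.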

(e) Frequency Domain Interpretation of Prediction Error

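The mean squared prediction error measures the mismatch between actual and predicted speech samples. In the frequency domain it equals the average over frequency of the ratio of the short-time speech power spectrum to the LPC model spectrum, scaled by the squared gain. Minimizing it makes the LPC spectrum fit the spectral envelope, more closely near spectral peaks (formants) than in valleys, while the fine harmonic structure due to pitch remains in the error signal.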

SECTION C (10 Marks Each)

Q3

(a) Digital Models for Speech Signals

Digital speech models include source-filter model, LPC model, and tube models. These represent speech using excitation and vocal tract characteristics for analysis and synthesis.

(b) Need for Speech Processing

Speech processing is required for speech recognition, voice assistants, speaker identification, hearing aids, and communication systems.

Q4

(a) Pitch Period Estimation Using Parallel Processing

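The parallel processing method (Gold and Rabiner) low-pass filters the speech, derives several impulse trains from its peaks and valleys, and feeds them to simple time-domain pitch period estimators running in parallel. The final pitch period is chosen by comparing the individual estimates and picking the most consistent one, which improves accuracy.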

(b) Speech vs Silence Discrimination

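Factors include short-time energy, zero crossing rate, and spectral features. Silence or background noise has low energy. Voiced speech has high energy and a low zero crossing rate, while unvoiced speech has low energy but a zero crossing rate that is typically higher than that of the background.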
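A simplified sketch of a frame classifier using energy and zero crossing rate. The thresholds are hypothetical fixed values. In practice they are usually derived from the statistics of a known background-noise segment.

```python
def frame_features(frame):
    """Return (mean energy, zero crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


def classify_frame(frame, energy_threshold=0.01, zcr_threshold=0.8):
    """Label a frame; both thresholds are illustrative assumptions."""
    energy, zcr = frame_features(frame)
    if energy >= energy_threshold:
        return "speech"
    if zcr >= zcr_threshold:
        return "possible unvoiced speech"
    return "silence"


loud = [0.5, 0.6, 0.4, -0.5, -0.6, -0.4, 0.5, 0.6]
quiet = [0.01, 0.01, -0.01, 0.0, 0.01, 0.01, 0.0, -0.01]
hissy = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05, 0.05, -0.05]

labels = [classify_frame(f) for f in (loud, quiet, hissy)]
```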

Q5

(a) Filter Bank Summation Method

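In the filter bank summation method, the short-time Fourier transform is viewed as the outputs of a bank of band-pass filters. Speech is reconstructed by shifting each channel output back to its center frequency and summing all channels. If the number of channels is at least the window length, the sum equals the original signal up to a constant gain.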
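In the short-time Fourier view, filter bank summation can be checked numerically: sum the channel outputs at each time instant, each modulated back to its center frequency. The sketch assumes a causal window no longer than the number of channels, in which case the sum should equal the input scaled by (number of channels) times w[0]. The window and signal values are made up.

```python
import cmath
import math


def stft_at(x, w, n, omega):
    """X_n(e^{j omega}) = sum_m x[m] * w[n - m] * e^{-j omega m} for a causal window w."""
    total = 0j
    for m in range(max(0, n - len(w) + 1), n + 1):
        total += x[m] * w[n - m] * cmath.exp(-1j * omega * m)
    return total


def fbs_synthesize(x, w, num_channels):
    """Sum all channel outputs, each shifted back to its center frequency."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(num_channels):
            omega = 2 * math.pi * k / num_channels
            acc += stft_at(x, w, n, omega) * cmath.exp(1j * omega * n)
        y.append(acc.real)
    return y


x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
w = [0.5, 0.3, 0.2, 0.1]              # hypothetical window, length 4
y = fbs_synthesize(x, w, num_channels=4)
gain = 4 * w[0]                       # expected constant gain
```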

(b) Vocoder and Channel Vocoder

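A vocoder (voice coder) analyzes speech into a small set of parameters, transmits them efficiently, and resynthesizes speech at the receiver. A channel vocoder divides speech into frequency bands with a bank of band-pass filters and encodes each band's envelope together with excitation information (voiced/unvoiced decision and pitch).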