(SEM VIII) THEORY EXAMINATION 2018-19 DATA COMPRESSION
SECTION A (2 × 10 = 20 Marks)
Section A contains short-answer questions, but you should not write one-line answers. Even for 2 marks, you must write definition + explanation + application (where required).
Let us explain each concept clearly.
(a) Huffman Coding – Lossless or Lossy?
Huffman coding is a lossless compression technique. In lossless compression, the original data can be perfectly reconstructed from the compressed data without any loss of information.
Huffman coding assigns variable-length codes to symbols based on their probabilities: symbols with higher probability get shorter codes, while less frequent symbols get longer codes. This minimizes the average code length and improves compression efficiency. For example, with probabilities {A: 0.5, B: 0.25, C: 0.25}, a Huffman code assigns A → 0, B → 10, C → 11, for an average length of 1.5 bits/symbol.
Applications of Huffman coding include:
ZIP file compression
JPEG image compression (entropy coding stage)
MP3 compression (entropy coding stage)
Text compression utilities
Since it preserves exact data, it is suitable for text, executable files, and medical data.
(b) Composite Source Model
A composite source model represents a source that consists of multiple sub-sources. Instead of assuming a single probability distribution, the source is modeled as a mixture of several distributions.
For example, in image compression, different regions of an image may have different statistical properties. A composite model handles such variations more efficiently by adapting to local characteristics.
It improves compression efficiency when source statistics change over time.
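A minimal sketch of the idea, assuming a source that switches between two hypothetical sub-sources (a skewed "text-like" distribution and a nearly uniform "noise-like" one); the names, alphabet, and probabilities are illustrative only:

```python
import random

# Two hypothetical sub-sources with different symbol statistics.
SUB_SOURCES = {
    "text_like":  {"a": 0.6, "b": 0.3, "c": 0.1},    # highly skewed
    "noise_like": {"a": 0.34, "b": 0.33, "c": 0.33}, # nearly uniform
}

def composite_source(n, switch_prob=0.1):
    """Emit n symbols, occasionally switching between sub-sources."""
    state = "text_like"
    out = []
    for _ in range(n):
        if random.random() < switch_prob:   # the source changes behaviour over time
            state = "noise_like" if state == "text_like" else "text_like"
        dist = SUB_SOURCES[state]
        out.append(random.choices(list(dist), weights=dist.values())[0])
    return out

print("".join(composite_source(40)))
```

A coder that adapts to the currently active sub-source can compress such data better than one that assumes a single fixed distribution.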
(c) Prefix Codes
A prefix code is a type of code in which no codeword is a prefix of another codeword. This ensures that the code is uniquely decodable.
For example, the set {0, 10, 110, 111} is a prefix code because none of the codewords begin with another complete codeword.
Huffman coding produces prefix codes. The prefix property ensures instantaneous decoding without ambiguity.
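A minimal sketch of instantaneous decoding with the prefix code {0, 10, 110, 111} above; the symbol names are illustrative:

```python
# Prefix code from the example above; symbol names are illustrative.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {v: k for k, v in CODE.items()}

def decode(bits):
    """Decode a bit string symbol by symbol; no look-ahead is needed
    because no codeword is a prefix of another."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:        # a complete codeword has been read
            out.append(DECODE[current])
            current = ""
    return out

encoded = "".join(CODE[s] for s in "abdca")
print(encoded, "->", decode(encoded))   # 0101111100 -> ['a', 'b', 'd', 'c', 'a']
```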
(d) JBIG Standard
JBIG stands for Joint Bi-level Image Experts Group. It is a standard for compressing bi-level images (images with only black and white pixels).
It uses arithmetic coding and context modeling to achieve high compression efficiency. JBIG is widely used in fax transmission and document imaging systems.
(e) Entropy
Entropy is a measure of average information content or uncertainty of a source. It is defined by Shannon as:
H = − ∑ p(x) log₂ p(x)
Entropy represents the theoretical lower bound on the average number of bits required to encode symbols.
Higher entropy means more randomness and less compressibility.
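A minimal sketch of the formula in code; the example distributions are illustrative:

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum p log2 p, in bits/symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: uniform, maximally random
print(entropy([0.9, 0.05, 0.05]))         # ~0.57 bits: skewed, highly compressible
```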
(f) Compression Ratio
Compression ratio is defined as:
Compression Ratio = Original Size / Compressed Size
It indicates how effectively data has been compressed; a higher ratio means better compression. For example, a 100 KB file compressed to 25 KB has a compression ratio of 4 (often written 4:1).
(g) Uniquely Decodable Code
Given code: {0, 10, 110, 111}
This code is uniquely decodable because no codeword is a prefix of another: it satisfies the prefix condition, and every prefix code is uniquely decodable.
(h) Compression Technique in Unix “compress”
The Unix "compress" command uses LZW (Lempel-Ziv-Welch) algorithm, which is a dictionary-based lossless compression technique.
(i) Uniform Quantizer
A uniform quantizer divides the range of signal values into equal-sized intervals. Each interval is represented by a fixed quantization level.
It is simple but not optimal when signal distribution is non-uniform.
(j) Entropy Coded Quantization
Entropy coded quantization combines quantization with entropy coding. After quantization, output symbols are encoded using Huffman or arithmetic coding.
It improves overall compression efficiency.
SECTION B (10 × 3 = 30 Marks)
This section requires detailed conceptual explanation.
Vector Quantization vs Scalar Quantization
Scalar quantization quantizes one sample at a time. Vector quantization quantizes blocks of samples together.
Vector quantization exploits correlation between samples. It provides better compression performance but requires larger codebooks and higher computational complexity.
Example:
Instead of quantizing pixels individually, a 2×2 block can be quantized as a vector.
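A minimal sketch of that 2×2 example, assuming a tiny hand-picked codebook of flattened 2×2 blocks; in practice the codebook would be designed with an algorithm such as LBG (see Section C):

```python
import numpy as np

# Hypothetical codebook of 2x2 blocks, flattened to 4-dimensional vectors.
CODEBOOK = np.array([
    [0, 0, 0, 0],          # flat dark block
    [255, 255, 255, 255],  # flat bright block
    [0, 255, 0, 255],      # vertical edge
    [0, 0, 255, 255],      # horizontal edge
], dtype=float)

def vq_encode(block):
    """Map a 2x2 block to the index of the nearest codebook vector."""
    v = block.reshape(-1).astype(float)
    distances = np.sum((CODEBOOK - v) ** 2, axis=1)   # squared Euclidean distance
    return int(np.argmin(distances))

block = np.array([[10, 240], [5, 250]])
idx = vq_encode(block)
print(idx, CODEBOOK[idx].reshape(2, 2))   # one small index replaces four samples
```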
What is Data Compression?
Data compression reduces the number of bits required to represent data. It removes redundancy and irrelevance.
Why needed?
Reduce storage cost
Reduce transmission bandwidth
Improve transmission speed
Compression system has two blocks:
Encoder → removes redundancy
Decoder → reconstructs original or approximate data
Golomb Codes & Tunstall Codes
Golomb codes are used for encoding geometrically distributed data, and they are efficient when probabilities decrease exponentially. A Golomb code with parameter m represents an integer n as the quotient ⌊n/m⌋ in unary followed by the remainder n mod m in (truncated) binary, as sketched below.
Tunstall coding is a variable-to-fixed length coding technique. It produces fixed-length output codes and variable-length input sequences.
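A minimal sketch of Golomb encoding for a non-negative integer n, restricted to the case where the parameter m is a power of two (the Rice-code special case) so the remainder fits in a fixed number of bits:

```python
def golomb_rice_encode(n, m):
    """Golomb code with m a power of two (Rice code):
    quotient q = n // m in unary, remainder r = n % m in log2(m) bits."""
    assert m > 0 and (m & (m - 1)) == 0, "this sketch assumes m is a power of two"
    q, r = divmod(n, m)
    k = m.bit_length() - 1                 # number of bits for the remainder
    return "1" * q + "0" + format(r, f"0{k}b")

for n in range(6):
    print(n, golomb_rice_encode(n, m=4))
# Small values get short codewords, matching a geometric distribution.
```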
Quantization Problem
Quantization converts continuous amplitude values into discrete levels.
Problem:
It introduces distortion.
Need to minimize Mean Square Error (MSE).
Example:
If the input range 0–10 is divided into 5 levels, each level represents an interval of width 2, and any value inside that interval is approximated by that level's reconstruction value.
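A minimal sketch of that 0–10, 5-level example, using the midpoint of each interval as the reconstruction value:

```python
def uniform_quantize(x, x_min=0.0, x_max=10.0, levels=5):
    """Map x to the midpoint of one of `levels` equal-width intervals."""
    step = (x_max - x_min) / levels                   # interval width = 2.0 here
    index = min(int((x - x_min) / step), levels - 1)  # clamp the top edge
    return x_min + (index + 0.5) * step               # midpoint reconstruction

for x in [0.3, 4.9, 7.2, 10.0]:
    xq = uniform_quantize(x)
    print(f"{x} -> {xq}  (error {x - xq:+.2f})")
```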
Dictionary Based Coding Techniques
Dictionary-based coding stores repeated sequences in a dictionary.
Main types:
LZ77 (sliding window)
LZ78 (explicit dictionary)
LZW (improved LZ78)
They are widely used in ZIP, GIF, and PNG formats.
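A minimal sketch of LZW compression (the technique behind Unix compress and GIF), emitting a list of dictionary indices; the initial dictionary holds the 256 single-byte strings:

```python
def lzw_compress(data: bytes):
    """Return a list of dictionary indices for the input bytes."""
    dictionary = {bytes([i]): i for i in range(256)}  # single-byte entries
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:                   # keep extending the phrase
            current = candidate
        else:
            output.append(dictionary[current])        # emit the longest known phrase
            dictionary[candidate] = len(dictionary)   # learn the new phrase
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output

data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(f"{len(data)} bytes -> {len(codes)} codes")     # repeated phrases shrink the output
```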
SECTION C (10 Marks Each – Analytical)
Lossless vs Lossy Compression
Lossless compression reconstructs exact data. Used for text and documents.
Lossy compression allows some loss of information. Used for images, audio, and video.
Lossless ensures zero distortion. Lossy provides higher compression ratios.
Entropy Calculation
Given:
P(a1) = 1/2
P(a2) = 1/4
P(a3) = 1/8
P(a4) = 1/8
Entropy:
H = − [ (1/2)log₂(1/2) + (1/4)log₂(1/4) + 2(1/8)log₂(1/8) ]
H = − [ (1/2)(−1) + (1/4)(−2) + 2(1/8)(−3) ]
H = 1/2 + 1/2 + 3/4 = 1.75 bits/symbol
Huffman Coding Problem
Steps:
Arrange probabilities in ascending order.
Merge lowest probabilities.
Construct tree.
Assign 0 and 1 to branches.
Then calculate average bits per symbol:
L = ∑ p(x) × code length
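The original probability table is not reproduced here, so a minimal sketch using an assumed example distribution; a heap drives the repeated merging of the two lowest probabilities:

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code; returns {symbol: codeword}."""
    tie = count()                                   # breaks ties between equal probabilities
    heap = [(p, next(tie), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)             # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}   # prepend a bit on each merge
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}   # assumed example
code = huffman_code(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(code, "average length =", avg, "bits/symbol")   # 1.75, equal to the entropy here
```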
LZ77 vs LZ78
LZ77 uses a sliding window and encodes matches as (offset, length) pointers into recently seen data; for example, a second occurrence of the word "compression" can be replaced by a pointer back to its first occurrence.
LZ78 builds an explicit dictionary of phrases and refers to entries by index, so the dictionary grows dynamically as new phrases appear.
PPM (Prediction by Partial Matching)
PPM predicts the next symbol from its preceding context (the most recently seen symbols) and estimates probabilities adaptively. If the symbol has not appeared in the current context, an escape symbol is coded and a shorter context is tried. The resulting probabilities are then encoded with arithmetic coding.
Distortion Criteria
Common distortion measures:
Mean Square Error (MSE)
Peak Signal to Noise Ratio (PSNR)
Absolute error
Used to evaluate lossy compression performance.
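A minimal sketch of MSE and PSNR for 8-bit data; the sample arrays are illustrative:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between original and reconstructed signals."""
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit data)."""
    return 10 * np.log10(peak ** 2 / mse(original, reconstructed))

x  = np.array([10, 200, 50, 120], dtype=np.uint8)    # illustrative original samples
xq = np.array([12, 198, 48, 124], dtype=np.uint8)    # lossy reconstruction
print(mse(x, xq), psnr(x, xq))                       # lower MSE -> higher PSNR
```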
Scalar vs Vector Quantization
Scalar quantization quantizes a single sample at a time.
Vector quantization quantizes a group of samples jointly, exploiting correlation between them.
Vector quantization gives better performance but at higher computational and storage cost.
LBG Algorithm (Linde-Buzo-Gray)
LBG is used to design the codebook in vector quantization; it iteratively converges to a locally optimal codebook.
Steps:
Initialize codebook.
Partition training vectors.
Update centroids.
Repeat until convergence.
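A minimal sketch of these steps (essentially k-means on the training vectors), assuming random training vectors as the initial codebook and a fixed iteration count in place of a distortion-based convergence test:

```python
import numpy as np

def lbg(training_vectors, codebook_size, iterations=20, seed=0):
    """Design a VQ codebook by alternating partition and centroid updates."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the codebook from randomly chosen training vectors.
    start = rng.choice(len(training_vectors), codebook_size, replace=False)
    codebook = training_vectors[start].astype(float)
    for _ in range(iterations):                     # full LBG stops when distortion settles
        # Step 2: partition - assign each training vector to its nearest codeword.
        distances = np.linalg.norm(
            training_vectors[:, None, :] - codebook[None, :, :], axis=2)
        nearest = np.argmin(distances, axis=1)
        # Step 3: update - move each codeword to the centroid of its cell.
        for k in range(codebook_size):
            members = training_vectors[nearest == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

# Illustrative training data: 2x2 image blocks flattened to 4-D vectors.
training = np.random.default_rng(1).integers(0, 256, size=(500, 4)).astype(float)
print(lbg(training, codebook_size=8).round(1))
```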