(SEM VIII) THEORY EXAMINATION 2018-19 DATA COMPRESSION


SECTION A (2 × 10 = 20 Marks)

 

Section A contains short-answer questions, but you should not write one-line answers. Even for 2 marks, you must write definition + explanation + application (where required).

Let us explain each concept clearly.

 

(a) Huffman Coding – Lossless or Lossy?

 

Huffman coding is a lossless compression technique. In lossless compression, the original data can be perfectly reconstructed from the compressed data without any loss of information.

 

Huffman coding assigns variable-length codes to symbols based on their probabilities. Symbols with higher probability get shorter codes, while less frequent symbols get longer codes. This minimizes the average code length and improves compression efficiency.

 

Applications of Huffman coding include:

 

ZIP file compression

JPEG image compression (its lossless entropy coding stage)

MP3 compression (entropy coding stage)

Text compression utilities

Since it preserves exact data, it is suitable for text, executable files, and medical data.
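
To make the idea concrete, here is a minimal Python sketch of Huffman code construction (illustrative only; the symbol probabilities are assumed, not taken from the question paper):

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # assumed example source
codes = huffman_codes(probs)
print(codes)                                         # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(sum(probs[s] * len(codes[s]) for s in probs))  # average length 1.75 bits/symbol
```

Note how the most probable symbol receives the shortest codeword, which is exactly the property described above.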

 

(b) Composite Source Model

 

A composite source model represents a source that consists of multiple sub-sources. Instead of assuming a single probability distribution, the source is modeled as a mixture of several distributions.

 

For example, in image compression, different regions of an image may have different statistical properties. A composite model handles such variations more efficiently by adapting to local characteristics.

 

It improves compression efficiency when source statistics change over time.
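
A small illustrative sketch of what a composite source looks like when sampled (the two sub-source distributions and the 10% switching probability are assumptions chosen only for illustration):

```python
import random

# Hypothetical composite source: two sub-sources with different statistics.
SUB_SOURCES = {
    "text_like":  {"a": 0.6, "b": 0.3, "c": 0.1},    # skewed distribution
    "noise_like": {"a": 0.34, "b": 0.33, "c": 0.33}, # nearly uniform distribution
}
SWITCH_PROB = 0.1   # assumed probability of switching sub-source at each step

def emit(n):
    """Generate n symbols from the composite source."""
    state = "text_like"
    out = []
    for _ in range(n):
        if random.random() < SWITCH_PROB:            # occasionally change sub-source
            state = "noise_like" if state == "text_like" else "text_like"
        dist = SUB_SOURCES[state]
        out.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return "".join(out)

print(emit(40))
```

A compressor that adapts its model to the currently active sub-source will outperform one that assumes a single fixed distribution.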

 

(c) Prefix Codes

 

A prefix code is a type of code in which no codeword is a prefix of another codeword. This ensures that the code is uniquely decodable.

 

For example, the set {0, 10, 110, 111} is a prefix code because none of the codewords begin with another complete codeword.

 

Huffman coding produces prefix codes. The prefix property allows instantaneous decoding: each codeword is recognized as soon as its last bit arrives, with no look-ahead and no ambiguity.
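
A brief Python sketch (illustrative, using the example code set above) that checks the prefix property and decodes a bit string instantaneously:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def decode(bits, code):
    """Instantaneous decoding: emit a symbol as soon as a complete codeword is seen."""
    inverse = {cw: sym for sym, cw in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(is_prefix_code(code.values()))    # True
print(decode("010110111", code))        # ['a', 'b', 'c', 'd']
```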

 

(d) JBIG Standard

 

JBIG stands for Joint Bi-level Image Experts Group. It is a standard for compressing bi-level images (images with only black and white pixels).

 

It uses arithmetic coding and context modeling to achieve high compression efficiency. JBIG is widely used in fax transmission and document imaging systems.

 

(e) Entropy

 

Entropy is a measure of average information content or uncertainty of a source. It is defined by Shannon as:

H = − ∑ p(x) log₂ p(x)

 

Entropy represents the theoretical lower bound on the average number of bits required to encode symbols.

Higher entropy means more randomness and less compressibility.
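
A small Python sketch of the formula above (the example probabilities are the ones used later in Section C):

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum p(x) * log2 p(x), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75
```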

 

(f) Compression Ratio

 

Compression ratio is defined as:

Compression Ratio = Original Size / Compressed Size

It indicates how effectively the data has been compressed; a higher ratio means better compression.
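
For example (illustrative figures), if a 100 KB file compresses to 25 KB, the compression ratio is 100 / 25 = 4, often written as 4:1.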

 

(g) Uniquely Decodable Code

 

Given code: {0, 10, 110, 111}

 

No codeword is a prefix of another, so the code satisfies the prefix condition. Every prefix code is uniquely decodable (and, in fact, instantaneous), so this code is uniquely decodable.

 

(h) Compression Technique in Unix “compress”

 

The Unix "compress" command uses LZW (Lempel-Ziv-Welch) algorithm, which is a dictionary-based lossless compression technique.

 

(i) Uniform Quantizer

 

A uniform quantizer divides the range of signal values into equal-sized intervals. Each interval is represented by a fixed quantization level.

 

It is simple to implement but not optimal when the signal distribution is non-uniform; a non-uniform quantizer then gives lower distortion for the same number of levels.
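
A minimal midpoint uniform quantizer sketch in Python (the range and number of levels are assumed for illustration):

```python
def uniform_quantize(x, x_min, x_max, levels):
    """Midpoint uniform quantizer over [x_min, x_max] with equal-width intervals."""
    step = (x_max - x_min) / levels
    index = min(int((x - x_min) / step), levels - 1)   # clamp the top edge
    return x_min + (index + 0.5) * step                # midpoint reconstruction value

# Example: range -4 to 4 with 8 levels (step size 1)
for x in [-3.9, 0.2, 2.7]:
    print(x, "->", uniform_quantize(x, -4.0, 4.0, 8))
# -3.9 -> -3.5, 0.2 -> 0.5, 2.7 -> 2.5
```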

 

(j) Entropy Coded Quantization

 

Entropy coded quantization combines quantization with entropy coding. After quantization, output symbols are encoded using Huffman or arithmetic coding.

 

It improves overall compression efficiency.

 

SECTION B (10 × 3 = 30 Marks)

 

This section requires detailed conceptual explanation.

Vector Quantization vs Scalar Quantization

 

Scalar quantization quantizes one sample at a time. Vector quantization quantizes blocks of samples together.

 

Vector quantization exploits correlation between samples. It provides better compression performance but requires larger codebooks and higher computational complexity.

 

Example:

 

Instead of quantizing pixels individually, a 2×2 block can be quantized as a vector.
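
A short sketch contrasting the two approaches on a single 2×2 block (the pixel values and the tiny codebook are hypothetical):

```python
# One 2x2 pixel block, flattened row-wise (hypothetical values).
block = [52, 55, 61, 59]

# Scalar quantization: each sample is rounded to the nearest multiple of 8.
scalar = [8 * round(v / 8) for v in block]             # [48, 56, 64, 56] -- four values sent

# Vector quantization: the whole block is replaced by the nearest codebook vector.
codebook = [                                           # hypothetical 2x2 codebook
    [50, 50, 50, 50],
    [55, 55, 60, 60],
    [200, 200, 200, 200],
]
def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

vector = min(codebook, key=lambda c: dist2(c, block))  # [55, 55, 60, 60] -- one index sent

print(scalar, vector)
```

Because the four pixels are correlated, a single codebook index describes the block almost as well as four separately quantized values, which is where the compression gain of vector quantization comes from.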

 

What is Data Compression?
 

Data compression reduces the number of bits required to represent data. It removes redundancy (statistical repetition) and irrelevance (detail the receiver does not perceive or need).

 

Why needed?

 

Reduce storage cost

Reduce transmission bandwidth

Improve transmission speed

A compression system has two blocks:

Encoder → removes redundancy
Decoder → reconstructs original or approximate data

 

Golomb Codes & Tunstall Codes

Golomb codes are used for encoding geometrically distributed data. They are efficient when probabilities decrease exponentially.

 

Tunstall coding is a variable-to-fixed length coding technique: it maps variable-length input sequences to fixed-length output codewords.
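
A brief Golomb encoder sketch in Python (the parameter m = 3 is chosen only for illustration): the quotient is sent in unary and the remainder in truncated binary.

```python
from math import ceil, log2

def golomb_encode(n, m):
    """Golomb code of a non-negative integer n with parameter m."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"                 # quotient q as q ones and a terminating zero
    if m == 1:
        return unary                      # m = 1 degenerates to a pure unary code
    b = ceil(log2(m))
    cutoff = (1 << b) - m                 # first 'cutoff' remainders use b - 1 bits
    if r < cutoff:
        return unary + format(r, "b").zfill(b - 1)
    return unary + format(r + cutoff, "b").zfill(b)

for n in range(6):
    print(n, golomb_encode(n, 3))
# 0 00, 1 010, 2 011, 3 100, 4 1010, 5 1011
```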

 

Quantization Problem

Quantization converts continuous amplitude values into discrete levels.

 

Problem:

Quantization introduces distortion, because many different input values map to the same output level.

The design goal is to minimize a distortion measure such as the Mean Square Error (MSE).

 

Example:

If input range is 0–10 and divided into 5 levels, each level represents a range. Any value inside that range is approximated.
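
As a quick worked illustration (the specific input value 3.7 is chosen arbitrarily): the step size is Δ = (10 − 0) / 5 = 2, so the intervals are [0, 2), [2, 4), [4, 6), [6, 8), [8, 10]. A value such as 3.7 falls in [2, 4) and is reconstructed as the midpoint 3, giving a quantization error of 0.7.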

 

Dictionary Based Coding Techniques

Dictionary-based coding stores repeated sequences in a dictionary.

 

Main types:

LZ77 (sliding window)

LZ78 (explicit dictionary)

LZW (improved LZ78)

They are widely used in ZIP, GIF, and PNG formats.
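
A compact LZW encoder sketch in Python (illustrative; real implementations such as Unix compress add details like limited code sizes and dictionary resets):

```python
def lzw_encode(data):
    """Basic LZW encoding: grow a dictionary of seen strings, emit dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}    # start with all single characters
    next_code = 256
    w, output = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                                  # keep extending the current match
        else:
            output.append(dictionary[w])            # emit code for the longest match found
            dictionary[wc] = next_code              # add the new string to the dictionary
            next_code += 1
            w = ch
    if w:
        output.append(dictionary[w])
    return output

print(lzw_encode("ABABABA"))   # [65, 66, 256, 258]
```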

 

SECTION C (10 Marks Each – Analytical)

Lossless vs Lossy Compression

Lossless compression reconstructs exact data. Used for text and documents.

Lossy compression allows some loss of information. Used for images, audio, and video.

Lossless ensures zero distortion. Lossy provides higher compression ratios.

Entropy Calculation

 

Given:

P(a1) = 1/2
P(a2) = 1/4
P(a3) = 1/8
P(a4) = 1/8

Entropy:

H = − [ (1/2)log₂(1/2) + (1/4)log₂(1/4) + 2(1/8)log₂(1/8) ]

H = − [ (1/2)(−1) + (1/4)(−2) + 2(1/8)(−3) ]

H = 1/2 + 1/2 + 3/4 = 1.75 bits/symbol

Huffman Coding Problem

 

Steps:

Arrange the probabilities in ascending order.

Merge the two lowest probabilities into a new node.

Repeat the merging until a single tree remains.

Assign 0 and 1 to the two branches at every node.

Then calculate average bits per symbol:

L = ∑ p(x) × code length
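
For example, if we assume the same source as in the entropy question above (P = 1/2, 1/4, 1/8, 1/8), Huffman coding assigns codeword lengths 1, 2, 3 and 3, so

L = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits/symbol

which equals the entropy of that source, so the code is optimal for it.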

LZ77 vs LZ78

LZ77 uses a sliding window over recently seen data and encodes each match as an (offset, length) pair plus the next symbol.

LZ78 builds an explicit dictionary of phrases and encodes each new phrase as a (dictionary index, next symbol) pair.

In short, LZ77 searches a buffer of past data, while LZ78 grows its dictionary dynamically as new phrases appear.

PPM (Prediction by Partial Matching)

PPM predicts the next symbol from its preceding context (the last few symbols) using adaptive probability estimation, and the predicted probabilities drive an arithmetic coder. If a symbol has not yet been seen in the current context, an escape symbol is coded and a shorter context is tried.

Distortion Criteria

 

Common distortion measures:

Mean Square Error (MSE)

Peak Signal to Noise Ratio (PSNR)

Absolute error

Used to evaluate lossy compression performance.
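
A small sketch of the two most common measures (the sample values are illustrative 8-bit samples and an assumed lossy reconstruction):

```python
from math import log10

def mse(original, reconstructed):
    """Mean square error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit samples)."""
    return 10 * log10(peak ** 2 / mse(original, reconstructed))

orig  = [52, 55, 61, 59]            # illustrative 8-bit samples
recon = [48, 56, 64, 56]            # illustrative reconstruction after lossy coding
print(mse(orig, recon))             # 8.75
print(round(psnr(orig, recon), 2))  # about 38.71 dB
```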

Scalar vs Vector Quantization

Scalar quantization quantizes one sample at a time.

Vector quantization quantizes a group (block) of samples jointly.

Vector quantization gives better performance because it exploits the correlation between samples, but it needs a codebook and has higher computational complexity.

LBG Algorithm (Linde-Buzo-Gray)

LBG is used to design the codebook in vector quantization; it iteratively refines the codebook until it is (locally) optimal for the training data.

 

Steps:

Initialize codebook.

Partition training vectors.

Update centroids.

Repeat until convergence.
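
A minimal LBG iteration sketch in Python (a sketch under simplifying assumptions: random initialisation instead of the usual codebook-splitting step, and a fixed number of iterations instead of a convergence test):

```python
import random

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def lbg(training, codebook_size, iterations=20):
    """Partition the training vectors around the codewords, then move each
    codeword to the centroid of its partition; repeat."""
    codebook = random.sample(training, codebook_size)    # step 1: initialise
    for _ in range(iterations):
        regions = [[] for _ in codebook]                 # step 2: partition
        for vec in training:
            idx = min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))
            regions[idx].append(vec)
        for i, region in enumerate(regions):             # step 3: update centroids
            if region:
                codebook[i] = [sum(c) / len(region) for c in zip(*region)]
    return codebook

training = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
print(lbg(training, 4))   # four 2-D codewords
```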
