(SEM VIII) THEORY EXAMINATION 2018-19 DATA COMPRESSION


SECTION A (2 × 10 = 20 Marks)

 

Section A contains short-answer questions, but you should not write one-line answers. Even for 2 marks, you must write definition + explanation + application (where required).

Let us explain each concept clearly.

 

(a) Huffman Coding – Lossless or Lossy?

 

Huffman coding is a lossless compression technique. In lossless compression, the original data can be perfectly reconstructed from the compressed data without any loss of information.

 

Huffman coding assigns variable-length codes to symbols based on their probabilities. Symbols with higher probability get shorter codes, while less frequent symbols get longer codes. This minimizes the average code length and improves compression efficiency.

 

Applications of Huffman coding include:

 

ZIP file compression

JPEG image compression (its lossless entropy coding stage)

MP3 compression (entropy coding stage)

Text compression utilities

Since it preserves exact data, it is suitable for text, executable files, and medical data.
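
To make the idea concrete, here is a minimal Python sketch of Huffman code construction (illustrative only; the symbol probabilities are assumed, not taken from the question paper):

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # assumed example source
codes = huffman_codes(probs)
print(codes)                                         # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(sum(probs[s] * len(codes[s]) for s in probs))  # average length 1.75 bits/symbol
```

Note how the most probable symbol receives the shortest codeword, which is exactly the property described above.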

 

(b) Composite Source Model

 

A composite source model represents a source that consists of multiple sub-sources. Instead of assuming a single probability distribution, the source is modeled as a mixture of several distributions.

 

For example, in image compression, different regions of an image may have different statistical properties. A composite model handles such variations more efficiently by adapting to local characteristics.

 

It improves compression efficiency when source statistics change over time.
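
A small illustrative sketch of what a composite source looks like when sampled (the two sub-source distributions and the 10% switching probability are assumptions chosen only for illustration):

```python
import random

# Hypothetical composite source: two sub-sources with different statistics.
SUB_SOURCES = {
    "text_like":  {"a": 0.6, "b": 0.3, "c": 0.1},    # skewed distribution
    "noise_like": {"a": 0.34, "b": 0.33, "c": 0.33}, # nearly uniform distribution
}
SWITCH_PROB = 0.1   # assumed probability of switching sub-source at each step

def emit(n):
    """Generate n symbols from the composite source."""
    state = "text_like"
    out = []
    for _ in range(n):
        if random.random() < SWITCH_PROB:            # occasionally change sub-source
            state = "noise_like" if state == "text_like" else "text_like"
        dist = SUB_SOURCES[state]
        out.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return "".join(out)

print(emit(40))
```

A compressor that adapts its model to the currently active sub-source will outperform one that assumes a single fixed distribution.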

 

(c) Prefix Codes

 

A prefix code is a type of code in which no codeword is a prefix of another codeword. This ensures that the code is uniquely decodable.

 

For example, the set {0, 10, 110, 111} is a prefix code because none of the codewords begin with another complete codeword.

 

Huffman coding produces prefix codes. The prefix property allows instantaneous decoding: each codeword is recognized as soon as its last bit arrives, with no look-ahead and no ambiguity.
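
A brief Python sketch (illustrative, using the example code set above) that checks the prefix property and decodes a bit string instantaneously:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def decode(bits, code):
    """Instantaneous decoding: emit a symbol as soon as a complete codeword is seen."""
    inverse = {cw: sym for sym, cw in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(is_prefix_code(code.values()))    # True
print(decode("010110111", code))        # ['a', 'b', 'c', 'd']
```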

 

(d) JBIG Standard

 

JBIG stands for Joint Bi-level Image Experts Group. It is a standard for compressing bi-level images (images with only black and white pixels).

 

It uses arithmetic coding and context modeling to achieve high compression efficiency. JBIG is widely used in fax transmission and document imaging systems.

 

(e) Entropy

 

Entropy is a measure of average information content or uncertainty of a source. It is defined by Shannon as:

H = − ∑ p(x) log₂ p(x)

 

Entropy represents the theoretical lower bound on the average number of bits required to encode symbols.

Higher entropy means more randomness and less compressibility.
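
A small Python sketch of the formula above (the example probabilities are the ones used later in Section C):

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum p(x) * log2 p(x), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75
```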

 

(f) Compression Ratio

 

Compression ratio is defined as:

Compression Ratio = Original Size / Compressed Size

It indicates how effectively the data has been compressed; a higher ratio means better compression.
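
For example (illustrative figures), if a 100 KB file compresses to 25 KB, the compression ratio is 100 / 25 = 4, often written as 4:1.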

 

(g) Uniquely Decodable Code

 

Given code: {0, 10, 110, 111}

 

No codeword is a prefix of another, so the code satisfies the prefix condition. Every prefix code is uniquely decodable (and, in fact, instantaneous), so this code is uniquely decodable.

 

(h) Compression Technique in Unix “compress”

 

The Unix "compress" command uses LZW (Lempel-Ziv-Welch) algorithm, which is a dictionary-based lossless compression technique.

 

(i) Uniform Quantizer

 

A uniform quantizer divides the range of signal values into equal-sized intervals. Each interval is represented by a fixed quantization level.

 

It is simple to implement but not optimal when the signal distribution is non-uniform; a non-uniform quantizer then gives lower distortion for the same number of levels.
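
A minimal midpoint uniform quantizer sketch in Python (the range and number of levels are assumed for illustration):

```python
def uniform_quantize(x, x_min, x_max, levels):
    """Midpoint uniform quantizer over [x_min, x_max] with equal-width intervals."""
    step = (x_max - x_min) / levels
    index = min(int((x - x_min) / step), levels - 1)   # clamp the top edge
    return x_min + (index + 0.5) * step                # midpoint reconstruction value

# Example: range -4 to 4 with 8 levels (step size 1)
for x in [-3.9, 0.2, 2.7]:
    print(x, "->", uniform_quantize(x, -4.0, 4.0, 8))
# -3.9 -> -3.5, 0.2 -> 0.5, 2.7 -> 2.5
```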

 

(j) Entropy Coded Quantization

 

Entropy coded quantization combines quantization with entropy coding. After quantization, output symbols are encoded using Huffman or arithmetic coding.

 

It improves overall compression efficiency.

 

SECTION B (10 × 3 = 30 Marks)

 

This section requires detailed conceptual explanation.

Vector Quantization vs Scalar Quantization

 

Scalar quantization quantizes one sample at a time. Vector quantization quantizes blocks of samples together.

 

Vector quantization exploits correlation between samples. It provides better compression performance but requires larger codebooks and higher computational complexity.

 

Example:

 

Instead of quantizing pixels individually, a 2×2 block can be quantized as a vector.
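
A short sketch contrasting the two approaches on a single 2×2 block (the pixel values and the tiny codebook are hypothetical):

```python
# One 2x2 pixel block, flattened row-wise (hypothetical values).
block = [52, 55, 61, 59]

# Scalar quantization: each sample is rounded to the nearest multiple of 8.
scalar = [8 * round(v / 8) for v in block]             # [48, 56, 64, 56] -- four values sent

# Vector quantization: the whole block is replaced by the nearest codebook vector.
codebook = [                                           # hypothetical 2x2 codebook
    [50, 50, 50, 50],
    [55, 55, 60, 60],
    [200, 200, 200, 200],
]
def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

vector = min(codebook, key=lambda c: dist2(c, block))  # [55, 55, 60, 60] -- one index sent

print(scalar, vector)
```

Because the four pixels are correlated, a single codebook index describes the block almost as well as four separately quantized values, which is where the compression gain of vector quantization comes from.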

 

What is Data Compression?
 

Data compression reduces the number of bits required to represent data. It removes redundancy (statistical repetition) and irrelevance (detail the receiver does not perceive or need).

 

Why needed?

 

Reduce storage cost

Reduce transmission bandwidth

Improve transmission speed

A compression system has two blocks:

Encoder → removes redundancy
Decoder → reconstructs original or approximate data

 

Golomb Codes & Tunstall Codes

Golomb codes are used for encoding geometrically distributed data. They are efficient when probabilities decrease exponentially.

 

Tunstall coding is a variable-to-fixed length coding technique: it maps variable-length input sequences to fixed-length output codewords.
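
A brief Golomb encoder sketch in Python (the parameter m = 3 is chosen only for illustration): the quotient is sent in unary and the remainder in truncated binary.

```python
from math import ceil, log2

def golomb_encode(n, m):
    """Golomb code of a non-negative integer n with parameter m."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"                 # quotient q as q ones and a terminating zero
    if m == 1:
        return unary                      # m = 1 degenerates to a pure unary code
    b = ceil(log2(m))
    cutoff = (1 << b) - m                 # first 'cutoff' remainders use b - 1 bits
    if r < cutoff:
        return unary + format(r, "b").zfill(b - 1)
    return unary + format(r + cutoff, "b").zfill(b)

for n in range(6):
    print(n, golomb_encode(n, 3))
# 0 00, 1 010, 2 011, 3 100, 4 1010, 5 1011
```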

 

Quantization Problem

Quantization converts continuous amplitude values into discrete levels.

 

Problem:

Quantization introduces distortion, because many different input values map to the same output level.

The design goal is to minimize a distortion measure such as the Mean Square Error (MSE).

 

Example:

If input range is 0–10 and divided into 5 levels, each level represents a range. Any value inside that range is approximated.
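
As a quick worked illustration (the specific input value 3.7 is chosen arbitrarily): the step size is Δ = (10 − 0) / 5 = 2, so the intervals are [0, 2), [2, 4), [4, 6), [6, 8), [8, 10]. A value such as 3.7 falls in [2, 4) and is reconstructed as the midpoint 3, giving a quantization error of 0.7.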

 

Dictionary Based Coding Techniques

Dictionary-based coding stores repeated sequences in a dictionary.

 

Main types:

LZ77 (sliding window)

LZ78 (explicit dictionary)

LZW (improved LZ78)

They are widely used in ZIP, GIF, and PNG formats.
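
A compact LZW encoder sketch in Python (illustrative; real implementations such as Unix compress add details like limited code sizes and dictionary resets):

```python
def lzw_encode(data):
    """Basic LZW encoding: grow a dictionary of seen strings, emit dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}    # start with all single characters
    next_code = 256
    w, output = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                                  # keep extending the current match
        else:
            output.append(dictionary[w])            # emit code for the longest match found
            dictionary[wc] = next_code              # add the new string to the dictionary
            next_code += 1
            w = ch
    if w:
        output.append(dictionary[w])
    return output

print(lzw_encode("ABABABA"))   # [65, 66, 256, 258]
```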

 

SECTION C (10 Marks Each – Analytical)

Lossless vs Lossy Compression

Lossless compression reconstructs exact data. Used for text and documents.

Lossy compression allows some loss of information. Used for images, audio, and video.

Lossless ensures zero distortion. Lossy provides higher compression ratios.

Entropy Calculation

 

Given:

P(a1) = 1/2
P(a2) = 1/4
P(a3) = 1/8
P(a4) = 1/8

Entropy:

H = − [ (1/2)log₂(1/2) + (1/4)log₂(1/4) + 2(1/8)log₂(1/8) ]

H = − [ (1/2)(−1) + (1/4)(−2) + 2(1/8)(−3) ]

H = 1/2 + 1/2 + 3/4 = 1.75 bits/symbol

Huffman Coding Problem

 

Steps:

Arrange the probabilities in ascending order.

Merge the two lowest probabilities into a new node.

Repeat the merging until a single tree remains.

Assign 0 and 1 to the two branches at every node.

Then calculate average bits per symbol:

L = ∑ p(x) × code length
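
For example, if we assume the same source as in the entropy question above (P = 1/2, 1/4, 1/8, 1/8), Huffman coding assigns codeword lengths 1, 2, 3 and 3, so

L = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits/symbol

which equals the entropy of that source, so the code is optimal for it.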

LZ77 vs LZ78

LZ77 uses a sliding window over recently seen data and encodes each match as an (offset, length) pair plus the next symbol.

LZ78 builds an explicit dictionary of phrases and encodes each new phrase as a (dictionary index, next symbol) pair.

In short, LZ77 searches a buffer of past data, while LZ78 grows its dictionary dynamically as new phrases appear.

PPM (Prediction by Partial Matching)

PPM predicts the next symbol from its preceding context (the last few symbols) using adaptive probability estimation, and the predicted probabilities drive an arithmetic coder. If a symbol has not yet been seen in the current context, an escape symbol is coded and a shorter context is tried.

Distortion Criteria

 

Common distortion measures:

Mean Square Error (MSE)

Peak Signal to Noise Ratio (PSNR)

Absolute error

Used to evaluate lossy compression performance.
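
A small sketch of the two most common measures (the sample values are illustrative 8-bit samples and an assumed lossy reconstruction):

```python
from math import log10

def mse(original, reconstructed):
    """Mean square error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit samples)."""
    return 10 * log10(peak ** 2 / mse(original, reconstructed))

orig  = [52, 55, 61, 59]            # illustrative 8-bit samples
recon = [48, 56, 64, 56]            # illustrative reconstruction after lossy coding
print(mse(orig, recon))             # 8.75
print(round(psnr(orig, recon), 2))  # about 38.71 dB
```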

Scalar vs Vector Quantization

Scalar quantization quantizes one sample at a time.

Vector quantization quantizes a group (block) of samples jointly.

Vector quantization gives better performance because it exploits the correlation between samples, but it needs a codebook and has higher computational complexity.

LBG Algorithm (Linde-Buzo-Gray)

LBG is used to design the codebook in vector quantization; it iteratively refines the codebook until it is (locally) optimal for the training data.

 

Steps:

Initialize codebook.

Partition training vectors.

Update centroids.

Repeat until convergence.
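
A minimal LBG iteration sketch in Python (a sketch under simplifying assumptions: random initialisation instead of the usual codebook-splitting step, and a fixed number of iterations instead of a convergence test):

```python
import random

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def lbg(training, codebook_size, iterations=20):
    """Partition the training vectors around the codewords, then move each
    codeword to the centroid of its partition; repeat."""
    codebook = random.sample(training, codebook_size)    # step 1: initialise
    for _ in range(iterations):
        regions = [[] for _ in codebook]                 # step 2: partition
        for vec in training:
            idx = min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))
            regions[idx].append(vec)
        for i, region in enumerate(regions):             # step 3: update centroids
            if region:
                codebook[i] = [sum(c) / len(region) for c in zip(*region)]
    return codebook

training = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
print(lbg(training, 4))   # four 2-D codewords
```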
