(SEM VIII) THEORY EXAMINATION 2017-18 PATTERN RECOGNITION
SECTION A (Brief Explanations)
Law of Total Probability
The law of total probability states that if an event can occur due to several mutually exclusive and exhaustive events, then its total probability is the sum of conditional probabilities with respect to those events. It is written as
P(A) = Σᵢ P(A|Bᵢ) P(Bᵢ).
It is widely used in Bayesian classification.
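As a small worked sketch (with made-up machine/defect numbers, not from the notes), the formula reduces to a weighted sum of conditional probabilities:

```python
# Hypothetical example: a part comes from one of two machines (B1, B2),
# which are mutually exclusive and exhaustive sources.
priors = [0.6, 0.4]          # P(B_i)
conditionals = [0.02, 0.05]  # P(A | B_i): defect rate on each machine

# Law of total probability: P(A) = sum_i P(A|B_i) * P(B_i)
p_defect = sum(c * p for c, p in zip(conditionals, priors))
print(p_defect)  # 0.6*0.02 + 0.4*0.05 = 0.032
```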
Dimension Reduction
Dimension reduction means reducing the number of features in a dataset while preserving important information. It helps remove redundancy, reduce computation, and improve classifier performance. Techniques like PCA are commonly used.
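A minimal PCA sketch on assumed toy data shows the idea: project centered samples onto the direction of maximum variance, reducing two features to one.

```python
import numpy as np

# Toy 2-D data (assumed for illustration).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
top = eigvecs[:, np.argmax(eigvals)]    # direction of maximum variance
X_reduced = Xc @ top                    # 1-D projection of each sample
print(X_reduced.shape)  # (6,)
```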
Supervised vs Unsupervised Learning
Supervised learning uses labeled data to train a classifier, while unsupervised learning works with unlabeled data to find hidden patterns or clusters. Classification is supervised, whereas clustering is unsupervised.
Performance Evaluation of Classifier
Classifier performance is evaluated using accuracy, precision, recall, F-measure, confusion matrix, and error rate. Cross-validation is often used to estimate performance reliably.
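These metrics follow directly from confusion-matrix counts; a sketch with assumed counts (TP, FP, FN, TN):

```python
# Assumed confusion-matrix counts for a two-class problem.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction correct overall
precision = tp / (tp + fp)                   # correctness of positive calls
recall = tp / (tp + fn)                      # coverage of true positives
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
print(accuracy, precision, recall, f_measure)
```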
Hidden Markov Model (HMM)
HMM is a probabilistic model used for sequential data where the system states are hidden. It is widely used in speech recognition and time-series modeling.
Discriminant Function
A discriminant function is a mathematical function used to separate classes. It assigns a score to each class, and the sample is classified based on the highest score.
Gaussian Mixture Model (GMM)
GMM represents data as a mixture of multiple Gaussian distributions. It is used for density estimation and clustering and is often trained using the EM algorithm.
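With assumed weights, means, and standard deviations, the mixture density p(x) = Σₖ wₖ N(x | μₖ, σₖ²) can be evaluated directly:

```python
import numpy as np

# Assumed parameters of a two-component 1-D GMM (for illustration only).
weights = np.array([0.4, 0.6])
means = np.array([-1.0, 2.0])
sigmas = np.array([0.5, 1.0])

def gmm_pdf(x):
    # Weighted sum of Gaussian component densities at x.
    comps = weights / (sigmas * np.sqrt(2 * np.pi)) \
            * np.exp(-0.5 * ((x - means) / sigmas) ** 2)
    return comps.sum()

print(gmm_pdf(2.0))  # density is highest near a component mean
```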
Cluster Validation
Cluster validation measures how well clustering results represent data structure. It checks compactness within clusters and separation between clusters.
K-means Algorithm
K-means clustering starts by selecting K initial centroids. Each data point is assigned to the nearest centroid, and centroids are recalculated. This process repeats until cluster assignments stop changing.
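The assign/update loop above can be sketched in a few lines; the toy data and the deterministic initialization are assumptions for reproducibility:

```python
import numpy as np

# Toy 2-D data: two well-separated groups (assumed).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
K = 2
centroids = X[[0, 3]].copy()  # deterministic seeds, one per intended cluster

for _ in range(100):
    # Assignment step: each point joins its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: recompute each centroid as the mean of its cluster.
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):  # assignments have stabilized
        break
    centroids = new_centroids

print(labels)  # [0 0 0 1 1 1]
```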
Clustering vs Classification
Clustering groups data without prior labels, while classification assigns data to predefined categories using labeled training data.
SECTION B (Medium Length, Clear Explanation)
Learning and Adaptation
Learning refers to improving system performance using training data. Adaptation means updating model parameters when new data arrives. A learning system consists of feature extraction, learning algorithm, classifier, and evaluation unit. The system continuously adjusts to reduce classification error.
Chi-Square Test in Pattern Recognition
The Chi-Square test checks whether a feature is independent of class labels. It compares observed and expected frequencies. In pattern recognition, it helps in feature selection by identifying significant features that contribute to discrimination.
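The observed-vs-expected comparison can be computed by hand from a contingency table; the 2×2 counts below are assumed for illustration:

```python
import numpy as np

# Assumed contingency table: rows = feature value, columns = class label.
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row @ col / observed.sum()           # counts under independence
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)  # a large value suggests the feature depends on the class
```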
Expectation Maximization (EM)
EM is an iterative algorithm used when data contains hidden variables. In the expectation step, expected values of hidden variables are calculated. In the maximization step, parameters are updated to maximize likelihood. The process repeats until convergence. It is commonly used in GMM.
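The E-step/M-step cycle can be sketched for a two-component 1-D GMM on synthetic data (the data and initial guesses are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: samples drawn from two well-separated Gaussians (assumed).
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])

# Rough initial guesses for weights, means, and standard deviations.
w = np.array([0.5, 0.5]); mu = np.array([-1.0, 1.0]); sd = np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibility of each component for each sample.
    pdf = w / (sd * np.sqrt(2 * np.pi)) \
          * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2)
    r = pdf / pdf.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters to maximize the expected likelihood.
    n = r.sum(axis=0)
    w = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)

print(np.sort(mu))  # estimated means approach the true values -3 and 3
```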
K-Nearest Neighbor (KNN)
KNN classifies a sample based on the majority class among its K nearest neighbors. Distance measures such as Euclidean distance are used. KNN density estimation uses the neighbors to estimate the local probability density, while the KNN classification rule assigns a class label by majority vote.
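The KNN rule needs no training beyond storing the data; a sketch on assumed toy points:

```python
import numpy as np

# Toy training set: two labeled groups (assumed for illustration).
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [3.0, 3.0], [3.1, 2.9], [2.8, 3.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]     # labels of the K nearest
    return np.bincount(nearest).argmax()     # majority vote

print(knn_predict(np.array([0.1, 0.1])))  # 0
print(knn_predict(np.array([3.0, 3.1])))  # 1
```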
Naïve Bayes Classifier
Naïve Bayes applies Bayes’ theorem with an assumption of feature independence. It computes posterior probabilities for each class and selects the class with maximum probability. It is efficient and suitable for text classification.
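A multinomial Naive Bayes sketch for a toy two-class word-count problem (the vocabulary and counts are assumed); log-probabilities avoid underflow and Laplace smoothing handles unseen words:

```python
import numpy as np

# Rows = documents, columns = counts of a 3-word vocabulary (assumed).
X = np.array([[2, 1, 0], [1, 2, 0],   # class 0 documents
              [0, 1, 2], [0, 0, 3]])  # class 1 documents
y = np.array([0, 0, 1, 1])

log_prior = np.log(np.bincount(y) / len(y))
counts = np.array([X[y == c].sum(axis=0) for c in (0, 1)]) + 1.0  # smoothing
log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))

def predict(doc):
    # Posterior ∝ prior * prod_j P(word_j | class)^count_j, in log space.
    return (log_prior + doc @ log_lik.T).argmax()

print(predict(np.array([3, 1, 0])))  # word pattern matches class 0
```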
SECTION C (10-Mark Style Answers)
Feature Selection for Two-Class Problem (Pen Drive vs Laptop)
To distinguish between a pen drive and a laptop, useful features may include size, weight, storage capacity, presence of a keyboard, power consumption, and shape. For example, laptops are larger and heavier and include a keyboard and screen, while pen drives are small, lightweight storage devices. Proper feature selection improves classification accuracy.
Bayesian Decision Theory (Two-Class Case)
Bayesian Decision Theory classifies data based on posterior probabilities. For two classes, posterior probabilities are calculated using Bayes’ theorem. The decision rule assigns the sample to the class with higher posterior probability. This approach minimizes classification error when probability distributions are known.
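A two-class decision sketch with assumed 1-D Gaussian class-conditionals and equal priors; the sample goes to the class with the larger (unnormalized) posterior:

```python
import numpy as np

# Assumed class-conditional densities: N(0, 1) and N(4, 1), equal priors.
priors = np.array([0.5, 0.5])
means = np.array([0.0, 4.0])
sigma = 1.0

def decide(x):
    # p(x|w_i), up to a constant shared by both classes.
    likelihood = np.exp(-0.5 * ((x - means) / sigma) ** 2)
    posterior = likelihood * priors  # unnormalized P(w_i | x)
    return posterior.argmax()        # pick the larger posterior

print(decide(1.0))  # 0 (closer to the first class mean)
print(decide(3.5))  # 1 (closer to the second class mean)
```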
Statistical vs Syntactic Pattern Recognition
Statistical pattern recognition uses numeric feature vectors and probabilistic decision rules. Syntactic pattern recognition uses structural relationships and grammar rules to represent patterns. Statistical methods are widely used for numerical data, while syntactic methods are useful when pattern structure is important.
Maximum Likelihood vs Bayesian Estimation
Maximum Likelihood Estimation finds the parameter values that maximize the likelihood of the observed data. Bayesian estimation treats the parameters as random variables: it incorporates prior knowledge and updates it with the observed data through the posterior distribution. Bayesian methods are more flexible when data is scarce or uncertainty must be quantified.
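For Gaussian data the MLE has a closed form: the sample mean and the (biased) sample variance. A sketch on synthetic data (true parameters assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic samples from N(5, 2^2); the true parameters are assumed.
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_mle = x.mean()                    # MLE of the mean
var_mle = ((x - mu_mle) ** 2).mean() # MLE of the variance (divides by N)
print(mu_mle, var_mle)  # close to the true values 5.0 and 4.0
```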
Fuzzy Decision Making
Fuzzy decision making allows partial membership of data in multiple classes. Instead of strict classification, membership values between 0 and 1 are assigned. It is useful in situations with uncertainty or overlapping classes.
Clustering Techniques and Agglomerative Method
Clustering techniques include K-means, hierarchical clustering, and density-based clustering. Agglomerative clustering is a bottom-up approach where each data point starts as a single cluster and clusters merge step by step based on similarity until one cluster remains.
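The bottom-up merging can be sketched with single linkage (cluster distance = minimum pairwise point distance) on assumed 1-D toy points:

```python
import numpy as np

# Toy 1-D points (assumed): two tight pairs and one outlier.
points = np.array([[0.0], [0.3], [5.0], [5.2], [10.0]])
clusters = [[i] for i in range(len(points))]  # start: one cluster per point

def single_link(a, b):
    # Single linkage: minimum distance between any two member points.
    return min(abs(points[i, 0] - points[j, 0]) for i in a for j in b)

merges = []
while len(clusters) > 1:
    # Find the closest pair of clusters and merge them.
    pairs = [(single_link(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    d, i, j = min(pairs)
    merges.append((sorted(clusters[i] + clusters[j]), d))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
    clusters.append(merges[-1][0])

print([m[0] for m in merges])  # merge order: nearest pairs first
```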