(SEM VI) THEORY EXAMINATION 2017-18 DATA-WAREHOUSING & DATA MINING
Data Warehousing & Data Mining (NCS-066)
Complete Section-Wise Explanation – B.Tech Semester VI
Introduction to the Subject
Data Warehousing and Data Mining focuses on how large volumes of data are stored, organized, processed, and analyzed to extract meaningful knowledge. While a data warehouse is used for systematic storage and analysis of historical data, data mining applies intelligent techniques to discover patterns, trends, and relationships hidden inside the data.
This subject is extremely important for areas like:
Business intelligence
Decision support systems
Market basket analysis
Customer behavior analysis
Machine learning foundations
The paper is divided into three sections: A, B, and C, each testing a different depth of understanding.
SECTION A – Basic Concepts & Definitions
Pattern:
Attempt all questions
10 questions × 2 marks = 20 marks
Nature of Section A
Section A checks your basic conceptual clarity. Answers must be short but meaningful, focusing on correct definitions and differences. Writing unnecessary explanations here may waste time.
Explanation of Section A Topics
Key Steps of Data Mining
The data mining process begins with data cleaning, followed by data integration, data selection, data transformation, data mining, pattern evaluation, and finally knowledge presentation. These steps ensure that raw data is converted into useful information.
Support and Confidence
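Support measures how frequently an itemset appears in the database, expressed as the fraction of transactions that contain it. Confidence of a rule A → B measures how often B appears in the transactions that contain A, i.e. support(A ∪ B) / support(A). These are essential parameters in association rule mining.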
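As a small illustration of these two measures, the sketch below computes them over a toy transaction list. The transactions and the function names are invented for this example and are not taken from the exam paper.

```python
# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: support(A u B) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5 (2 of 4 transactions)
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667 (2 of the 3 bread transactions)
```

In an exam answer the same arithmetic is usually written out by hand: count the transactions containing the itemset, then divide.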
Attribute Selection Measures and Drawback of Information Gain
Attribute selection measures decide the best attribute for splitting data in decision trees. Information gain favors attributes with many distinct values, which can lead to biased results.
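The bias can be seen on a very small example. The sketch below is a minimal illustration, assuming a made-up four-tuple dataset: a unique row identifier gets the maximum possible information gain even though it is useless for prediction.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical data: four tuples, two classes.
labels = ["yes", "yes", "no", "no"]
row_id = [1, 2, 3, 4]         # unique per tuple, carries no real meaning
windy = ["t", "f", "t", "f"]  # an ordinary two-valued attribute

print(information_gain(row_id, labels))  # 1.0: every partition is pure
print(information_gain(windy, labels))   # 0.0: the split tells us nothing
```

Splitting on row_id produces one pure partition per tuple, so the gain equals the full entropy of the labels, which is exactly the drawback described above.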
Classification vs Clustering
Classification is a supervised learning technique where class labels are known, while clustering is an unsupervised technique where data is grouped based on similarity without predefined labels.
Apriori Algorithm Statement
The Apriori principle states that all non-empty subsets of a frequent itemset must also be frequent. It is used to reduce search space in association rule mining.
Drawbacks of K-Means Algorithm
K-means requires the number of clusters in advance, is sensitive to initial centroids, and does not perform well with outliers or non-spherical clusters.
Chi-Square Test
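The chi-square test measures the statistical significance of the difference between observed and expected frequencies. It is commonly used to test whether two categorical attributes are independent, for example in correlation analysis and attribute selection.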
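The statistic itself is a short calculation. The sketch below computes the Pearson chi-square value for a contingency table; the table values are hypothetical, and the function name is chosen only for this example.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two attributes were independent.
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

# Hypothetical 2 x 2 table: rows and columns are two categorical attributes.
table = [[30, 10],
         [10, 30]]
print(chi_square(table))  # 20.0, since every expected count is 20
```

For a 2 × 2 table there is (2 - 1)(2 - 1) = 1 degree of freedom, and the critical value at the 0.05 significance level is 3.841, so a statistic of 20.0 would reject the hypothesis that the attributes are independent.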
Roll-Up vs Drill-Down
Roll-up summarizes data to a higher level of hierarchy, while drill-down provides detailed data at a lower level.
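A rough way to picture the two operations is as aggregation at different levels of a time hierarchy. The sketch below uses invented sales figures and a hypothetical helper named aggregate; it is not how an OLAP server is implemented, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, city, amount).
sales = [
    (2017, "Q1", "Delhi", 100),
    (2017, "Q1", "Noida", 50),
    (2017, "Q2", "Delhi", 70),
    (2018, "Q1", "Delhi", 120),
]

def aggregate(facts, key):
    """Sum the amount for each group produced by key(year, quarter, city)."""
    totals = defaultdict(int)
    for year, quarter, city, amount in facts:
        totals[key(year, quarter, city)] += amount
    return dict(totals)

# Lower level of the time hierarchy: one total per (year, quarter).
by_quarter = aggregate(sales, lambda y, q, c: (y, q))
# Roll-up: climb from quarter to year.
by_year = aggregate(sales, lambda y, q, c: y)

print(by_year)     # {2017: 220, 2018: 120}
print(by_quarter)  # {(2017, 'Q1'): 150, (2017, 'Q2'): 70, (2018, 'Q1'): 120}
```

Going from by_quarter to by_year is a roll-up; going from by_year back to by_quarter is a drill-down.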
Hierarchical Clustering Methods
Hierarchical clustering methods include agglomerative (bottom-up) and divisive (top-down) approaches.
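The agglomerative approach can be sketched in a few lines. The example below assumes one-dimensional points and single-linkage distance (the smallest gap between two clusters); both choices are simplifications made for this illustration.

```python
def agglomerative(points, k):
    """Merge the two closest clusters (single linkage) until k clusters remain."""
    clusters = [[p] for p in points]  # bottom-up: every point starts alone

    def single_link(a, b):
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 1-D data.
print(agglomerative([1, 2, 6, 7, 15], 3))  # [[1, 2], [6, 7], [15]]
print(agglomerative([1, 2, 6, 7, 15], 2))  # [[1, 2, 6, 7], [15]]
```

A divisive method would run in the opposite direction, starting from one cluster holding all points and splitting it repeatedly.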
Features of Genetic Algorithm
Genetic algorithms use selection, crossover, mutation, and fitness functions to find optimal solutions inspired by natural evolution.
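The four features can be seen together in a toy run. The sketch below is only one possible arrangement, assuming a bit-string encoding, a fitness function that counts 1 bits, tournament selection, single-point crossover, and elitism; all parameter values are arbitrary choices for illustration.

```python
import random

def fitness(bits):
    """Toy fitness function: the number of 1 bits in the chromosome."""
    return sum(bits)

def evolve(n_bits=12, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            # Selection: the fitter of three random individuals becomes a parent.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Crossover: splice the parents at a random cut point.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    best_history.append(fitness(best))
    return best, best_history

best, history = evolve()
print(fitness(best), history[0])  # final best fitness and initial best fitness
```

Because the best individual is always copied into the next generation, the best fitness recorded in history never decreases from one generation to the next.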
SECTION B – Descriptive Theory & Numericals
Pattern:
Attempt any three questions
3 × 10 marks = 30 marks
Nature of Section B
This section requires descriptive answers written in paragraphs. You must explain concepts properly and, where required, solve numericals step by step.
Explanation of Major Questions
Data Mining / Knowledge Extraction Process
The knowledge extraction process involves transforming raw data into valuable knowledge through stages such as data cleaning, integration, transformation, mining, and evaluation. Each stage plays a crucial role in ensuring accuracy and relevance of extracted patterns.
OLAP vs OLTP
OLTP systems are used for day-to-day transaction processing and handle large numbers of short online transactions.
OLAP systems are designed for analysis, supporting complex queries and decision-making using historical data.
Apriori Algorithm Numerical
This question tests practical application of association rule mining. Using the given transaction database and minimum support and confidence, frequent itemsets are generated step by step, followed by rule generation. Proper calculation of support and confidence is essential.
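The step-by-step procedure can be sketched as follows. This is a minimal illustration, not the exam's dataset: the five transactions, the minimum support count of 3, and the minimum confidence of 0.7 are all assumed values.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(c <= t for t in transactions) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Generate rules A -> B with confidence = count(A u B) / count(A)."""
    out = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = n / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out

# Hypothetical transaction database.
db = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C"]]
frequent = apriori(db, min_support_count=3)
strong = rules(frequent, min_confidence=0.7)
print(len(frequent), len(strong))  # 6 frequent itemsets, 6 strong rules
```

Here each single item occurs 4 times and each pair 3 times, while {A, B, C} occurs only twice and is dropped; every rule between two items has confidence 3/4 = 0.75.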
Database Schemas
Database schemas define how data is organized in a warehouse. Common schemas include star schema, snowflake schema, and fact constellation schema. Each differs in complexity, redundancy, and query performance.
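A star schema can be sketched with an in-memory SQLite database: one central fact table that references small dimension tables. The table names, columns, and rows below are hypothetical and chosen only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each fact.
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table holds the measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    units_sold INTEGER
);
INSERT INTO dim_time VALUES (1, 2017, 'Q1'), (2, 2017, 'Q2');
INSERT INTO dim_item VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'home');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 3), (2, 1, 7);
""")

# A typical analytical query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
SELECT i.category, t.year, SUM(f.units_sold)
FROM fact_sales f
JOIN dim_time t ON f.time_id = t.time_id
JOIN dim_item i ON f.item_id = i.item_id
GROUP BY i.category, t.year
ORDER BY i.category
""").fetchall()
print(rows)  # [('home', 2017, 3), ('stationery', 2017, 17)]
```

A snowflake schema would further normalize a dimension (for example, moving category into its own table), and a fact constellation would add a second fact table sharing these dimensions.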
Data Backup and Recovery in Data Warehouse
Data backup ensures data safety, while recovery restores data in case of failure. Techniques include full backup, incremental backup, and disaster recovery planning.
SECTION C – Advanced Analysis & Algorithms
Pattern:
Attempt any one part from each question
5 × 10 marks = 50 marks
This section carries the highest weightage, 50 of the paper's 100 marks, so it largely determines overall performance.
Question 3
Three-Tier Data Warehouse Architecture & ETL
The three-tier architecture consists of bottom tier (data warehouse server), middle tier (OLAP server), and top tier (front-end tools).
ETL (Extract, Transform, Load) extracts data from sources, transforms it into a suitable format, and loads it into the warehouse.
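The three ETL stages can be sketched as three small functions. The example below assumes a CSV text as the source and an in-memory SQLite table as the warehouse; the column names and rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent formatting and a missing amount.
SOURCE_CSV = """order_id,city,amount
1, delhi ,100.50
2,NOIDA,
3,Delhi,80
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, normalize city names, cast types."""
    cleaned = []
    for r in rows:
        if not r["amount"].strip():
            continue
        cleaned.append((int(r["order_id"]), r["city"].strip().title(), float(r["amount"])))
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT city, SUM(amount) FROM orders GROUP BY city").fetchall())
# [('Delhi', 180.5)]
```

Real ETL tools add scheduling, logging, and error handling, but the extract, transform, load order stays the same.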
Data Cleaning Strategies
Data cleaning removes noise, handles missing values, corrects inconsistencies, and improves data quality for better mining results.
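Two common cleaning strategies are easy to show in code: filling missing values with the attribute mean, and smoothing noisy values by bin means. The sketch below is a minimal illustration with assumed sample values and equal-frequency bins.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into equal-frequency bins, replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_values = ordered[i:i + bin_size]
        smoothed.extend([sum(bin_values) / len(bin_values)] * len(bin_values))
    return smoothed

print(fill_missing_with_mean([10, None, 20]))
# [10, 15.0, 20]
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Other strategies named in answers, such as smoothing by bin boundaries or regression, follow the same pattern of replacing a suspicious value with a locally consistent one.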
Question 4
Clustering Methods & STING
Clustering methods include partitioning, hierarchical, density-based, and grid-based methods.
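STING (STatistical INformation Grid) is a grid-based clustering method that divides the data space into rectangular cells at several levels of resolution and stores statistical information (such as count, mean, minimum, and maximum) for each cell, so queries and clustering can work on cell summaries instead of individual points.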
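The idea of per-cell statistics can be sketched as follows. This is a simplified illustration, not the full STING method: it builds only one layer of cells over assumed sample points and shows how a higher-level cell can be summarized from its children without revisiting the data.

```python
from collections import defaultdict

def grid_statistics(points, cell_size):
    """Group (x, y, value) points into square cells and summarize each cell."""
    cells = defaultdict(list)
    for x, y, value in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    return {
        cell: {"count": len(v), "mean": sum(v) / len(v), "min": min(v), "max": max(v)}
        for cell, v in cells.items()
    }

def merge_cells(children):
    """Compute a parent cell's statistics from its child cells alone."""
    count = sum(c["count"] for c in children)
    return {
        "count": count,
        "mean": sum(c["mean"] * c["count"] for c in children) / count,
        "min": min(c["min"] for c in children),
        "max": max(c["max"] for c in children),
    }

# Hypothetical points: (x, y, attribute value).
points = [(0.5, 0.5, 10.0), (0.7, 0.2, 20.0), (1.5, 0.5, 30.0)]
stats = grid_statistics(points, cell_size=1.0)
parent = merge_cells(list(stats.values()))
print(stats[(0, 0)])  # {'count': 2, 'mean': 15.0, 'min': 10.0, 'max': 20.0}
print(parent)         # {'count': 3, 'mean': 20.0, 'min': 10.0, 'max': 30.0}
```

Because parent statistics are derived from child statistics, a query can start at a coarse level and descend only into the cells that look relevant.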
Applications of Data Warehousing, Web Mining & Spatial Mining
Data warehousing is used in banking, retail, healthcare, and telecom.
Web mining extracts patterns from web data, while spatial mining analyzes geographical and spatial data.
Question 5
Data Warehouse Definition & Design Strategies
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.
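Design strategies are usually classified as top-down (build the enterprise warehouse first, then derive data marts), bottom-up (build data marts first, then integrate them), or a combination of the two. Whichever is chosen, the design must address scalability, data quality, security, and performance optimization.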
Short Notes
Concept hierarchy organizes data at different abstraction levels.
ROLAP uses relational databases, while MOLAP uses multidimensional databases.
Gain ratio normalizes information gain by the split information of the attribute, reducing the bias toward attributes with many distinct values.
Classification assigns labels, while clustering groups data based on similarity.
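The gain ratio note above can be made concrete with a short calculation. The sketch below assumes a made-up four-tuple dataset in which a unique row identifier and a genuine two-valued attribute both have the same information gain; dividing by split information separates them.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(values, labels):
    """Information gain divided by split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info

# Hypothetical data.
labels = ["yes", "yes", "no", "no"]
row_id = [1, 2, 3, 4]           # four distinct values, split information = 2 bits
outlook = ["s", "s", "r", "r"]  # two distinct values, split information = 1 bit

print(information_gain(row_id, labels), information_gain(outlook, labels))  # 1.0 1.0
print(gain_ratio(row_id, labels), gain_ratio(outlook, labels))              # 0.5 1.0
```

Information gain cannot tell the two attributes apart, while gain ratio penalizes the identifier for fragmenting the data into many small partitions.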
Question 6
Decision Tree Classifier & Information Gain
This question involves constructing a decision tree using given gains and deriving decision rules. Proper understanding of entropy and gain is required.
Laplacian Correction & Bayesian Classification
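Laplacian correction avoids zero-probability issues in Bayesian classifiers by adding 1 to each count, so an attribute value that never occurs with a class does not force the whole product to zero. The given tuple is classified by calculating the posterior probability of each class and choosing the largest.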
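The correction is a one-line change to the probability estimate. The sketch below assumes hypothetical counts for a class of 1000 tuples and an attribute with three possible values, one of which never occurs in that class.

```python
def conditional_probability(count_value_and_class, count_class, n_distinct_values, laplace=True):
    """Estimate P(attribute = value | class), optionally with Laplacian correction."""
    if laplace:
        # Add 1 to the count of each value, so the denominator grows by the number of values.
        return (count_value_and_class + 1) / (count_class + n_distinct_values)
    return count_value_and_class / count_class

# Hypothetical counts within one class: income = low, medium, high.
counts = {"low": 0, "medium": 990, "high": 10}
total = sum(counts.values())  # 1000 tuples in this class

for value, c in counts.items():
    raw = conditional_probability(c, total, len(counts), laplace=False)
    corrected = conditional_probability(c, total, len(counts))
    print(value, raw, corrected)
# low: 0.0 becomes 1/1003; medium: 0.99 becomes 991/1003; high: 0.01 becomes 11/1003
```

The corrected estimates stay close to the raw ones and still sum to 1, but the zero that would have wiped out the posterior probability is gone.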
Question 7
K-Means Algorithm Numerical
This problem tests clustering concepts using Euclidean distance. Starting with given initial centroids, new cluster centers are calculated after the first iteration.
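One iteration of this procedure can be sketched as below. The four points and the two initial centroids are assumed values for illustration, not the ones given in the exam question.

```python
from math import dist

def kmeans_iteration(points, centroids):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            # New centroid is the coordinate-wise mean of the cluster members.
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
    return clusters, new_centroids

# Hypothetical data and initial centroids.
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
clusters, centroids = kmeans_iteration(points, [(1, 1), (5, 4)])
print(clusters)   # [[(1, 1), (2, 1)], [(4, 3), (5, 4)]]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

In a written answer, show the distance of every point to every centroid in a table before stating the assignments and the new means.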
Hierarchical Clustering & BIRCH
Hierarchical clustering builds nested clusters.
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a scalable hierarchical clustering method that summarizes large datasets in a CF (clustering feature) tree.
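The summary stored in each CF tree entry can be sketched as a triple (N, LS, SS): the number of points, their linear sum, and their square sum. The example below assumes SS is kept as a single scalar and uses made-up points; it shows only the clustering feature itself, not the tree construction.

```python
def clustering_feature(points):
    """Return (N, LS, SS) for a list of points given as tuples."""
    n = len(points)
    ls = tuple(sum(c) for c in zip(*points))    # linear sum per dimension
    ss = sum(x * x for p in points for x in p)  # scalar sum of squared coordinates
    return n, ls, ss

def merge(cf1, cf2):
    """Clustering features are additive, so merging two subclusters is cheap."""
    n1, ls1, ss1 = cf1
    n2, ls2, ss2 = cf2
    return n1 + n2, tuple(a + b for a, b in zip(ls1, ls2)), ss1 + ss2

def centroid(cf):
    """The centroid can be recovered from the summary without the original points."""
    n, ls, _ = cf
    return tuple(x / n for x in ls)

# Hypothetical points.
whole = clustering_feature([(2, 5), (3, 2), (4, 3)])
parts = merge(clustering_feature([(2, 5)]), clustering_feature([(3, 2), (4, 3)]))
print(whole)            # (3, (9, 10), 67)
print(parts == whole)   # True
print(centroid(whole))  # (3.0, 3.3333333333333335)
```

This additivity is what makes BIRCH scalable: the tree stores compact summaries, and inserting a point or merging subclusters only updates three numbers per entry.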