(SEM VIII) THEORY EXAMINATION 2023-24 DATA WAREHOUSING & DATA MINING
SECTION A
(Attempt all | 2 × 10 = 20 Marks)
a. Define Data Warehousing
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data used to support decision making.
b. Discuss Fact Constellation
Fact constellation is a schema with multiple fact tables sharing common dimension tables. It is also called a galaxy schema.
c. Explain Distributed DBMS implementation
Distributed DBMS stores data across multiple locations connected by a network, allowing data sharing, reliability, and parallel processing.
d. Define Warehousing Software
Warehousing software is used to extract, transform, load (ETL) data and manage storage, querying, and analysis of warehouse data.
e. Are all patterns interesting?
No. Only patterns that are useful, valid, novel, and understandable are considered interesting.
f. Binary symmetric vs asymmetric attributes
Symmetric attributes treat both values equally (e.g., gender).
Asymmetric attributes treat one value as more important (e.g., disease presence).
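The distinction matters when dissimilarity between two objects is computed from binary attributes. A minimal sketch, assuming two hypothetical patient records encoded as 0/1 lists (the records and function names are illustrative and not part of the exam paper): the symmetric measure counts 0-0 matches, while the asymmetric measure ignores them.

```python
def binary_counts(x, y):
    """Return (q, r, s, t): counts of 1-1, 1-0, 0-1 and 0-0 attribute pairs."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_dissimilarity(x, y):
    """Both values matter equally, so 0-0 matches count as agreement."""
    q, r, s, t = binary_counts(x, y)
    return (r + s) / (q + r + s + t)


def asymmetric_dissimilarity(x, y):
    """Only the 1 value is informative, so 0-0 matches (t) are ignored.
    Undefined (division by zero) when neither object has any 1 values."""
    q, r, s, t = binary_counts(x, y)
    return (r + s) / (q + r + s)


# Hypothetical records: 1 = symptom or positive test present, 0 = absent.
patient_a = [1, 0, 1, 0, 0, 0]
patient_b = [1, 1, 0, 0, 0, 0]

sym = symmetric_dissimilarity(patient_a, patient_b)
asym = asymmetric_dissimilarity(patient_a, patient_b)
print(sym, asym)
```

The two records share many absent symptoms, so they look fairly similar under the symmetric measure and much less similar once the 0-0 matches are dropped.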
g. Mode of dataset & advantage
Dataset: 12, 13, 34, 32, 21, 29, 40, 11, 39, 23
All values occur once → No mode.
Advantage: Mode is not affected by extreme values.
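The "no mode" conclusion can be checked by counting occurrences. A small sketch using the dataset above (the variable names are illustrative):

```python
from collections import Counter

data = [12, 13, 34, 32, 21, 29, 40, 11, 39, 23]

counts = Counter(data)
highest = max(counts.values())

# A mode exists only if at least one value occurs more than once.
modes = sorted(v for v, c in counts.items() if c == highest) if highest > 1 else []

print(highest, modes)
```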
h. Manhattan distance
Objects: (22, 2, 45, 10) and (20, 10, 26, 2)
$|22-20| + |2-10| + |45-26| + |10-2| = 2 + 8 + 19 + 8 = 37$
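The same arithmetic as a short sketch: Manhattan distance is the sum of absolute coordinate differences (the function name is illustrative).

```python
def manhattan(p, q):
    """Sum of absolute differences between corresponding coordinates."""
    if len(p) != len(q):
        raise ValueError("objects must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(p, q))


distance = manhattan((22, 2, 45, 10), (20, 10, 26, 2))
print(distance)
```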
i. Temporal Mining
Temporal mining discovers patterns from time-related data, such as trends, sequences, and periodic patterns.
j. Data Visualization
Data visualization represents data using graphs, charts, plots, and dashboards to identify patterns and insights.
SECTION B
(Attempt any THREE | 10 × 3 = 30 Marks)
2(a) Knowledge Discovery Process & Snowflake Schema
Steps of Knowledge Discovery in Data (KDD):
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge presentation
Snowflake Schema:
It is an extension of star schema where dimension tables are normalized into multiple related tables.
Advantages: Reduced redundancy
Disadvantages: Complex queries and joins
2(b) Market Basket Analysis
Market Basket Analysis identifies relationships between items purchased together using association rules.
Example:
If customers buy bread, they also buy butter.
It uses support, confidence, and lift measures and is widely used in retail and e-commerce.
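The three measures can be sketched for the rule bread → butter. The five transactions below are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "jam"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support.
    Above 1: positive association; about 1: independent; below 1: negative."""
    return confidence(antecedent, consequent) / support(consequent)


rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
rule_lift = lift({"bread"}, {"butter"})
print(rule_support, rule_confidence, rule_lift)
```

In this toy data, bread and butter appear together in 3 of 5 baskets, 3 of the 4 bread baskets contain butter, and the lift is above 1, so the rule indicates a positive association.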
2(c) Box-and-Whisker Plot
Dataset is sorted, and quartiles (Q1, Q2, Q3) are calculated.
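The box spans the interquartile range (Q1 to Q3), the median (Q2) is drawn inside the box, and the whiskers extend to the minimum and maximum values. When outliers are plotted separately, the whiskers instead stop at the most extreme values within 1.5 × IQR of the quartiles.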
It helps detect spread, skewness, and outliers.
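A sketch of the five-number summary behind the plot. The dataset for this question is not reproduced in these notes, so the ten values from question (g) are reused purely for illustration. Quartiles are taken as medians of the lower and upper halves; other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle element when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


values = [12, 13, 34, 32, 21, 29, 40, 11, 39, 23]
minimum, q1, q2, q3, maximum = five_number_summary(values)

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print(minimum, q1, q2, q3, maximum, iqr, outliers)
```

For these values the box runs from 13 to 34 with the median at 26, and no point falls outside the 1.5 × IQR fences.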
2(d) K-Means Clustering (2 clusters)
Points: (2,4), (6,8), (1,2), (4,5), (3,5)
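The result depends on the initial centroids, which are not stated here. One stable assignment under Euclidean distance is:
Cluster 1: (1,2), (2,4)
Cluster 2: (3,5), (4,5), (6,8)
K-Means minimizes the within-cluster sum of squared distances from each point to its cluster centroid.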
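A minimal sketch of the iteration, assuming initial centroids (2,4) and (4,5). That starting choice is an assumption made here because the notes do not state one; it happens to reproduce the clusters listed above.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-Means (Lloyd's algorithm) with squared Euclidean distance."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return clusters, centroids


points = [(2, 4), (6, 8), (1, 2), (4, 5), (3, 5)]

# Assumed starting centroids (not given in the notes).
clusters, centroids = kmeans(points, [(2, 4), (4, 5)])
print(clusters, centroids)

# A different start converges to a different partition.
alt_clusters, alt_centroids = kmeans(points, [(2, 4), (6, 8)])
print(alt_clusters, alt_centroids)
```

Starting instead from the first two points, (2,4) and (6,8), the same procedure leaves (6,8) in a cluster of its own, which is why an exam answer should state the initial centroids it uses.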
2(e) ROLAP, MOLAP & HOLAP
ROLAP: Uses relational databases, scalable, slower queries
MOLAP: Uses multidimensional cubes, fast queries, less scalable
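HOLAP: Hybrid approach; keeps detailed data in relational tables and aggregations in multidimensional cubes, combining ROLAP scalability with MOLAP query speed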
SECTION C
3(a) Mapping 2D table to multidimensional model
A 2D sales table (Product × Time, with Sales as the cell value) covers a single location. Adding Location as a third dimension maps it into a cube with dimensions Product, Time, Location and measure Sales.
This enables slice, dice, drill-down, and roll-up operations.
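The mapping can be sketched by loading flat rows into a structure keyed by one value per dimension. The rows, city names, and the `roll_up` helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical flat sales rows: (product, quarter, location, sales).
rows = [
    ("TV", "Q1", "Delhi", 100),
    ("TV", "Q2", "Delhi", 120),
    ("Phone", "Q1", "Delhi", 200),
    ("TV", "Q1", "Mumbai", 80),
    ("Phone", "Q2", "Mumbai", 150),
]

# Cube: each cell is addressed by (product, quarter, location) and holds the measure.
cube = defaultdict(int)
for product, quarter, location, sales in rows:
    cube[(product, quarter, location)] += sales


def roll_up(cube, keep):
    """Aggregate sales over every dimension whose index is not listed in `keep`."""
    result = defaultdict(int)
    for key, sales in cube.items():
        result[tuple(key[i] for i in keep)] += sales
    return dict(result)


by_product = roll_up(cube, keep=[0])               # total per product
by_product_location = roll_up(cube, keep=[0, 2])   # time dimension summed away
print(by_product, by_product_location)
```

Drill-down is the reverse direction: moving from `by_product` back to a view that keeps more dimensions.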
3(b) Data Characterization & Discrimination
Data Characterization: Summarizes general features of a class.
Data Discrimination: Compares features of two or more classes.
Used for descriptive data mining.
4(a) Min-Max vs Z-Score Normalization
Min-Max:
$v' = \frac{v - \min}{\max - \min}$
Z-Score:
$v' = \frac{v - \text{mean}}{\text{std}}$
Binary data uses 0/1 values, while nominal data represents categories like colors or names.
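The two normalization formulas above can be sketched as follows. The sample values are hypothetical, and the population standard deviation is used for std; the sample standard deviation is another common choice and gives slightly different z-scores.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


values = [200, 300, 400, 600, 1000]  # hypothetical attribute values
scaled = min_max(values)
standardized = z_score(values)
print(scaled)
print(standardized)
```

Min-max keeps the values inside a fixed range but is sensitive to the extreme values that define min and max; z-score has no fixed range and is usually preferred when the true minimum and maximum are unknown or outliers are present.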
4(b) Data Mining Architecture
Components include:
Data sources
Data warehouse
Database server
Data mining engine
Pattern evaluation module
User interface
5(a) Decision Tree-Based Classifiers
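Decision trees classify a tuple by testing one attribute at each internal node and following the matching branch to a leaf that holds the class label; each root-to-leaf path corresponds to an if-then rule.
Splits are chosen with an attribute selection measure such as information gain (based on entropy), gain ratio, or the Gini index.
Advantages: Easy to understand, fast classification
Disadvantages: Prone to overfitting, usually controlled by pruning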
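A short sketch of how entropy and information gain score a candidate split. The class counts are hypothetical: a node with 9 positive and 5 negative tuples is split by a three-valued attribute into partitions with (2, 3), (4, 0) and (3, 2) tuples.

```python
from math import log2


def entropy(class_counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


parent = [9, 5]                       # hypothetical [positive, negative] counts
children = [[2, 3], [4, 0], [3, 2]]   # hypothetical partitions after the split

parent_entropy = entropy(parent)
gain = information_gain(parent, children)
print(round(parent_entropy, 3), round(gain, 3))
```

The attribute with the highest gain is chosen for the node; here the pure partition (4, 0) contributes zero entropy, which is what drives the gain up.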
5(b) Bayesian Classification (Result)
For the tuple X = (youth, medium, yes, fair), Bayes' theorem assigns the higher posterior probability to the class:
buys_computer = YES
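The notes give only the final label. The sketch below shows the naive Bayes arithmetic under two assumptions that are not confirmed by these notes: that the question uses the widely reproduced 14-tuple buys_computer training set (9 yes, 5 no), and that the attributes of X are age, income, student and credit_rating in that order. If the exam's table differs, the counts must be replaced.

```python
from fractions import Fraction as F

# ASSUMED counts from the standard 14-tuple buys_computer example (not in these notes).
prior = {"yes": F(9, 14), "no": F(5, 14)}

likelihood = {
    "yes": {
        "age=youth": F(2, 9),
        "income=medium": F(4, 9),
        "student=yes": F(6, 9),
        "credit_rating=fair": F(6, 9),
    },
    "no": {
        "age=youth": F(3, 5),
        "income=medium": F(2, 5),
        "student=yes": F(1, 5),
        "credit_rating=fair": F(2, 5),
    },
}


def score(label):
    """P(X | class) * P(class), treating attributes as conditionally independent."""
    result = prior[label]
    for p in likelihood[label].values():
        result *= p
    return result


scores = {label: score(label) for label in prior}
prediction = max(scores, key=scores.get)
print({k: round(float(v), 3) for k, v in scores.items()}, prediction)
```

Under these assumed counts the score for yes is about four times the score for no, which agrees with the classification stated above.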
6(a) Types of Clustering Methods
Partitioning (K-Means)
Hierarchical
Density-based (DBSCAN)
Grid-based
Model-based
Partitioning methods divide the data into K clusters so that an objective, such as the sum of squared distances from each point to its cluster centroid, is minimized.
6(b) DBSCAN Algorithm
DBSCAN groups data based on density using parameters ε (epsilon) and MinPts.
It identifies clusters of arbitrary shape and handles noise effectively.
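A compact sketch of the procedure, assuming Euclidean distance and the common convention that a point counts as its own neighbor when compared against MinPts. The sample points and parameter values are hypothetical.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster_id += 1                # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point (5, 5) has no neighbors within ε other than itself, so it is labeled as noise rather than forced into a cluster, which is the behavior that distinguishes DBSCAN from K-Means.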
7(a) OLAP vs OLTP & Slice vs Dice
OLAP: Analytical, read-intensive, historical data
OLTP: Transactional, real-time updates
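Slice: Selects a single value on one dimension, giving a sub-cube with one fewer dimension
Dice: Selects values or ranges on two or more dimensions, giving a smaller sub-cube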
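The difference between the two operations can be sketched on a small cube stored as a dictionary. The cells, dimension names and helper functions below are hypothetical.

```python
# Hypothetical cube cells keyed by (product, quarter, location) -> sales.
cube = {
    ("TV", "Q1", "Delhi"): 100,
    ("TV", "Q2", "Delhi"): 120,
    ("Phone", "Q1", "Delhi"): 200,
    ("TV", "Q1", "Mumbai"): 80,
    ("Phone", "Q2", "Mumbai"): 150,
}
DIMS = ("product", "quarter", "location")


def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it from the cell key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


q1_slice = slice_cube(cube, "quarter", "Q1")
diced = dice_cube(cube, product={"TV"}, location={"Delhi"})
print(q1_slice)
print(diced)
```

The slice result has two-part keys because the quarter dimension is gone, while the dice result keeps all three dimensions and simply contains fewer cells.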
7(b) Spatial Data Mining
Spatial data includes geographical information (maps, satellite data).
Mining involves spatial clustering, association, and trend detection using GIS tools.