THEORY EXAMINATION (SEM–VI) 2016-17 DATA WAREHOUSING & DATA MINING
DATA WAREHOUSING & DATA MINING – NCS066
B.Tech (SEM VI) | Section-wise Solved Answers
SECTION – A
(Explain the following – 2 marks each)
(a) Difference between Data Warehouse and Database
A database is designed for daily transaction processing and stores current data. A data warehouse stores historical, integrated, and summarized data used mainly for analysis and decision-making.
(b) Snowflake Model vs Fact Constellation Model
The snowflake model normalizes dimension tables into multiple related tables, reducing redundancy. A fact constellation model contains multiple fact tables sharing dimension tables and supports complex business processes.
(c) Characteristics of a Data Warehouse
A data warehouse is subject-oriented, integrated, time-variant, and non-volatile. These characteristics help in long-term data analysis.
(d) Use of Metadata
Metadata is data about data: it records the source, structure, and meaning of the data held in the warehouse. It helps users understand, manage, and use warehouse data effectively.
(e) Concept Hierarchy
A concept hierarchy defines levels of abstraction for data, such as city → state → country, enabling data generalization.
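As a rough illustration of generalization along the city → state → country hierarchy, the sketch below maps each city to a higher level and counts records at that level. The lookup tables, the city names in the records, and the function name generalize are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical hierarchy tables (assumed for illustration only).
city_to_state = {
    "Lucknow": "Uttar Pradesh",
    "Kanpur": "Uttar Pradesh",
    "Pune": "Maharashtra",
}
state_to_country = {"Uttar Pradesh": "India", "Maharashtra": "India"}

def generalize(city, level):
    """Return the value of `city` at the requested abstraction level."""
    if level == "city":
        return city
    state = city_to_state[city]
    if level == "state":
        return state
    if level == "country":
        return state_to_country[state]
    raise ValueError(f"unknown level: {level}")

# Hypothetical records, one city per record.
records = ["Lucknow", "Kanpur", "Pune", "Lucknow"]

# Generalizing from city to state merges Lucknow and Kanpur into one group.
by_state = Counter(generalize(city, "state") for city in records)
by_country = Counter(generalize(city, "country") for city in records)
print(by_state)
print(by_country)
```

Moving one level up the hierarchy reduces the number of distinct values, which is what makes summarized views possible.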
(f) Need for Data Cleaning
Data cleaning smooths noise, resolves inconsistencies, and fills in or otherwise handles missing values, improving data quality and the accuracy of mining results.
(g) Frequent Itemset, Support, Confidence
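A frequent itemset is a set of items whose support is at least a user-specified minimum support threshold.
Support of an itemset is the fraction of transactions that contain it.
Confidence of a rule A → B is support(A ∪ B) / support(A), that is, how often B appears in the transactions that contain A.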
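A minimal sketch of these measures, taking support as the fraction of transactions containing an itemset and confidence of A → B as the support of A and B together divided by the support of A. The transactions and the 0.5 threshold are invented for illustration.

```python
# Hypothetical market-basket transactions (assumed data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

min_support = 0.5  # assumed threshold

s = support({"bread", "milk"}, transactions)        # 2 of 4 transactions -> 0.5
c = confidence({"bread"}, {"milk"}, transactions)   # 0.5 / 0.75, about 0.667
is_frequent = s >= min_support
print(s, c, is_frequent)
```

With these numbers {bread, milk} is frequent, and the rule bread → milk holds in two of the three transactions that contain bread.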
(h) Decision Tree for Student Database
A decision tree tests attributes such as marks and attendance at its internal nodes and assigns a class label such as result (pass/fail) at its leaves, helping classify student performance.
(i) Classification of OLAP Tools
OLAP tools are classified as MOLAP, ROLAP, and HOLAP based on storage and processing methods.
(j) Spatial Mining
Spatial mining deals with geographical data. It discovers patterns such as spatial relationships and location-based trends.
SECTION – B
(Attempt any five – explained properly)
(a) Star Schema vs Snowflake Schema
Star schema uses a central fact table connected to denormalized dimension tables, making queries faster. Snowflake schema normalizes dimension tables, reducing storage but increasing query complexity.
(b) Mapping Data Warehouse to Multiprocessor Architecture
The steps include data partitioning, parallel query processing, load balancing, and synchronization to improve performance.
(c) Challenges in Data Mining
Challenges include handling noisy data, scalability, integration of heterogeneous data, and user interaction complexity.
(d) Smoothing Techniques in Data Cleaning
Smoothing techniques include binning, regression, and clustering, which reduce noise and improve data consistency.
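Of the three techniques, binning is the easiest to show in a few lines. The sketch below smooths by bin means with equal-frequency bins: values are sorted, cut into bins of a fixed size, and each value is replaced by the mean of its bin. The price list and the bin size of 3 are assumed example inputs.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed

# Hypothetical noisy prices (assumed data).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries follows the same pattern with a different replacement value per bin.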
(e) Classification of Data Mining Systems
Data mining systems are classified based on databases mined, knowledge types, techniques used, and application domains.
(f) Decision Tree Classification
Decision trees split data based on attributes to form classification rules. Issues include attribute selection, overfitting, and handling continuous data. Overfitting can be reduced using pruning.
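The answer does not name an attribute selection measure. One common choice is information gain, as used in ID3, and the sketch below assumes it. The student rows and the attribute names attendance, marks, and result are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical student records (assumed data).
students = [
    {"attendance": "high", "marks": "high", "result": "pass"},
    {"attendance": "high", "marks": "low", "result": "pass"},
    {"attendance": "low", "marks": "high", "result": "fail"},
    {"attendance": "low", "marks": "low", "result": "fail"},
]

gains = {a: information_gain(students, a, "result") for a in ("attendance", "marks")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this toy data attendance separates pass from fail completely, so it has the higher gain and would be chosen for the root split, while marks gives no information about the result.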
(g) MOLAP vs ROLAP Architecture
MOLAP stores data in multidimensional cubes for fast access. ROLAP stores data in relational tables and uses SQL for analysis.
(h) Web Mining
Web mining extracts useful information from web data.
• Content mining analyzes page contents
• Structure mining analyzes links
• Usage mining analyzes user behavior
SECTION – C
(Attempt any two – long answers)
(3) University Data Warehouse
(i) A snowflake schema is drawn with dimensions student, course, semester, and instructor linked to a fact table containing count and avg_grade.
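(ii) Starting from the base cuboid, roll up the dimensions that are not needed in the result (for example semester → year) and slice on CS courses. The avg_grade values that remain, grouped by student, give the average grade of CS courses for each student.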
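The slice and aggregation can be sketched in plain Python over a small fact table. The rows, the department field used to identify CS courses, and the helper names slice_rows and roll_up_average are assumptions for illustration, not the schema from the question paper.

```python
from collections import defaultdict

# Hypothetical fact rows (assumed data); `department` marks CS courses.
facts = [
    {"student": "Asha", "course": "DBMS", "department": "CS", "semester": "2016-odd", "grade": 8.0},
    {"student": "Asha", "course": "Networks", "department": "CS", "semester": "2016-even", "grade": 9.0},
    {"student": "Asha", "course": "Physics", "department": "PH", "semester": "2016-odd", "grade": 6.0},
    {"student": "Ravi", "course": "DBMS", "department": "CS", "semester": "2016-odd", "grade": 7.0},
]

def slice_rows(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    return [r for r in rows if r[dimension] == value]

def roll_up_average(rows, group_by, measure):
    """Aggregate away every dimension except `group_by`, averaging the measure."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        totals[r[group_by]][0] += r[measure]
        totals[r[group_by]][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

cs_rows = slice_rows(facts, "department", "CS")
avg_grade_per_student = roll_up_average(cs_rows, "student", "grade")
print(avg_grade_per_student)  # {'Asha': 8.5, 'Ravi': 7.0}
```

Grouping by student alone has the same effect as rolling the course, semester, and instructor dimensions all the way up, since none of them appear in the result.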
(4) K-Means Clustering Algorithm
The algorithm partitions the given five points into k clusters by selecting centroids, assigning points based on distance, and updating centroids iteratively until convergence.
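The loop described above can be sketched as follows. The five points from the question paper are not reproduced in these notes, so the points and the two starting centroids here are hypothetical, and Euclidean distance is assumed.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, max_iterations=100):
    """Assign each point to its nearest centroid, recompute the centroids,
    and repeat until the centroids stop changing."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: five 2-D points and k = 2 initial centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
centroids, clusters = k_means(points, [(1, 1), (9, 8)])
print(centroids)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8)]]
```

Here the assignments settle after the first pass, and the second pass only confirms that the centroids no longer move. The result depends on the starting centroids, so a different initial choice can give a different partition.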
(5) Spatial vs Temporal Mining
Spatial mining analyzes location-based data such as maps. Temporal mining analyzes time-based data such as trends over time. Both help discover hidden patterns.