(SEM VI) THEORY EXAMINATION 2017-18 DATA-WAREHOUSING & DATA MINING


Data Warehousing & Data Mining (NCS-066)

Complete Section-Wise Explanation – B.Tech Semester VI


Introduction to the Subject

Data Warehousing and Data Mining focuses on how large volumes of data are stored, organized, processed, and analyzed to extract meaningful knowledge. While a data warehouse is used for systematic storage and analysis of historical data, data mining applies intelligent techniques to discover patterns, trends, and relationships hidden inside the data.

This subject is extremely important for areas like:

Business intelligence
Decision support systems
Market basket analysis
Customer behavior analysis
Machine learning foundations

The paper is divided into three sections: A, B, and C, each testing a different depth of understanding.


SECTION A – Basic Concepts & Definitions

Pattern:
Attempt all questions
10 questions × 2 marks = 20 marks

Nature of Section A


Section A checks your basic conceptual clarity. Answers must be short but meaningful, focusing on correct definitions and differences. Writing unnecessary explanations here may waste time.

Explanation of Section A Topics


Key Steps of Data Mining
The data mining process begins with data cleaning, followed by data integration, data selection, data transformation, data mining, pattern evaluation, and finally knowledge presentation. These steps ensure that raw data is converted into useful information.


Support and Confidence
Support measures how frequently an itemset appears in the database, that is, the fraction of transactions that contain it. The confidence of a rule X → Y measures how often Y appears in the transactions that contain X. Both are essential parameters in association rule mining.
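
The two measures can be sketched in a few lines of Python. The transaction database and the function names below are hypothetical and chosen only for illustration; they are not taken from the exam paper.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transaction database (an assumption, not exam data).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"bread", "milk"}, TRANSACTIONS))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, TRANSACTIONS))  # 2 of the 3 bread transactions
```

Here the rule bread → milk has support 0.5 and confidence 2/3, because two of the three transactions containing bread also contain milk.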


Attribute Selection Measures and Drawback of Information Gain
Attribute selection measures decide the best attribute for splitting data in decision trees. Information gain is biased toward attributes with many distinct values (for example, a unique ID), so it can select splits that are useless for prediction.
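
A small sketch can make the bias visible. The toy data below is hypothetical: `row_id` is unique for every tuple and `outlook` has two values. Both attributes obtain the same information gain, while gain ratio (information gain divided by split information) penalizes the many-valued `row_id`. The function names are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def partition(attribute, labels):
    """Group class labels by the attribute value of each tuple."""
    groups = {}
    for v, y in zip(attribute, labels):
        groups.setdefault(v, []).append(y)
    return groups


def information_gain(attribute, labels):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g)
                    for g in partition(attribute, labels).values())
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    split_info = entropy(attribute)  # entropy of the attribute's own values
    return information_gain(attribute, labels) / split_info if split_info else 0.0


# Hypothetical toy data (an assumption, not from the exam paper).
labels = ["yes", "yes", "no", "no"]
row_id = [1, 2, 3, 4]
outlook = ["sunny", "sunny", "rain", "rain"]

print(information_gain(row_id, labels), information_gain(outlook, labels))
print(gain_ratio(row_id, labels), gain_ratio(outlook, labels))
```

Information gain cannot separate the two attributes here (both score 1.0), but gain ratio gives 0.5 for `row_id` and 1.0 for `outlook`.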


Classification vs Clustering
Classification is a supervised learning technique where class labels are known, while clustering is an unsupervised technique where data is grouped based on similarity without predefined labels.


Apriori Algorithm Statement
The Apriori principle states that all non-empty subsets of a frequent itemset must also be frequent. It is used to reduce search space in association rule mining.


Drawbacks of K-Means Algorithm
K-means requires the number of clusters in advance, is sensitive to initial centroids, and does not perform well with outliers or non-spherical clusters.


Chi-Square Test
The chi-square test measures the statistical significance of the difference between observed and expected frequencies. It is commonly used to check whether two categorical attributes are correlated, for example during attribute selection.
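
As a minimal sketch, the Pearson chi-square statistic sums (observed − expected)² / expected over all cells of a contingency table, where each expected count is (row total × column total) / grand total. The tables used below are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (o - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 tables (assumptions for illustration only).
print(chi_square([[10, 20], [20, 10]]))  # attributes look related
print(chi_square([[10, 20], [20, 40]]))  # rows are proportional, statistic is 0
```

The statistic is then compared with the critical value for (rows − 1) × (columns − 1) degrees of freedom at the chosen significance level.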


Roll-Up vs Drill-Down
Roll-up summarizes data to a higher level of hierarchy, while drill-down provides detailed data at a lower level.
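
The difference can be sketched on a tiny, hypothetical sales table with a time hierarchy quarter → year. Aggregating by year is the roll-up; going back to the (year, quarter) level is the drill-down. The data and the helper name `aggregate` are assumptions.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, city, amount).
SALES = [
    (2017, "Q1", "Delhi", 100),
    (2017, "Q1", "Pune", 50),
    (2017, "Q2", "Delhi", 70),
    (2018, "Q1", "Delhi", 30),
]


def aggregate(rows, key):
    """Sum the amount column, grouped by the value returned by `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


by_quarter = aggregate(SALES, key=lambda r: (r[0], r[1]))  # detailed level
by_year = aggregate(SALES, key=lambda r: r[0])             # rolled up to year

print(by_quarter)
print(by_year)
```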


Hierarchical Clustering Methods
Hierarchical clustering methods include agglomerative (bottom-up) and divisive (top-down) approaches.
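
The agglomerative approach can be sketched as follows. This is a simplified illustration that assumes one-dimensional points and single linkage (distance between the closest members); real implementations support other linkages and avoid the repeated pairwise scan.

```python
def agglomerative(points, k):
    """Bottom-up, single-linkage merging of 1-D points until k clusters remain."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


print(agglomerative([1, 2, 9, 10, 25], 3))  # [[1, 2], [9, 10], [25]]
```

A divisive method works in the opposite direction: it starts with one cluster holding all points and splits it recursively.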


Features of Genetic Algorithm
Genetic algorithms use selection, crossover, mutation, and fitness functions to find optimal solutions inspired by natural evolution.
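
The following sketch shows how these four ingredients fit together. It assumes a toy objective (maximize the number of 1 bits in a string), tournament selection, single-point crossover, and bit-flip mutation; all parameter values are arbitrary illustrative choices.

```python
import random


def fitness(bits):
    """Toy fitness function: number of 1 bits in the chromosome."""
    return sum(bits)


def evolve(pop_size=20, length=12, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)  # seeded so the run is repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals wins.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b

        next_gen = [max(population, key=fitness)]  # elitism: keep the best as is
        while len(next_gen) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, length)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```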


SECTION B – Descriptive Theory & Numericals


Pattern:
Attempt any three questions
3 × 10 marks = 30 marks

Nature of Section B


This section requires descriptive answers written in paragraphs. You must explain concepts properly and, where required, solve numericals step by step.

Explanation of Major Questions

Data Mining / Knowledge Extraction Process

The knowledge extraction process involves transforming raw data into valuable knowledge through stages such as data cleaning, integration, transformation, mining, and evaluation. Each stage plays a crucial role in ensuring accuracy and relevance of extracted patterns.


OLAP vs OLTP


OLTP systems are used for day-to-day transaction processing and handle large numbers of short online transactions. 

OLAP systems are designed for analysis, supporting complex queries and decision-making using historical data.


Apriori Algorithm Numerical

This question tests practical application of association rule mining. Using the given transaction database and minimum support and confidence, frequent itemsets are generated step by step, followed by rule generation. Proper calculation of support and confidence is essential.
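
The level-wise procedure can be sketched as below. The transaction database is hypothetical (the exam's own data is not reproduced here), and the minimum support is expressed as an absolute count. The sketch covers frequent itemset generation only, not rule generation.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical database of five transactions (an assumption).
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, n in sorted(apriori(DB, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), n)
```

With a minimum support count of 3, all single items and all pairs are frequent, while {a, b, c} is generated as a candidate but rejected because it occurs in only two transactions.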


Database Schemas

Database schemas define how data is organized in a warehouse. Common schemas include star schema, snowflake schema, and fact constellation schema. Each differs in complexity, redundancy, and query performance.
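
A star schema can be sketched with Python's built-in `sqlite3` module: one central fact table references small dimension tables, and analytical queries join and aggregate across them. The table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date VALUES (1, 2017), (2, 2018);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0);
""")

rows = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()
print(rows)
```

A snowflake schema would further normalize the dimensions (for example, moving `category` into its own table), and a fact constellation would add more fact tables that share these dimensions.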


Data Backup and Recovery in Data Warehouse

Data backup ensures data safety, while recovery restores data in case of failure. Techniques include full backup, incremental backup, and disaster recovery planning.



SECTION C – Advanced Analysis & Algorithms

Pattern:
Attempt any one part from each question
5 × 10 marks = 50 marks

This section carries the highest weightage (50 of the 100 marks) and largely determines overall performance.


Question 3

Three-Tier Data Warehouse Architecture & ETL

The three-tier architecture consists of bottom tier (data warehouse server), middle tier (OLAP server), and top tier (front-end tools).
ETL (Extract, Transform, Load) extracts data from sources, transforms it into a suitable format, and loads it into the warehouse.
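
The three ETL steps can be sketched with the standard library alone. The CSV text, the `sales` table, and the function names are hypothetical; a real pipeline would read from operational source systems and load into a warehouse server.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent formatting (an assumption).
RAW_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,7\n"


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix types and normalize name formatting."""
    return [(int(r["id"]), r["name"].strip().title(), float(r["amount"]))
            for r in rows]


def load(rows, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall()
print(loaded)  # [(1, 'Alice', 10.5), (2, 'Bob', 7.0)]
```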

Data Cleaning Strategies

Data cleaning removes noise, handles missing values, corrects inconsistencies, and improves data quality for better mining results.
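
Two common strategies can be sketched briefly: filling missing values with the attribute mean, and smoothing noise by bin means after equal-frequency binning. The sample values and function names are illustrative assumptions.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    m = mean(known)
    return [m if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort, split into equal-frequency bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_values = ordered[i:i + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


print(fill_missing_with_mean([10, None, 20]))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

In the second call the bins are [4, 8, 15], [21, 21, 24], and [25, 28, 34], so the values are replaced by 9, 22, and 29 respectively.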


Question 4

Clustering Methods & STING

Clustering methods include partitioning, hierarchical, density-based, and grid-based methods.
STING is a grid-based clustering method that divides data space into cells and stores statistical information for efficient clustering.
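
The grid idea can be sketched for a single resolution level. This is a simplification: STING itself maintains a hierarchy of grid levels and stores further statistics per cell (such as standard deviation, minimum, and maximum). The points, the cell size, and the function name are assumptions.

```python
from collections import defaultdict
from math import floor


def grid_statistics(points, cell_size):
    """Map each grid cell (ix, iy) to the count and mean of the points inside it."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append((x, y))
    stats = {}
    for cell, pts in cells.items():
        n = len(pts)
        stats[cell] = {
            "count": n,
            "mean": (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n),
        }
    return stats


# Hypothetical 2-D points (an assumption).
stats = grid_statistics([(0.5, 0.5), (1.5, 0.5), (0.2, 0.8), (3.0, 3.0)], cell_size=2)
print(stats)
```

Queries and clustering then operate on these per-cell summaries instead of on the individual points, which is what makes grid-based methods fast.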

Applications of Data Warehousing, Web Mining & Spatial Mining

Data warehousing is used in banking, retail, healthcare, and telecom.
Web mining extracts patterns from web data, while spatial mining analyzes geographical and spatial data.


Question 5

Data Warehouse Definition & Design Strategies

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.
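Common design strategies are the top-down approach (build the enterprise warehouse first, then derive data marts), the bottom-up approach (build data marts first, then integrate them), and a combination of both. Whichever is chosen, the design must address scalability, data quality, security, and performance optimization.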

Short Notes

Concept hierarchy organizes data at different abstraction levels.
ROLAP uses relational databases, while MOLAP uses multidimensional databases.
Gain ratio improves on information gain by dividing it by the split information, which reduces the bias toward attributes with many distinct values.
Classification assigns labels, while clustering groups data based on similarity.


Question 6

Decision Tree Classifier & Information Gain

This question involves constructing a decision tree using given gains and deriving decision rules. Proper understanding of entropy and gain is required.
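
For reference, the standard definitions are given below, where D is the training set, p_i is the proportion of tuples in class i, and attribute A splits D into partitions D_1, ..., D_v. The worked value assumes a hypothetical set with 9 tuples of one class and 5 of the other; it is not the exam's data.

```latex
\begin{align*}
\mathrm{Info}(D) &= -\sum_{i=1}^{m} p_i \log_2 p_i \\
\mathrm{Info}_A(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j) \\
\mathrm{Gain}(A) &= \mathrm{Info}(D) - \mathrm{Info}_A(D) \\[4pt]
\text{Example: } \mathrm{Info}(D) &= -\tfrac{9}{14}\log_2\tfrac{9}{14}
  - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \text{ bits}
\end{align*}
```

The attribute with the highest gain becomes the splitting attribute at the current node, and each root-to-leaf path of the finished tree yields one IF-THEN decision rule.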

Laplacian Correction & Bayesian Classification

Laplacian correction avoids zero probability issues in Bayesian classifiers. The given tuple is classified by calculating posterior probabilities.
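
A minimal sketch of the correction: add a small constant k (usually 1) to every value count, and add k times the number of distinct attribute values to the class total. The counts below are illustrative assumptions, not the exam's data.

```python
def smoothed_probability(count, class_total, num_values, k=1):
    """Estimate P(value | class) with add-k (Laplacian) smoothing."""
    return (count + k) / (class_total + k * num_values)


# Hypothetical class with 1000 tuples and an attribute with 3 possible values,
# observed 0, 990, and 10 times respectively.
for observed in (0, 990, 10):
    print(smoothed_probability(observed, 1000, 3))
```

Without the correction the first estimate would be 0, which would force the whole product of conditional probabilities for that class to 0. With it, the estimates become 1/1003, 991/1003, and 11/1003, which stay close to the uncorrected values.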


Question 7

K-Means Algorithm Numerical

This problem tests clustering concepts using Euclidean distance. Starting with given initial centroids, new cluster centers are calculated after the first iteration.
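
One iteration can be sketched as follows. The points and initial centroids are hypothetical (the exam's own values are not reproduced), and `math.dist` requires Python 3.8 or later.

```python
from math import dist


def kmeans_iteration(points, centroids):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centroids.append(old)  # leave an empty cluster's centroid unchanged
    return clusters, new_centroids


# Hypothetical data: four 2-D points and two initial centroids.
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
clusters, centroids = kmeans_iteration(points, [(1, 1), (2, 1)])
print(clusters)
print(centroids)
```

After this first iteration, (1, 1) forms one cluster and the other three points form the second, whose new center is (11/3, 8/3). The step is repeated until the assignments stop changing.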

Hierarchical Clustering & BIRCH

Hierarchical clustering builds nested clusters.
BIRCH is a scalable hierarchical clustering method using CF trees for large datasets.
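
The clustering feature (CF) stored in each CF-tree entry can be sketched as a triple (N, LS, SS): the number of points, their linear sum, and their squared sum. This sketch assumes SS is kept as a single scalar; some textbooks keep it per dimension. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClusteringFeature:
    n: int                   # number of points summarized
    ls: Tuple[float, ...]    # linear sum of the points, per dimension
    ss: float                # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        """CFs are additive: merging two subclusters just adds the components."""
        return ClusteringFeature(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)


# Hypothetical points (an assumption).
cf = ClusteringFeature.from_point((2, 5))
for p in [(3, 2), (4, 3)]:
    cf = cf.merge(ClusteringFeature.from_point(p))
print(cf, cf.centroid())
```

Because CFs are additive, BIRCH can absorb new points and merge subclusters without revisiting the raw data, which is the source of its scalability.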

File Size
153.94 KB
Uploader
SuGanta International