(SEM VIII) THEORY EXAMINATION 2021-22 DATA WAREHOUSING & DATA MINING
SECTION A
(Attempt all – 2 × 10 = 20 marks)
(a) Data Warehousing
Data Warehousing is the process of collecting, storing, and managing large volumes of historical data from multiple sources to support decision-making and analysis.
(b) Data Warehousing Components
The main components are data sources, ETL tools (Extract, Transform, Load), data warehouse storage, metadata, and front-end tools for reporting and analysis.
(c) Data Warehouse Process
The data warehouse process involves extracting data from sources, cleaning and transforming it, loading it into the warehouse, and providing access for analysis and reporting.
(d) Warehousing Strategy
Warehousing strategy defines how data is collected, stored, organized, and accessed in a data warehouse to meet business and analytical needs efficiently.
(e) Data Cleaning
Data cleaning is the process of detecting and correcting errors and inconsistencies, removing duplicate records, and handling missing values (by filling or discarding them) to improve data quality.
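As a rough illustration of two cleaning steps (duplicate removal and filling missing values), here is a minimal Python sketch. The record layout (`id`, `name`, `age`) and the choice of the mean as the fill value are assumptions made for the example, not part of the exam answer.

```python
from statistics import mean

def clean_records(records):
    """Drop exact duplicates, then fill missing ages with the mean of known ages."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["id"], rec["name"], rec["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not modified
    # Assumes at least one record has a known age.
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = mean(known_ages)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_value
    return unique

# Hypothetical sample data
raw = [
    {"id": 1, "name": "Asha", "age": 30},
    {"id": 1, "name": "Asha", "age": 30},    # duplicate record
    {"id": 2, "name": "Ravi", "age": None},  # missing value
    {"id": 3, "name": "Meena", "age": 40},
]
cleaned = clean_records(raw)
```

Other fill strategies (median, most frequent value, or dropping the record) fit the same structure.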
(f) Need of Data Mining
Data mining is needed to discover hidden patterns, relationships, trends, and useful information from large datasets for better decision-making.
(g) Classification
Classification is a data mining technique that assigns data items to predefined classes based on their attributes.
(h) Clustering
Clustering groups similar data objects into clusters without predefined labels, based on similarity or distance measures.
(i) Data Visualization
Data visualization represents data graphically using charts, graphs, and dashboards to make analysis and understanding easier.
(j) Aggregation
Aggregation is the process of summarizing detailed data into higher-level information, such as totals or averages, for analysis.
SECTION B
(Attempt any three – 10 × 3 = 30 marks)
2(a) OLAP Functions, OLAP Tools, and OLAP Servers
OLAP (Online Analytical Processing) enables fast analysis of multidimensional data.
OLAP functions include roll-up, drill-down, slice, dice, and pivot operations.
OLAP tools provide interfaces for analysis, reporting, and visualization.
OLAP servers store and process multidimensional data efficiently and are classified as MOLAP, ROLAP, and HOLAP servers.
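The OLAP operations named above can be sketched on a tiny in-memory fact list. The dimensions, sample rows, and function names below are hypothetical; a real OLAP server would run these operations against a cube or relational store.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
DIMS = ("year", "quarter", "region", "product")
facts = [
    (2021, "Q1", "North", "Pen", 100),
    (2021, "Q2", "North", "Pen", 150),
    (2021, "Q1", "South", "Ink", 80),
    (2022, "Q1", "North", "Ink", 120),
    (2022, "Q2", "South", "Pen", 200),
]

def roll_up(rows, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, conditions):
    """Dice: keep a sub-cube where each listed dimension takes an allowed value."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in allowed for d, allowed in conditions.items())
    ]

by_year = roll_up(facts, ["year"])                    # coarser view
by_year_quarter = roll_up(facts, ["year", "quarter"]) # drill-down adds detail back
north_only = slice_(facts, "region", "North")
sub_cube = dice(facts, {"year": {2021}, "product": {"Pen", "Ink"}})
```

Drill-down is shown here simply as a roll-up that keeps more dimensions; pivot only changes how the same totals are laid out, so it is omitted.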
2(b) Hardware and Operating Systems for Data Warehousing
Data warehouses require high-performance hardware such as powerful processors, large memory, high-speed storage, and parallel processing systems.
Operating systems must support multitasking, scalability, fault tolerance, and efficient resource management to handle large analytical workloads.
2(c) Binning, Clustering, and Regression
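Binning smooths sorted data by grouping neighboring values into bins and replacing each value with its bin mean, median, or nearest bin boundary, which reduces noise.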
Clustering groups similar data objects without class labels.
Regression models the relationship between variables to predict continuous values.
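Two of these techniques are easy to show in a few lines. The sketch below assumes equal-frequency bins smoothed by bin means (one common variant of binning) and a simple least-squares line for regression; the function names and sample values are illustrative only.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size`, replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        bin_mean = sum(current_bin) / len(current_bin)
        smoothed.extend([bin_mean] * len(current_bin))
    return smoothed

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = a + b * 5  # predict a continuous value for x = 5
```

With bins of three, the nine prices collapse to three distinct smoothed values, one per bin. The fitted line can then predict values for inputs that were never observed.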
2(d) Statistical Measures in Large Databases for Classification
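Statistical measures such as mean, median, variance, and correlation summarize and compare attributes, while entropy and information gain measure how well an attribute separates the classes. Together they are used to evaluate attributes and improve classification accuracy in large databases.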
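Entropy and information gain can be computed directly from class counts. The following is a minimal sketch; the toy rows and attribute names (`outlook`, `windy`, `play`) are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical training rows
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
```

In this toy data `outlook` separates the classes perfectly while `windy` tells us nothing, so a classifier that selects attributes by information gain would split on `outlook` first.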
2(e) Building a Data Warehouse
Building a data warehouse involves requirement analysis, data source identification, ETL process design, schema design, data loading, testing, deployment, and maintenance.
SECTION C
3(a) Tuning and Testing of Data Warehouse under Data Visualization
Tuning improves performance by optimizing queries, indexes, and storage.
Testing ensures data accuracy, consistency, performance, and reliability before deployment.
Both are essential for effective data visualization and analysis.
3(b) Parallel Processors and Cluster Systems
Parallel processors divide tasks across multiple CPUs to increase performance.
Cluster systems connect multiple computers to work as a single system, improving scalability and fault tolerance in data warehouse processing.
4(a) Mapping Data Warehouse to Multiprocessor Architecture
Mapping involves distributing data and queries across multiple processors to achieve parallelism, reduce response time, and improve throughput in data warehouse systems.
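The idea of partitioning data and combining partial results can be sketched as follows. This is only an illustration: the rows, the modulo-based partitioning rule, and the function names are assumptions, and the worker threads merely stand in for the processors or nodes of a real parallel warehouse.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_parts):
    """Partition rows on the integer store key (key modulo number of partitions)."""
    parts = [[] for _ in range(n_parts)]
    for store_id, amount in rows:
        parts[store_id % n_parts].append((store_id, amount))
    return parts

def partial_sum(part):
    """Work done independently on one partition."""
    return sum(amount for _, amount in part)

def parallel_total(rows, n_parts=3):
    """Aggregate each partition separately, then combine the partial results."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, parts))
    return sum(partials)

# Hypothetical (store_id, sales_amount) rows
sales = [(1, 100), (2, 250), (3, 75), (4, 300), (5, 125)]
total = parallel_total(sales)
```

Because each partition is aggregated independently, adding partitions and processors shortens the per-partition work, which is where the gains in response time and throughput come from.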
4(b) Data Cube Aggregation and Dimensionality Reduction
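Data cube aggregation summarizes data along one or more dimensions (for example, quarterly sales rolled up to annual sales), which reduces the volume of data that later analysis has to process.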
Dimensionality reduction reduces the number of attributes while preserving important information, improving mining efficiency.
5(a) Distance-Based vs Decision Tree-Based Algorithms
Distance-based algorithms, such as k-nearest neighbor, classify a data item by its similarity to already labeled items, using measures like Euclidean distance.
Decision tree-based algorithms classify data using hierarchical decision rules derived from attributes.
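The contrast between the two approaches can be seen in a small sketch. It assumes k-nearest neighbor as the distance-based method and a hand-written tree whose thresholds are illustrative rather than learned from data; the `(income, age)` features and labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, point, k=3):
    """Distance-based: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def tree_classify(point):
    """Tree-based: follow attribute tests from the root down to a leaf."""
    income, age = point
    if income > 50:      # root test
        return "approve"
    if age > 40:         # second-level test
        return "approve"
    return "reject"

# Hypothetical training data: ((income, age), label)
train = [
    ((60, 30), "approve"), ((70, 45), "approve"), ((55, 50), "approve"),
    ((20, 25), "reject"), ((30, 35), "reject"), ((25, 30), "reject"),
]
knn_label = knn_classify(train, (65, 40))
tree_label = tree_classify((65, 40))
```

The k-nearest neighbor classifier needs the training data at prediction time and compares distances, whereas the tree only evaluates a short chain of rules, which also makes its decisions easier to explain.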
5(b) Web Mining, Spatial Mining, and Temporal Mining
Web mining extracts useful patterns from web data.
Spatial mining analyzes geographical or spatial data.
Temporal mining studies time-related patterns and trends in data.
6(a) Warehousing Software and Warehouse Schema Design
Warehousing software manages ETL, storage, and analysis.
Schema design includes star schema, snowflake schema, and fact constellation schema to organize multidimensional data efficiently.
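A star schema places one fact table at the center with a foreign key to each dimension table. The sketch below builds a tiny one in an in-memory SQLite database using Python's standard `sqlite3` module; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- The fact table holds measures plus a key into each dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Electrical');
INSERT INTO dim_date VALUES (1, 2021, 'Q1'), (2, 2021, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 300.0);
""")

# A typical star join: facts joined to dimensions, grouped by dimension attributes
sales_by_category = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
conn.close()
```

A snowflake schema would further normalize a dimension (for example, moving `category` into its own table), and a fact constellation would add a second fact table that shares these dimensions.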
6(b) Database System vs Data Warehouse & Multi-Dimensional Data Model
Database systems support day-to-day transactions (OLTP) on current data, while data warehouses support analytical processing (OLAP) on integrated, historical data.
A multi-dimensional data model organizes data into facts and dimensions for OLAP analysis.
7(a) Numerosity Reduction, Concept Hierarchy Generation, and Decision Tree
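Numerosity reduction replaces the data with a smaller representation, using parametric techniques such as regression models or non-parametric techniques such as sampling, histograms, and clustering.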
Concept hierarchy generation organizes data into levels of abstraction.
Decision trees classify data using tree-structured decision rules.
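A concept hierarchy can be represented as simple mappings from a lower level to a higher one. The sketch below assumes a city -> state -> country hierarchy; the mapping tables and function names are hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy: city -> state -> country
CITY_TO_STATE = {
    "Noida": "Uttar Pradesh",
    "Lucknow": "Uttar Pradesh",
    "Pune": "Maharashtra",
}
STATE_TO_COUNTRY = {"Uttar Pradesh": "India", "Maharashtra": "India"}

def generalize(city, level):
    """Replace a city by its value at the requested level of abstraction."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")

def count_at_level(cities, level):
    """Count records after generalizing each city to the given level."""
    return Counter(generalize(c, level) for c in cities)

customers = ["Noida", "Lucknow", "Pune", "Noida"]
by_state = count_at_level(customers, "state")
by_country = count_at_level(customers, "country")
```

Moving up the hierarchy yields fewer distinct values, which is why concept hierarchies are also useful for data reduction and for roll-up in OLAP.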
7(b) Hierarchical and Partitioned Clustering Algorithms
Hierarchical algorithms create clusters in a tree-like structure.
Partitioning algorithms, such as k-means, divide data into a fixed number of clusters by optimizing a criterion such as the total distance of points from their cluster centers.
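The two families can be compared on one-dimensional data. The sketch below assumes k-means as the partitioning method and single-link agglomerative merging as the hierarchical method; the function names, starting centroids, and sample points are illustrative.

```python
def k_means_1d(points, centroids, iterations=10):
    """Partitioning: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

def single_link_merges(points):
    """Hierarchical: repeatedly merge the two closest clusters, recording each merge."""
    clusters = [[p] for p in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # In sorted 1-D data the closest pair under single linkage is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        merged = clusters[i] + clusters[i + 1]
        merges.append(merged)
        clusters[i:i + 2] = [merged]
    return merges

points = [1, 2, 3, 10, 11, 12]
clusters, centroids = k_means_1d(points, centroids=[1, 12])
merges = single_link_merges(points)
```

k-means needs the number of clusters up front and returns a single flat partition, while the merge list from the hierarchical method encodes a tree that can be cut at any level to obtain as many clusters as required.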