(SEM V) THEORY EXAMINATION 2024-25 DATA ANALYTICS
Subject Code: BCS052
Maximum Marks: 70
Time: 3 Hours
Paper ID: 310908
Question Paper Overview
SECTION A (2 × 7 = 14 Marks)
(Short Answer / Conceptual Questions)
a. Differentiate between Predictive and Prescriptive Data Analytics.
b. Define the terms Data Lake, Database, and Data Warehouse.
c. Explain the concept of Outliers.
d. Describe the concept of Lasso Regression.
e. Differentiate between Stream Processing and Traditional Data Processing.
f. Write the two limitations of K-Means.
g. Discuss the various categories of clustering techniques.
SECTION B (Attempt any three × 7 = 21 Marks)
a. Explain the different categories of data analytics with examples.
b. Explore PCA (Principal Component Analysis).
Given data = {4, 8, 13, 7; 11, 4, 5, 14}.
Compute the principal components and reduce dimension from 2D to 1D.
c. Explain Market Basket Analysis.
Is it supervised or unsupervised?
How can a company use it to improve marketing strategies?
d. Differentiate between CLIQUE and ProCLUS clustering algorithms.
e. Differentiate between NoSQL and Relational Databases.
Identify when to use NoSQL instead of a Relational Database, with an example.
SECTION C (Attempt one part from each question × 7 = 35 Marks)
Q3
(a) Differentiate between Structured, Semi-Structured, and Unstructured Data.
OR
(b) Describe Big Data and its characteristics.
Q4
(a) Differentiate between Neural Network and Artificial Neural Network.
OR
(b) Given two fuzzy sets:
A = {(10, 0.2), (20, 0.4), (25, 0.7), (30, 0.9), (40, 1), (50, 0.4)}
B = {(10, 0.4), (20, 0.1), (25, 0.9), (30, 0.2), (40, 0.6), (50, 0.6)}
Apply Union, Intersection, Complement, Bold Union, and Bold Intersection operations.
Q5
(a) Apply the Flajolet-Martin Algorithm on the data stream:
S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
Given: h(x) = (6x + 1) mod 5
Identify unique elements in the stream.
OR
(b) Discuss the concept of filtering in Data Stream Processing and explain Bloom Filtering in detail.
Q6
(a) Cluster the following eight points into three clusters using K-Means Algorithm:
A₁(2,10), A₂(2,5), A₃(8,4), A₄(5,8), A₅(7,5), A₆(6,4), A₇(1,2), A₈(4,9)
Initial centers: A₁(2,10), A₄(5,8), A₇(1,2)
Distance function:
- P(a, b) = |x₂ - x₁| + |y₂ - y₁|
Find the final cluster centers.
OR
(b) A transaction database has 6 transactions with Support = 50%, Confidence = 60%:
| TID | Items Bought |
|---|---|
| 10 | Beer, Nuts, Diaper |
| 20 | Beer, Coffee, Diaper |
| 30 | Beer, Diaper, Eggs |
| 40 | Nuts, Eggs, Milk |
| 50 | Nuts, Coffee, Diaper, Eggs, Milk |
| 60 | Beer, Nuts, Diaper |
i) Use Apriori Algorithm to find frequent itemsets.
ii) Show all strong association rules (with support & confidence).
Q7
(a) Briefly describe the main components of MapReduce.
OR
(b) Draw and explain the architecture of HIVE with its features.
Key Topics for Revision
1. Categories of Data Analytics
| Type | Description | Example |
|---|---|---|
| Descriptive | Summarizes past data | Monthly sales reports |
| Diagnostic | Explains reasons behind trends | Root cause analysis |
| Predictive | Forecasts future trends | Predicting customer churn |
| Prescriptive | Suggests optimal actions | Recommending marketing offers |
2. Data Storage Concepts
Database: Structured, transactional data (SQL).
Data Warehouse: Historical, analytical storage (OLAP).
Data Lake: Raw storage for structured, semi-structured, and unstructured data (e.g., Hadoop HDFS, AWS S3).
3. Outliers
Data points that deviate significantly from the rest of the data. Detected using:
Z-score
IQR (Interquartile Range)
Visualization (Box Plot)
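The Z-score and IQR checks can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the sample `data`, the function names, and the thresholds (2.0 for the Z-score, 1.5 × IQR for the fences) are assumptions chosen for the example.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Illustrative sample (not from the question paper): one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))
print(zscore_outliers(data, threshold=2.0))
```

On a small sample a single extreme value inflates the standard deviation, so the common threshold of 3 may miss it; the IQR rule is less affected because quartiles are robust to extremes.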
4. Lasso Regression
Regularized regression using L1 penalty.
Shrinks some coefficients exactly to zero → performs feature selection.
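Why the L1 penalty produces exact zeros can be seen in the special case of orthonormal features, where each Lasso coefficient is the ordinary least-squares coefficient passed through a soft-threshold. The sketch below assumes that special case; the coefficient values and the penalty `lam` are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients (not from the source).
ols_coefficients = [2.5, -0.3, 0.1, -1.8]
lam = 0.5  # assumed penalty strength

lasso_coefficients = [soft_threshold(b, lam) for b in ols_coefficients]
selected = [i for i, b in enumerate(lasso_coefficients) if b != 0.0]
print(lasso_coefficients)
print(selected)  # indices of features that survive the penalty
```

Small coefficients are removed entirely while large ones are only reduced, which is the feature-selection effect described above.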
5. Stream Processing vs Traditional Processing
| Stream Processing | Traditional Processing |
|---|---|
| Real-time data flow | Batch data |
| Frameworks: Apache Flink, Kafka Streams | Frameworks: Hadoop MapReduce, Spark (batch mode) |
| Example: IoT sensor data | Daily transaction logs |
6. PCA (Principal Component Analysis)
Used for dimensionality reduction.
Steps:
Step 1: Standardize data.
Step 2: Compute covariance matrix.
Step 3: Calculate eigenvalues & eigenvectors.
Step 4: Project data onto the principal components with the largest eigenvalues.
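The steps can be traced on the data from Section B, question (b). The sketch below makes two assumptions: the data is read as two features, x = {4, 8, 13, 7} and y = {11, 4, 5, 14}, and the data is only mean-centered rather than scaled to unit variance, which is how this example is usually worked by hand. Eigenvalues of the 2 × 2 covariance matrix are computed in closed form.

```python
import math

x = [4, 8, 13, 7]    # assumed first feature
y = [11, 4, 5, 14]   # assumed second feature
n = len(x)

# Step 1: center each feature (no scaling, see the assumption above).
mean_x, mean_y = sum(x) / n, sum(y) / n
xc = [v - mean_x for v in x]
yc = [v - mean_y for v in y]

# Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(a * a for a in xc) / (n - 1)
syy = sum(b * b for b in yc) / (n - 1)
sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)

# Step 3: eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
trace = sxx + syy
det = sxx * syy - sxy * sxy
disc = math.sqrt(trace * trace / 4 - det)
lam1, lam2 = trace / 2 + disc, trace / 2 - disc

# Unit eigenvector for lam1 (this form is valid because sxy != 0 here).
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
e1 = (vx / norm, vy / norm)

# Step 4: project the centered points onto the first principal component.
pc1 = [a * e1[0] + b * e1[1] for a, b in zip(xc, yc)]

print([[sxx, sxy], [sxy, syy]])
print(round(lam1, 4), round(lam2, 4))
print([round(v, 4) for v in e1])
print([round(v, 4) for v in pc1])
```

The sign of an eigenvector is arbitrary, so a hand-worked solution may show the component and the projected values with opposite signs.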
7. Market Basket Analysis
Unsupervised learning (association rule mining).
Commonly uses the Apriori Algorithm: finds frequent itemsets and derives association rules from them, e.g., “Beer → Diaper.”
Applications: Retail recommendations, cross-selling, layout optimization.
8. Clustering
Partitioning methods: K-Means, K-Medoids.
Hierarchical methods: Agglomerative, Divisive.
Density-based: DBSCAN, OPTICS.
Grid-based: CLIQUE, STING.
9. NoSQL vs Relational Database
| Feature | Relational | NoSQL |
|---|---|---|
| Schema | Fixed | Flexible |
| Scaling | Vertical | Horizontal |
| Use Case | Banking | Social media, IoT |
| Example | MySQL | MongoDB, Cassandra |
10. Big Data Characteristics (5Vs)
Volume: Massive data size.
Velocity: Fast data generation.
Variety: Structured, semi-structured, and unstructured data.
Veracity: Data accuracy.
Value: Extracting useful insights.
11. Flajolet–Martin Algorithm
Estimates number of distinct elements in data streams using hash functions.
Efficient for large-scale streaming data.
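A minimal sketch of the algorithm on the stream and hash function from Q5(a): hash each element, count the trailing zero bits of the hash, keep the maximum count R, and estimate the number of distinct elements as 2^R. Treating a hash value of 0 as having zero trailing zeros is a convention assumed here; textbooks differ on this point.

```python
def trailing_zeros(value):
    """Count trailing zero bits; a value of 0 is treated as having none (assumed convention)."""
    if value == 0:
        return 0
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count

def flajolet_martin(stream, hash_fn):
    """Estimate the number of distinct elements as 2**R, R = max trailing zeros seen."""
    max_r = 0
    for item in stream:
        max_r = max(max_r, trailing_zeros(hash_fn(item)))
    return 2 ** max_r

stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
estimate = flajolet_martin(stream, lambda x: (6 * x + 1) % 5)
print(estimate)  # 4
```

Here h(1) = 2 (binary 010), h(3) = 4 (100), h(2) = 3 (011), and h(4) = 0, so R = 2 and the estimate is 4, which matches the true count of distinct elements {1, 2, 3, 4}.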
12. Bloom Filtering
Probabilistic data structure for membership testing.
Space-efficient but allows false positives.
Used in caching, networking, and databases.
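A Bloom filter can be sketched as a bit array plus several hash functions. The version below is illustrative only: the class name, the array size of 64 bits, the three hash functions, and the use of seeded SHA-256 digests are all assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k seeded hashes set or check k positions in a bit array."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive one position per hash function by prefixing a seed to the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for email in ["a@example.com", "b@example.com"]:
    bf.add(email)

print(bf.might_contain("a@example.com"))  # True
print(bf.might_contain("z@example.com"))  # usually False, but a false positive is possible
```

An added item always tests positive, so there are no false negatives; false positives arise when other items happen to have set all of a query's bit positions.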
13. Apriori Algorithm
Step 1: Generate frequent itemsets using support.
Step 2: Generate strong association rules using confidence.
Formulas:
Support(A→B) = freq(A∪B) / total transactions
Confidence(A→B) = freq(A∪B) / freq(A)
Here freq(A∪B) is the number of transactions that contain every item of both A and B.
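Both steps can be traced on the transaction table from Q6(b), with minimum support 50% and minimum confidence 60%. The sketch below is a small level-wise search written for readability rather than efficiency; the helper names are assumptions, and thresholds are compared using integer counts to avoid floating-point rounding at the boundaries.

```python
from itertools import combinations

transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
    {"Beer", "Nuts", "Diaper"},
]
MIN_SUPPORT_PCT = 50
MIN_CONF_PCT = 60
n = len(transactions)

def count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_frequent(itemset):
    return count(itemset) * 100 >= MIN_SUPPORT_PCT * n

# Step 1: level-wise search for frequent itemsets.
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
frequent = {}
k = 1
while level:
    for s in level:
        frequent[s] = count(s)
    k += 1
    # Join: unions of size k. Prune: every (k-1)-subset must already be frequent.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = [c for c in candidates
             if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
             and is_frequent(c)]

# Step 2: strong rules lhs -> rhs from each frequent itemset of size >= 2.
rules = []
for itemset, cnt in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            if cnt * 100 >= MIN_CONF_PCT * frequent[lhs]:
                rules.append((sorted(lhs), sorted(itemset - lhs),
                              cnt / n, cnt / frequent[lhs]))

for s, c in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), c)
for lhs, rhs, sup, conf in sorted(rules):
    print(lhs, "->", rhs, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

For this table the frequent itemsets are {Beer}, {Nuts}, {Diaper}, {Eggs}, {Beer, Diaper}, and {Nuts, Diaper}. The candidate {Beer, Nuts, Diaper} is pruned because {Beer, Nuts} is not frequent. The four rules between Beer and Diaper and between Nuts and Diaper all meet the 60% confidence threshold.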
14. K-Means Clustering
Iterative algorithm that partitions data into k clusters.
Limitations:
Sensitive to initial centroids.
Assumes spherical clusters.
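The iteration can be traced on the eight points and initial centers from Q6(a). The sketch below follows the usual hand-worked approach for that question, which is an assumption here: points are assigned with the given Manhattan distance, and each center is updated to the mean of its cluster. Function names are illustrative.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def assign(points, centers):
    """Place each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        clusters[nearest].append(name)
    return clusters

def update(points, clusters, old_centers):
    """Move each center to the mean of its cluster (keep it if the cluster is empty)."""
    new_centers = []
    for names, old in zip(clusters, old_centers):
        if not names:
            new_centers.append(old)
            continue
        xs = [points[name][0] for name in names]
        ys = [points[name][1] for name in names]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return new_centers

clusters = assign(points, centers)
while True:
    centers = update(points, clusters, centers)
    new_clusters = assign(points, centers)
    if new_clusters == clusters:  # converged: assignments no longer change
        break
    clusters = new_clusters

print(clusters)
print(centers)
```

Under these assumptions the assignments stabilize at {A1, A4, A8}, {A3, A5, A6}, and {A2, A7}, with centers (11/3, 9), (7, 13/3), and (1.5, 3.5). Different initial centers can lead to a different result, which is the first limitation listed above.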
15. MapReduce Components
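Map Phase: Converts each input split into intermediate key-value pairs.
Shuffle & Sort: Groups the intermediate values by key and sorts the keys.
Reduce Phase: Aggregates the values for each key into the final output.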
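The three phases can be imitated in plain Python with a word count, the standard introductory example. This is a single-process sketch of the data flow only, not Hadoop code; the function names and the input lines are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

# Hypothetical input splits, one line each.
splits = ["big data analytics", "data stream analytics", "big data"]

mapped = [pair for line in splits for pair in map_phase(line)]
grouped = shuffle_and_sort(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data across the network between them.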
16. HIVE Architecture
Built on top of Hadoop for data querying (SQL-like interface).
Components:
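Driver: Receives queries and manages their compilation, optimization, and execution.
Metastore: Stores table schemas and other metadata.
Execution Engine: Runs the compiled query plan as MapReduce jobs on Hadoop.
HiveQL: SQL-like query language.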