(SEM V) THEORY EXAMINATION 2023-24 DATA ANALYTICS
Subject Code: KCS051
Subject Name: Data Analytics
Course: B.Tech (Semester V)
Maximum Marks: 100
Duration: 3 Hours
Exam Year: 2023–24
Sections: A, B, and C
SECTION A – Short Answer Questions (2 × 10 = 20 Marks)
Attempt all questions briefly.
a. How have advancements in technology contributed to the scalability of analytics?
b. What are the sources of data in data analytics?
c. Elaborate on the mathematical foundations of Support Vector Machines (SVMs).
d. Discuss the advantages of Bayesian methods in real-world applications.
e. Elaborate on methods used for filtering streams in real-time analytics.
f. What are key considerations when implementing sampling techniques for stream data?
g. Differentiate between stream-based algorithms and batch processing in frequent itemset mining.
h. What are the challenges of the Apriori algorithm under memory constraints?
i. Explain the role of Hive in the Hadoop ecosystem.
j. How does the MapReduce framework facilitate distributed processing?
Tips:
Revise Hadoop ecosystem (HDFS, MapReduce, Hive).
Learn the SVM equations: separating hyperplane, margin, and kernel trick.
Understand real-time stream filtering and the difference between batch and stream processing.
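The kernel trick named in the tips can be checked numerically. The following is a minimal sketch, assuming a degree-2 polynomial kernel K(x, z) = (x . z)^2 on 2-D inputs; the names `poly_kernel` and `phi` are illustrative and not taken from the exam paper. It shows that the kernel, evaluated in the original space, matches an inner product in an explicit higher-dimensional feature space.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map matching the kernel above (2-D input only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, z)        # computed in the original 2-D space
explicit_value = dot(phi(x), phi(z))    # computed in the 3-D feature space
```

Both values agree up to floating-point rounding, which is why an SVM can work with K(x, z) directly and never build phi(x).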
SECTION B – Medium-Length Questions (10 × 3 = 30 Marks)
Attempt any three of the following.
Characteristics of Data: Volume, Variety, Velocity, Veracity, Value.
Impact: affects storage, scalability, and analytical model design.
Bayesian Networks:
Probabilistic graphical model based on conditional dependencies.
Used in medical diagnosis, predictive modeling, and uncertainty handling.
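The medical diagnosis use case can be illustrated with the smallest possible network, Disease -> Test. This is a sketch only: the prevalence, sensitivity, and false-positive rate below are made-up illustration values, and the function name is hypothetical.

```python
def posterior_disease_given_positive(prior, sensitivity, false_positive_rate):
    """P(Disease | Test+) for a two-node network Disease -> Test (Bayes' rule)."""
    p_pos_and_disease = prior * sensitivity
    p_pos_and_healthy = (1.0 - prior) * false_positive_rate
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)

# Assumed values for illustration: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior_disease_given_positive(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

With these assumed numbers the posterior is roughly 0.16, which shows how a low prior keeps the probability of disease modest even after a positive test.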
Real-time Analytics Platforms:
Tools: Apache Storm, Flink, Kafka Streams.
Used for live data processing — IoT sensors, stock trading, etc.
Clustering Comparison:
K-Means: Efficient, assumes spherical clusters.
Hierarchical: Dendrogram-based; does not need the number of clusters in advance and, depending on the linkage, can capture non-spherical clusters.
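The K-Means side of this comparison can be sketched in a few lines. The example below assumes 1-D data and caller-supplied starting centroids so that the run is deterministic; `kmeans_1d` is an illustrative name, not a library function.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
# centroids -> [2.0, 11.0]
```

Unlike hierarchical clustering, the number of clusters is fixed up front by the number of starting centroids.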
Sharding in NoSQL:
Horizontal data partitioning to improve scalability.
Addresses challenges in distributed database management (MongoDB, Cassandra).
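Horizontal partitioning can be sketched as a function that routes each record key to a shard. This is a simplified hash-modulo scheme for illustration; it is not how MongoDB or Cassandra implement sharding internally, and `shard_for` is a hypothetical helper.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard index using a stable hash (hash-modulo routing)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Distribute a few hypothetical user IDs across three shards.
shards = {i: [] for i in range(3)}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    shards[shard_for(user_id, 3)].append(user_id)
```

A drawback of plain modulo routing is that changing the number of shards remaps most keys, which is one reason production systems tend to use range-based chunks or consistent hashing instead.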
SECTION C – Long / Analytical Questions (10 × 5 = 50 Marks)
Q3. Neural Networks and Fuzzy Models
a. Generalization in Neural Networks:
Balance the bias–variance trade-off with regularization techniques such as weight decay and dropout.
OR
b. Fuzzy Logic Models:
Fuzzy rules capture uncertainty better than crisp models.
Applied in expert systems and predictive modeling.
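The contrast with crisp models can be made concrete with a membership function. The sketch below assumes triangular fuzzy sets for temperature; the set boundaries are invented for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets for temperature in degrees Celsius.
temp = 22.0
warm = triangular(temp, 15.0, 25.0, 35.0)   # about 0.7
cold = triangular(temp, 0.0, 10.0, 25.0)    # about 0.2
```

A crisp model would label 22 degrees as either warm or not; here it is warm to degree 0.7 and cold to degree 0.2 at the same time, and fuzzy rules operate on those degrees.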
Q4. Stream Data Analysis
a. Counting distinct elements in streams:
Algorithms: Flajolet–Martin and HyperLogLog estimate the number of distinct elements; Bloom filters test set membership and can flag items already seen.
Used for real-time metrics, unique user counts, etc.
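A simplified version of the Flajolet–Martin idea is sketched below. It uses a single hash function and the textbook estimate 2^R, where R is the largest number of trailing zero bits seen; practical variants combine many hash functions and apply a correction factor. The choice of SHA-256 truncated to 32 bits is an assumption made for the example.

```python
import hashlib

def trailing_zeros(n):
    """Number of trailing zero bits in n (treated as 0 when n == 0)."""
    if n == 0:
        return 0
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    """Single-hash Flajolet-Martin estimate: 2 ** (max trailing zeros seen)."""
    max_zeros = 0
    for item in stream:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")  # use 32 bits of the hash
        max_zeros = max(max_zeros, trailing_zeros(h))
    return 2 ** max_zeros

stream = [f"user{i % 50}" for i in range(1000)]  # 50 distinct values, many repeats
estimate = fm_estimate(stream)
```

The estimate depends only on which values occur, not on how often, and the memory used is a single integer regardless of stream length. With one hash function the result is always a power of two and can be far from the true count.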
OR
b. Counting uniqueness in a window:
Tracks element frequency and diversity within time-based windows.
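Window-based counting can be sketched with a queue and a frequency table. For simplicity this example assumes a count-based window (the last `size` elements); a time-based window would evict by timestamp instead. The class name is illustrative.

```python
from collections import Counter, deque

class WindowDistinctCounter:
    """Exact count of distinct items among the last `size` stream elements."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        """Insert an item, evict the oldest if needed, return the distinct count."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return len(self.counts)

counter = WindowDistinctCounter(size=3)
results = [counter.add(x) for x in ["a", "b", "a", "c", "c", "c"]]
# results -> [1, 2, 2, 3, 2, 1]
```

This exact approach needs memory proportional to the window size; approximate sketches trade accuracy for less memory when windows are large.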
Q5. Clustering
a. CLIQUE vs ProCLUS:
CLIQUE: Grid-based subspace clustering; efficient for high-dimensional data.
ProCLUS: Uses projected clustering, better handling of noise and outliers.
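The grid-based step of CLIQUE can be sketched as follows. This covers only the first stage (finding dense 1-D units per dimension) and omits the later stages that combine dense units into higher-dimensional subspaces; the function name, grid bounds, and threshold are assumptions for illustration.

```python
from collections import Counter

def dense_units_1d(points, num_intervals, lo, hi, min_points):
    """For each dimension, return the grid intervals holding >= min_points points."""
    width = (hi - lo) / num_intervals
    dims = len(points[0])
    dense = {}
    for d in range(dims):
        counts = Counter()
        for p in points:
            # Clamp so that a value equal to `hi` falls in the last interval.
            idx = min(int((p[d] - lo) / width), num_intervals - 1)
            counts[idx] += 1
        dense[d] = sorted(i for i, c in counts.items() if c >= min_points)
    return dense

points = [(1.0, 9.0), (1.5, 2.0), (1.2, 5.5), (8.0, 5.0), (1.8, 7.5)]
dense = dense_units_1d(points, num_intervals=5, lo=0.0, hi=10.0, min_points=3)
# dense -> {0: [0], 1: []}
```

Here the points are dense only along dimension 0, which is the kind of structure subspace clustering is meant to find and full-space clustering can miss.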
OR
b. Non-Euclidean Clustering:
Uses Mahalanobis, cosine, or Manhattan distance instead of Euclidean.
Important for text, graphs, and categorical data.
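Two of the distances named above can be written directly. The sketch below compares Manhattan and cosine distance on two hypothetical term-count vectors; Mahalanobis distance is left out because it needs a covariance matrix.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; depends on direction only, not on vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical term-count vectors: doc2 repeats doc1's words twice as often.
doc1 = (1, 1, 0)
doc2 = (2, 2, 0)
d_manhattan = manhattan(doc1, doc2)      # 2
d_cosine = cosine_distance(doc1, doc2)   # approximately 0
```

The two documents have the same word proportions, so cosine distance treats them as nearly identical while Manhattan distance does not, which is why cosine is a common choice for text.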
Q6. Interactive and NoSQL Systems
a. Interactive Techniques:
Visualization tools like Tableau, Power BI, D3.js.
Enable intuitive exploration of large datasets.
OR
b. NoSQL for Unstructured Data:
Databases: MongoDB, Cassandra, CouchDB.
Typically scale horizontally more easily than relational databases and accept flexible, evolving schemas.
Q7. Modern Analytics Tools
a. Analysis vs Reporting:
Analysis: Discover patterns (machine learning, clustering).
Reporting: Summarize data for decision-making (dashboards).
OR
b. Modern Tools:
Power BI, Tableau, Google Data Studio, Apache Spark, Hadoop, TensorFlow.
Revolutionized analytics via automation, scalability, and visualization.
Key Topics to Prepare
Core Concepts
Data lifecycle and sources
Big Data characteristics (5Vs)
Types of analytics: Descriptive, Predictive, Prescriptive
Machine Learning Models
SVM, Bayesian networks, Neural networks, Fuzzy logic
Real-time & Stream Processing
Algorithms for streaming data
Tools: Apache Kafka, Flink, Spark Streaming
Clustering & Mining
K-Means, Hierarchical, CLIQUE, ProCLUS, Apriori
Big Data Frameworks
Hadoop ecosystem: HDFS, Hive, MapReduce
NoSQL databases: MongoDB, Cassandra
Visualization & Tools
Power BI, Tableau, D3.js
Interactive analytics and dashboards
Study Tips
Understand key algorithms conceptually (not just formulas).
Practice diagram-based answers — network models, data flow in Hadoop.
Review use cases — predictive analytics, fraud detection, IoT.
Revise modern tools and frameworks — Spark, Kafka, Hive, Tableau.
Focus on conceptual clarity in clustering, streaming, and NoSQL.