(SEM V) THEORY EXAMINATION 2023-24 DATA ANALYTICS

Subject Code: KCS051

Subject Name: Data Analytics

Course: B.Tech (Semester V)

Maximum Marks: 100

Duration: 3 Hours

Exam Year: 2023–24

Sections: A, B, and C

SECTION A – Short Answer Questions (2 × 10 = 20 Marks)

Attempt all questions briefly.

a. How have advancements in technology contributed to the scalability of analytics?
b. What are the sources of data in data analytics?
c. Elaborate on the mathematical foundations of Support Vector Machines (SVMs).
d. Discuss the advantages of Bayesian methods in real-world applications.
e. Elaborate on the methods used for filtering streams in real-time analytics.
f. What are the key considerations when implementing sampling techniques for stream data?
g. Differentiate between stream-based algorithms and batch processing in frequent itemset mining.
h. What are the challenges of the Apriori algorithm under memory constraints?
i. Explain the role of Hive in the Hadoop ecosystem.
j. How does the MapReduce framework facilitate distributed processing?
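For question (e), a Bloom filter is a standard way to filter a stream against a large set of keys (for example an allow-list of email addresses) in fixed memory: it never gives false negatives but may give occasional false positives. The sketch below is a minimal illustration, assuming salted SHA-256 digests stand in for the k independent hash functions; the class name `BloomFilter` and the sizes `m=1024`, `k=3` are hypothetical choices, not a library API.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hypothetical minimal Bloom filter: m bits, k hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Salting SHA-256 with the hash index simulates k independent hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True if all k bits are set: "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


allowed = BloomFilter(m=1024, k=3)
for email in ["alice@example.com", "bob@example.com"]:
    allowed.add(email)

stream = ["alice@example.com", "eve@example.com", "bob@example.com"]
passed = [e for e in stream if allowed.might_contain(e)]  # may rarely let a false positive through
```

Only the bit array is kept in memory, so the filter's size is independent of how long the stream runs; larger `m` lowers the false-positive rate.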

Tips:

Revise Hadoop ecosystem (HDFS, MapReduce, Hive).

Learn the SVM equations: separating hyperplane, margin, and the kernel trick.

Understand real-time stream filtering and the difference between batch and stream processing.
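As a revision aid for the SVM tip, the standard soft-margin formulation is summarised below, assuming labels $y_i \in \{-1, +1\}$, $n$ training points, and a regularisation constant $C > 0$ (the hard-margin SVM is the limit with no slack variables $\xi_i$).

```latex
% Separating hyperplane and margin width
\mathbf{w}^\top \mathbf{x} + b = 0, \qquad \text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}

% Soft-margin primal: C trades margin width against violations \xi_i
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0

% Dual: the kernel trick replaces \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j) by K(\mathbf{x}_i, \mathbf{x}_j)
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Decision function, e.g. with the RBF kernel K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^2)
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big)
```

Only points with $\alpha_i > 0$ (the support vectors) contribute to $f(\mathbf{x})$.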

SECTION B – Medium-Length Questions (10 × 3 = 30 Marks)

Attempt any three of the following.

Characteristics of Data: Volume, Variety, Velocity, Veracity, Value.

Impact: affects storage, scalability, and analytical model design.

Bayesian Networks:

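Probabilistic graphical model (a directed acyclic graph) that encodes conditional dependencies among variables; the joint distribution factorises into one conditional probability table per node.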

Used in medical diagnosis, predictive modeling, and uncertainty handling.
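To show how a Bayesian network answers a diagnostic query, here is a minimal sketch of exact inference by enumeration on the classic three-node example Rain → Sprinkler, (Rain, Sprinkler) → WetGrass. The probability values are illustrative textbook numbers, not data from this paper, and all names are hypothetical.

```python
from itertools import product

# Illustrative conditional probability tables (hypothetical numbers).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(Sprinkler=True | Rain)
P_WET_GIVEN = {                                    # P(WetGrass=True | Sprinkler, Rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Joint probability factorised along the DAG: P(R) * P(S|R) * P(W|S,R)."""
    p_s_true = P_SPRINKLER_GIVEN_RAIN[rain]
    p_s = p_s_true if sprinkler else 1.0 - p_s_true
    p_w_true = P_WET_GIVEN[(sprinkler, rain)]
    p_w = p_w_true if wet else 1.0 - p_w_true
    return P_RAIN[rain] * p_s * p_w


def p_rain_given_wet() -> float:
    """P(Rain=True | WetGrass=True), summing out the hidden Sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(p_rain_given_wet(), 4))  # 0.3577
```

Enumeration is exponential in the number of hidden variables; larger networks use variable elimination or approximate (sampling-based) inference instead.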

Real-time Analytics Platforms:

Tools: Apache Storm, Flink, Kafka Streams.

Used for live data processing — IoT sensors, stock trading, etc.

Clustering Comparison:

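K-Means: Efficient (each iteration is linear in the number of points); assumes roughly spherical clusters of similar size and needs k fixed in advance.

Hierarchical: Dendrogram-based; better suited to non-spherical clusters or an unknown number of clusters, but costlier (at least quadratic in the number of points).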
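The K-Means behaviour described above can be sketched with Lloyd's algorithm, which alternates an assignment step and a centroid-update step until the centroids stop moving. This is a minimal 2-D illustration, assuming the initial centroids are supplied by the caller (real implementations typically use k-means++ seeding); the function name `kmeans` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], centroids: List[Point],
           iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm for K-Means on 2-D points."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids: List[Point] = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(pts, centroids=[(1, 1), (8, 8)])
```

Because the update step takes a mean, K-Means implicitly favours compact, roughly spherical clusters, which is the limitation noted above.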

Sharding in NoSQL:

Horizontal data partitioning to improve scalability.

Addresses challenges in distributed database management (MongoDB, Cassandra).
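Horizontal partitioning by a shard key can be sketched as stable hash-mod-N routing. This is a simplified illustration only: MongoDB offers both hashed and ranged sharding, and Cassandra places rows on a token ring (Murmur3Partitioner by default), so neither works exactly like the modulo below. The function names `shard_for` and `partition` are hypothetical; `hashlib` is used because Python's built-in `hash()` for strings is randomised per process.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash-mod-N routing: the same shard key always maps to the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records: List[dict], key_field: str, num_shards: int) -> Dict[int, List[dict]]:
    """Split records (rows/documents) across shards by the value of their shard key."""
    shards: Dict[int, List[dict]] = {i: [] for i in range(num_shards)}
    for rec in records:
        shards[shard_for(str(rec[key_field]), num_shards)].append(rec)
    return shards


users = [{"user_id": i, "name": f"user{i}"} for i in range(10)]
shards = partition(users, "user_id", num_shards=3)
```

A drawback of plain modulo routing is that changing `num_shards` remaps most keys; consistent hashing (as in a token ring) limits that data movement.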

SECTION C – Long / Analytical Questions (10 × 5 = 50 Marks)

Q3. Neural Networks and Fuzzy Models

a. Generalization in Neural Networks:

Goal: low error on unseen data, achieved by balancing the bias–variance trade-off using regularization (e.g., an L2 weight penalty) and dropout.
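The two generalization tools named above can be sketched in a few lines: an L2 penalty added to the loss, and inverted dropout applied to a layer's activations during training only. This is a framework-free illustration with hypothetical function names; deep-learning libraries provide their own dropout layers and weight-decay options.

```python
import random
from typing import List


def l2_penalty(weights: List[float], lam: float) -> float:
    """L2 (weight-decay) term added to the training loss: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def dropout(activations: List[float], p: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout: in training, zero each unit with probability p and scale
    survivors by 1/(1-p) so the expected activation is unchanged; identity at inference."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)  # dropout is switched off at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is reproducible
train_out = dropout([1.0] * 10, p=0.5, training=True, rng=rng)
eval_out = dropout([1.0] * 10, p=0.5, training=False, rng=rng)
```

Randomly dropping units prevents the network from relying on any single co-adapted feature, which reduces variance (overfitting) at the cost of a little extra bias.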
OR
b. Fuzzy Logic Models:

Fuzzy rules use graded membership (degrees between 0 and 1 instead of true/false), so they capture vagueness and uncertainty better than crisp models.

Applied in expert systems and predictive modeling.
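A small fuzzy model makes the contrast with crisp rules concrete: an input can belong partly to several categories at once, and the rules are blended by their firing strengths. The sketch below assumes triangular membership functions and zero-order Sugeno inference (weighted average of constant rule outputs); the temperature/fan-speed rule base and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership (requires a < b < c): 0 outside (a, c), 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical rules: IF temperature IS <label> THEN fan_speed = <constant>.
RULES: Dict[str, Tuple[Tuple[float, float, float], float]] = {
    "cold": ((0.0, 10.0, 20.0), 10.0),
    "warm": ((15.0, 25.0, 35.0), 50.0),
    "hot": ((30.0, 40.0, 50.0), 90.0),
}


def fan_speed(temp: float) -> Optional[float]:
    """Zero-order Sugeno inference: rule outputs weighted by firing strength."""
    strengths = {name: triangular(temp, *mf) for name, (mf, _) in RULES.items()}
    total = sum(strengths.values())
    if total == 0.0:
        return None  # no rule fires for this input
    return sum(strengths[name] * out for name, (_, out) in RULES.items()) / total


print(fan_speed(18.0))  # about 34.0: 18 degrees is 0.2 "cold" and 0.3 "warm"
```

A crisp model would force 18 degrees into exactly one category and jump abruptly at the boundary; the fuzzy output changes smoothly with the input.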

Q4. Stream Data Analysis

a. Counting distinct elements in streams:

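Algorithms: Flajolet–Martin and HyperLogLog estimate the number of distinct elements using small, fixed memory; Bloom filters answer approximate membership queries ("seen before?") and can support approximate counting.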

Used for real-time metrics, unique user counts, etc.
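The Flajolet–Martin idea can be sketched as follows: hash every element, record the maximum number of trailing zero bits R seen, and estimate the distinct count as 2^R. Repeated elements hash identically, so duplicates never change the estimate. This is a simplified illustration, assuming salted MD5 (truncated to 32 bits) as the family of hash functions and a plain median over 16 estimates; HyperLogLog refines this with many registers, a harmonic mean, and bias correction. Function names are hypothetical.

```python
import hashlib
from typing import Iterable


def trailing_zeros(n: int, width: int = 32) -> int:
    """Number of trailing 0 bits in n (width if n == 0)."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1  # n & -n isolates the lowest set bit


def fm_estimate(stream: Iterable[str], num_hashes: int = 16) -> int:
    """Flajolet-Martin distinct-count estimate: median of 2**R over several hashes."""
    max_r = [0] * num_hashes
    for item in stream:
        for i in range(num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
            h = int.from_bytes(digest[:4], "big")  # 32-bit hash value
            max_r[i] = max(max_r[i], trailing_zeros(h))
    estimates = sorted(2 ** r for r in max_r)
    return estimates[len(estimates) // 2]  # (upper) median limits outliers


users = [f"user{i}" for i in range(1000)]
estimate = fm_estimate(users)
```

Memory is one small counter per hash function, regardless of stream length; the price is that the answer is an estimate, always a power of two per hash.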
OR
b. Counting uniqueness in a window:

Tracks element frequencies and the number of distinct elements within a sliding (time- or count-based) window, expiring elements as they fall out of the window.
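Counting uniqueness in a window can be sketched exactly with a queue of the window's elements plus a frequency table: each arrival increments a count, each expiry decrements one, and the number of keys with a non-zero count is the distinct count. This assumes a count-based window of the last `size` elements (a time-based window would evict by timestamp instead); the class name is hypothetical, and memory grows with the window, so very large windows usually call for approximate sketches.

```python
from collections import Counter, deque
from typing import Deque, Hashable


class SlidingWindowDistinct:
    """Exact distinct count over the last `size` elements of a stream."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window: Deque[Hashable] = deque()
        self.counts: Counter = Counter()  # frequency of each element in the window

    def add(self, item: Hashable) -> int:
        """Insert item, expire the oldest element if needed, return distinct count."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]  # element no longer present in the window
        return len(self.counts)


w = SlidingWindowDistinct(size=3)
history = [w.add(x) for x in "aabcba"]
```

After processing "aabcba" the window holds the last three elements, and `history` records the distinct count after each arrival.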

Q5. Clustering

a. CLIQUE vs ProCLUS:

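CLIQUE: Grid- and density-based subspace clustering; works bottom-up from dense 1-D grid units and scales well with the number of records in high-dimensional data.

ProCLUS: Top-down projected clustering based on k-medoids; each cluster gets its own subset of relevant dimensions, and outliers are handled explicitly.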
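CLIQUE's bottom-up search can be illustrated by its first step: split every dimension into `xi` equal-width intervals, keep the 1-D units holding more than a fraction `tau` of the points, and only then count 2-D units whose projections are both dense (the Apriori-style monotonicity that makes the method scale). This sketch is simplified under stated assumptions: it stops at 2-D and omits CLIQUE's subspace pruning and the joining of adjacent dense units into clusters; ProCLUS is not shown. Names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[Tuple[int, int], ...]  # ((dimension, interval_index), ...)


def interval(value: float, lo: float, hi: float, xi: int) -> int:
    """Index (0..xi-1) of the equal-width grid interval containing value."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * xi), xi - 1)  # max value -> last interval


def dense_units(points: Sequence[Sequence[float]], xi: int,
                tau: float) -> Dict[int, List[Unit]]:
    """Dense 1-D and 2-D grid units (density = fraction of points > tau)."""
    n, d = len(points), len(points[0])
    lows = [min(p[j] for p in points) for j in range(d)]
    highs = [max(p[j] for p in points) for j in range(d)]
    cells = [[interval(p[j], lows[j], highs[j], xi) for j in range(d)] for p in points]

    # 1-D: count points per (dimension, interval).
    c1 = Counter((j, c[j]) for c in cells for j in range(d))
    dense1 = sorted((u,) for u, cnt in c1.items() if cnt > tau * n)
    dense1_set = {u[0] for u in dense1}

    # 2-D: count only units whose two 1-D projections are dense (monotonicity pruning).
    c2: Counter = Counter()
    for c in cells:
        for j, k in combinations(range(d), 2):
            a, b = (j, c[j]), (k, c[k])
            if a in dense1_set and b in dense1_set:
                c2[(a, b)] += 1
    dense2 = sorted(u for u, cnt in c2.items() if cnt > tau * n)
    return {1: dense1, 2: dense2}


pts = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.1), (0.9, 0.1), (0.1, 0.9), (1.0, 1.0)]
result = dense_units(pts, xi=2, tau=0.4)
```

Here the only dense 2-D unit is the lower-left cell, found without ever counting cells whose 1-D projections were sparse.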
OR
b. Non-Euclidean Clustering:

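Uses measures such as cosine, Jaccard, edit, Manhattan, or Mahalanobis distance instead of Euclidean distance; when points cannot be averaged (e.g., sets or strings), a cluster is represented by a clustroid (an actual member point) instead of a centroid.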

Important for text, graphs, and categorical data.
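For set-valued data such as documents represented as word or shingle sets, there is no meaningful "average set", so the cluster representative is a clustroid. The sketch below assumes Jaccard distance and the criterion "minimise the sum of squared distances to the other members" (minimising the maximum or the plain sum of distances are common alternatives); function names are hypothetical.

```python
from typing import FrozenSet, List


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def clustroid(cluster: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Member with the smallest sum of squared Jaccard distances to all members."""
    return min(cluster, key=lambda c: sum(jaccard_distance(c, o) ** 2 for o in cluster))


A = frozenset({"a", "b", "c"})
B = frozenset({"a", "b", "d"})
C = frozenset({"a", "b", "c", "d"})
rep = clustroid([A, B, C])  # C is closest overall to A and B
```

Because the clustroid is an actual member, the same approach works for any distance that can be computed between two points, even when no centroid exists.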

Q6. Interactive and NoSQL Systems

a. Interactive Techniques:

Visualization tools like Tableau, Power BI, D3.js.

Enable intuitive exploration of large datasets.
OR
b. NoSQL for Unstructured Data:

Databases: MongoDB, Cassandra, CouchDB.

Typically scale out horizontally more easily than relational DBs and accommodate flexible, schema-less records.

Q7. Modern Analytics Tools

a. Analysis vs Reporting:

Analysis: Discover patterns (machine learning, clustering).

Reporting: Summarize data for decision-making (dashboards).
OR
b. Modern Tools:

Power BI, Tableau, Google Data Studio, Apache Spark, Hadoop, TensorFlow.

Revolutionized analytics via automation, scalability, and visualization.

Key Topics to Prepare

Core Concepts

Data lifecycle and sources

Big Data characteristics (5Vs)

Types of analytics: Descriptive, Predictive, Prescriptive

Machine Learning Models

SVM, Bayesian networks, Neural networks, Fuzzy logic

Real-time & Stream Processing

Algorithms for streaming data

Tools: Apache Kafka, Flink, Spark Streaming

Clustering & Mining

K-Means, Hierarchical, CLIQUE, ProCLUS, Apriori
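Apriori's level-wise structure (join, prune, count) is sketched below; it also shows why memory is the bottleneck asked about in Section A: every candidate of a level needs a counter in memory during the counting pass. This is a minimal illustration with hypothetical names and an absolute support threshold; memory-saving variants such as PCY, multistage, or SON are not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    baskets = [frozenset(t) for t in transactions]
    counts: Dict[FrozenSet[str], int] = {}
    for b in baskets:  # pass 1: count single items
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        prev = list(current)
        # Join: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k + 1}
        # Prune (monotonicity): every k-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k))}
        # Count: one pass over the data; these counters are what must fit in memory.
        cand_counts = {c: 0 for c in candidates}
        for b in baskets:
            for c in candidates:
                if c <= b:
                    cand_counts[c] += 1
        current = {c: n for c, n in cand_counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


baskets = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"},
           {"milk", "butter"}, {"milk", "bread", "butter"}]
result = apriori(baskets, min_support=3)
```

With these five baskets every single item and every pair is frequent, but the triple appears only twice and is dropped.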

Big Data Frameworks

Hadoop ecosystem: HDFS, Hive, MapReduce

NoSQL databases: MongoDB, Cassandra
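The MapReduce data flow (map, shuffle/group by key, reduce) can be simulated in a single process with the classic word-count job. This is a conceptual sketch only, not Hadoop's Java API: in a real cluster, map tasks run in parallel on HDFS blocks and the shuffle moves and sorts intermediate pairs across the network. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    mapped = [pair for doc in docs for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


counts = word_count(["big data big", "Data analytics"])
```

Because each map call and each reduce call touches only its own input, the framework can spread them across machines; only the shuffle needs communication.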

Visualization & Tools

Power BI, Tableau, D3.js

Interactive analytics and dashboards

Study Tips

Understand key algorithms conceptually (not just formulas).

Practice diagram-based answers — network models, data flow in Hadoop.

Review use cases — predictive analytics, fraud detection, IoT.

Revise modern tools and frameworks — Spark, Kafka, Hive, Tableau.

Focus on conceptual clarity in clustering, streaming, and NoSQL.

Uploader: Payal Saini