(SEM VI) THEORY EXAMINATION 2024-25 DATA ANALYTICS

B.Tech Engineering

BIT601 – DATA ANALYTICS

Section-Wise Solved Answers (2024–25)


SECTION A

Attempt all questions in brief (2 × 7 = 14 marks)


(a) What are the main sources of data in data analytics?

The main sources of data in data analytics include transactional data from business systems, sensor and machine-generated data from IoT devices, social media data, web logs, survey data, and publicly available datasets. These sources together provide both structured and unstructured data for analysis.


(b) What is the purpose of the ‘operationalization’ phase?

The operationalization phase focuses on deploying the developed analytics model into real-world use. It ensures that insights or models are integrated into business processes so that decisions can be taken automatically or semi-automatically based on analytics results.


(c) What is the purpose of Support Vector Machines (SVM) in classification?

SVM is used for classification by finding an optimal separating boundary, called a hyperplane, between different classes. It aims to maximize the margin between data points of different classes, leading to better generalization and accuracy.


(d) What are fuzzy decision trees?

Fuzzy decision trees are decision trees that use fuzzy logic in place of crisp split conditions. Each test assigns a data point a degree of membership (between 0 and 1) in more than one branch, so a point can partially belong to multiple classes. This helps in handling uncertainty and imprecise data.


(e) Define stream computing and mention one key feature.

Stream computing is the processing of continuous data streams in real time. A key feature is low latency, meaning data is processed immediately as it arrives rather than being stored first.
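The idea of processing each value as it arrives can be illustrated with a running average over a sliding window. This is a minimal sketch, not a production stream engine; the class name SlidingAverage and the window size are assumptions made for illustration.

```python
from collections import deque


class SlidingAverage:
    """Running mean over the most recent `size` readings of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # oldest reading drops out automatically
        self.total = 0.0

    def update(self, value):
        # Remove the reading that is about to be evicted from the running total.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


avg = SlidingAverage(size=3)
results = [avg.update(x) for x in [10, 20, 30, 40]]
```

Each call to update does a constant amount of work and keeps only the last three readings in memory, which is what allows low-latency processing without storing the full stream.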


(f) What are the advantages of K-means clustering?

K-means clustering is simple to understand, computationally efficient, and works well for large datasets. It is widely used due to its speed and ease of implementation.
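The simplicity of K-means can be seen in a short sketch of its two repeated steps: assign each point to the nearest centroid, then move each centroid to the mean of its points. The sample points and starting centroids below are assumptions chosen for illustration; real implementations usually pick initial centroids randomly or with a seeding scheme.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iterations=10):
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # no movement means the algorithm has converged
            break
        centroids = updated
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```

Each pass touches every point once per centroid, so the cost per iteration grows linearly with the number of points, which is why the method suits large datasets.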


(g) What is the purpose of MapReduce in big data processing?

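MapReduce is used to process large datasets in two phases. In the map phase, the input is split into chunks and each chunk is converted into intermediate key-value pairs; in the reduce phase, the values for each key are combined into the final result. It enables parallel processing across distributed systems.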
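The classic word-count example shows the flow on a single machine. This is only a sketch of the programming model in plain Python, not Hadoop's API; the function names map_phase, shuffle, and reduce_phase and the sample documents are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the values for one key into a single result.
    return key, sum(values)


documents = ["big data needs big tools", "data tools scale"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster each document (or block) would be mapped on a different node and each key reduced independently, which is where the parallelism comes from.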


SECTION B

Attempt any three (7 × 3 = 21 marks)


(a) Stages in a data analytics project

A data analytics project starts with business understanding, where goals are clearly defined. This is followed by data collection and data preparation, where raw data is cleaned and organized. Next comes data exploration to understand patterns. Model planning and model building are then carried out to develop analytical models. Finally, results are communicated and operationalized for real-world use.


(b) Support vector and kernel methods comparison

Linear SVM works well for linearly separable data. Polynomial kernels handle non-linear patterns, while radial basis function (RBF) kernels manage complex data distributions. Kernel methods let SVMs operate in a higher-dimensional feature space without explicitly computing the coordinates of the data in that space (the kernel trick); only the kernel value for each pair of points is needed.
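The three kernels compared above can be written as small functions. The sketch below also checks the kernel trick for one case: a degree-2 polynomial kernel with no constant term gives the same value as a dot product in an explicitly built 3-dimensional feature space. The parameter defaults (degree, coef0, gamma) and the helper explicit_features are assumptions for illustration.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def explicit_features(x):
    # Feature map whose dot product equals (x . y)^2 for 2-D inputs.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = polynomial_kernel(x, y, degree=2, coef0=0.0)
via_features = dot(explicit_features(x), explicit_features(y))
```

Both via_kernel and via_features come out to 121 (up to floating-point rounding), but the kernel version never builds the higher-dimensional vectors.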


(c) Mining data streams in stock market prediction

Data streams in stock markets include live price feeds and trading volumes. Stream mining helps detect trends and anomalies in real time. Challenges include high speed, noise, and concept drift, while benefits include timely decision-making and risk reduction.


(d) Apriori algorithm for frequent itemsets

The Apriori algorithm finds frequent itemsets by using the principle that subsets of frequent itemsets must also be frequent. It generates candidate itemsets and prunes those that do not meet minimum support, repeating this process iteratively.
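The generate-and-prune loop can be sketched in a few lines. This is a minimal illustration that uses an absolute support count and a tiny made-up basket dataset; both are assumptions, and practical implementations use more efficient candidate generation and counting.

```python
from itertools import combinations


def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=2)
```

Here the candidate {bread, milk, butter} is pruned without counting its support, because its subset {milk, butter} appears in only one basket.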


(e) Pig vs Hive

Pig is a scripting platform used for data flow processing, while Hive provides a SQL-like interface for querying data. Pig is more flexible for procedural tasks, whereas Hive is user-friendly for structured queries and reporting.


SECTION C


Q3. Attempt any one (7 marks)

(a) Difference between data analysis and data reporting

Data analysis focuses on discovering insights and patterns using statistical or machine learning techniques. Data reporting summarizes existing data using charts and dashboards. For example, predicting sales trends is analysis, while monthly sales charts are reporting.


(b) Model planning vs model building

Model planning involves selecting techniques and defining evaluation criteria. Model building is the actual implementation of models using algorithms and training data. Planning decides what to build, while building focuses on how to build it.


Q4. Attempt any one (7 marks)

(a) Neural networks and learning

Neural networks are computational models inspired by the human brain. Learning occurs by adjusting weights based on error, while generalization allows the model to perform well on unseen data. This makes neural networks powerful for prediction tasks.
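Weight adjustment based on error can be shown with the simplest case, a single perceptron trained on the logical AND function. This is a sketch of the perceptron learning rule only, not of multi-layer backpropagation; the learning rate, epoch count, and training data are assumptions for illustration.

```python
def predict(weights, bias, inputs):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Error is the gap between the desired and the actual output.
            error = target - predict(weights, bias, inputs)
            # Each weight moves in proportion to the error and its input.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
```

When a prediction is already correct the error is zero and the weights stay unchanged, so training settles once every sample is classified correctly.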


(b) PCA computation (conceptual explanation)

Principal Component Analysis reduces data dimensionality by identifying directions of maximum variance. It transforms original correlated variables into uncorrelated principal components, simplifying analysis while retaining essential information.
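The computation can be sketched in plain Python: centre the data, build the covariance matrix, and find its dominant eigenvector, which is the first principal component. The sketch uses power iteration rather than a full eigendecomposition, and the sample points (which lie exactly on the line y = 2x) are an assumption chosen so the result is easy to check.

```python
import math


def first_principal_component(points, iterations=100):
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, unless the start vector is orthogonal to it.
    v = [1.0] + [0.0] * (dims - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project each centred point onto the component (1-D representation).
    scores = [sum(r[d] * v[d] for d in range(dims)) for r in centered]
    return v, scores


points = [(1, 2), (2, 4), (3, 6), (4, 8)]
component, scores = first_principal_component(points)
```

For this data the component points along (1, 2) normalised to unit length, and the one-dimensional scores keep all of the variance because the points already lie on a line.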


Q5. Attempt any one (7 marks)

(a) Real-Time Analytics Platform (RTAP)

RTAP processes streaming data instantly to generate immediate insights. Applications include fraud detection, smart cities, healthcare monitoring, and online recommendation systems.


(b) Sampling in data streams

Sampling selects representative data from continuous streams to reduce processing load. It helps manage memory and improves efficiency while still providing approximate results that are close to those from the full stream, provided the sample is representative.
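One common technique is reservoir sampling, which keeps a fixed-size uniform random sample without knowing the stream length in advance. The sketch below follows the standard single-pass scheme; the function name and the optional rng parameter are assumptions for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), k=5, rng=random.Random(42))
```

Memory use stays at k items no matter how long the stream runs, and every item seen so far has the same chance of being in the sample.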


Q6. Attempt any one (7 marks)

(a) Importance of parallelism in clustering

Parallelism speeds up clustering of large datasets by dividing data across multiple processors. Techniques include MapReduce-based clustering and distributed K-means algorithms.
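One iteration of a distributed K-means can be sketched as a map step (each worker computes per-cluster partial sums for its own chunk) followed by a reduce step (partial sums are merged into new centroids). In this sketch threads stand in for worker nodes, the data is one-dimensional, and the function names are assumptions; a real deployment would run the map step on separate machines or processes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    # Map step: per-cluster sum and count for one partition of the data.
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in chunk:
        nearest = min(range(len(centroids)), key=lambda i: (x - centroids[i]) ** 2)
        sums[nearest] += x
        counts[nearest] += 1
    return sums, counts


def parallel_kmeans_step(chunks, centroids):
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: partial_sums(c, centroids), chunks))
    # Reduce step: merge partial results and recompute each centroid.
    new_centroids = []
    for i, old in enumerate(centroids):
        total = sum(sums[i] for sums, _ in results)
        count = sum(counts[i] for _, counts in results)
        new_centroids.append(total / count if count else old)
    return new_centroids


chunks = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
new_centroids = parallel_kmeans_step(chunks, [0.0, 12.0])
```

Only the small per-cluster sums and counts travel between workers, not the data points themselves, which keeps communication cost low.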


(b) ProCLUS vs CLIQUE

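PROCLUS (PROjected CLUStering) is a top-down, k-medoids-based algorithm that assigns each point to one cluster and finds, for each cluster, the subset of dimensions in which its points are most tightly grouped. CLIQUE is a bottom-up, grid- and density-based algorithm that finds dense units in low-dimensional subspaces and joins them into higher-dimensional ones, so its clusters may overlap. PROCLUS needs the number of clusters and the average number of dimensions as input and generally scales better to large datasets, whereas CLIQUE finds the number of clusters automatically but its cost grows quickly with dimensionality.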


Q7. Attempt any one (7 marks)


(a) Sharding in NoSQL databases

Sharding divides large datasets into smaller parts across multiple servers. It improves scalability, load balancing, and performance when handling massive data volumes.
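Hash-based sharding, one common strategy, can be sketched as follows. The in-memory dictionaries stand in for separate servers, and the names shard_for, put, and get, along with the shard count of three, are assumptions for illustration; real NoSQL systems also offer range-based and directory-based sharding.

```python
import hashlib


def shard_for(key, num_shards):
    # A stable hash sends the same key to the same shard on every call.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Three dictionaries stand in for three database servers.
shards = {i: {} for i in range(3)}


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})
```

A lookup contacts only the one shard that owns the key. Note that with simple modulo hashing, changing the number of shards remaps most keys, which is why production systems often use consistent hashing instead.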


(b) Hadoop Distributed File System (HDFS)

HDFS stores data across multiple nodes and ensures fault tolerance by replicating data blocks. If one node fails, data is automatically retrieved from another replica, ensuring reliability.

Uploader: SuGanta International