(SEM VI) THEORY EXAMINATION 2023-24 BIG DATA AND ANALYTICS
KDS601 – BIG DATA AND ANALYTICS (B.Tech Sem VI, 2023–24)
All answers are written in simple, clear language, in full sentences rather than short bullet points, and follow the uploaded question paper (Page 1).
SECTION A
Attempt all questions in brief (2 × 10 = 20 marks)
(a) Differences between structured, semi-structured, and unstructured data
Structured data is highly organized and stored in rows and columns, such as data in relational databases. Semi-structured data does not follow a fixed schema but uses tags or markers, for example JSON and XML files. Unstructured data has no predefined format and includes text documents, images, videos, emails, and social media content.
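The three categories can be illustrated with a small Python sketch. The sample records and names below are made up for illustration only.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like an RDBMS table).
structured = io.StringIO("id,name,age\n1,Asha,21\n2,Ravi,22\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["hadoop"]}, {"id": 2, "name": "Ravi"}]'
)

# Unstructured: free text with no fields; meaning must be extracted by processing.
unstructured = "Asha wrote a long review about the new phone and its camera."

print(rows[0]["name"])                 # access by a column from the fixed schema
print(semi_structured[1].get("tags"))  # a key may simply be absent in a record
print(len(unstructured.split()))       # only generic text operations apply
```

Note how the second JSON record has no "tags" key at all, which a fixed relational schema would not allow without a NULL column.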
(b) Drivers of Big Data
Big Data is driven by the rapid growth of social media, mobile devices, IoT sensors, cloud computing, digital transactions, and online services. These sources continuously generate large volumes of diverse and fast-moving data.
(c) Core functionalities of Apache Hadoop
Apache Hadoop provides distributed data storage using HDFS, parallel data processing using MapReduce, resource management using YARN, and fault tolerance through data replication across multiple nodes.
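The MapReduce idea can be sketched in plain Python as a word count. This is only a single-machine imitation of the map, shuffle, and reduce phases; the function names are chosen for illustration and are not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for each word, as a mapper would.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map calls run in parallel on different nodes, close to the blocks they read, and the shuffle moves data across the network.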
(d) Importance of Hadoop data format
Hadoop data formats such as Avro, Parquet, and ORC affect storage efficiency, compression, and processing speed. Optimized formats reduce disk usage and improve query performance, making large-scale data analysis faster and more efficient.
(e) Steps to import data from RDBMS to Hadoop using Sqoop
First, Sqoop connects to the RDBMS through JDBC and reads the table's metadata. It then divides the rows into splits on a split column (the primary key by default), launches parallel map tasks that each import one split, and finally writes the data into HDFS or Hive tables.
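The splitting step can be sketched as follows. With Sqoop, the split column and the number of map tasks are normally set with the `--split-by` and `-m` options. The function below is a simplified illustration that assumes an integer key with evenly spread values; it is not Sqoop's actual splitter, and `split_ranges` is a hypothetical name.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous ranges,
    one per map task (simplified; assumes an integer split column)."""
    if max_id < min_id or num_mappers < 1:
        raise ValueError("need max_id >= min_id and at least one mapper")
    total = max_id - min_id + 1
    num_splits = min(num_mappers, total)
    base, extra = divmod(total, num_splits)
    ranges = []
    start = min_id
    for i in range(num_splits):
        # The first 'extra' splits take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each tuple corresponds to one map task's "WHERE id BETWEEN start AND end" range.
print(split_ranges(1, 100, 4))
```

If the key values are heavily skewed, equal-width ranges like these give some map tasks far more rows than others, which is why the choice of split column matters.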
(f) How a file system works
A file system manages how data is stored, organized, and retrieved. It maintains file names, directories, permissions, and metadata, and ensures efficient access and storage of data on physical disks.
(g) Fair Scheduler vs Capacity Scheduler in Hadoop
The Fair Scheduler shares cluster resources so that, over time, each running job receives a roughly equal share. The Capacity Scheduler divides the cluster into queues, each with a configured guaranteed capacity, so that different organizations or teams sharing the cluster are assured a minimum level of resources.
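The difference can be shown with a very simplified calculation. The sketch below ignores weights, job demand, and the borrowing of idle capacity between queues; the function and queue names are assumptions for illustration, not YARN configuration.

```python
def fair_shares(total_containers, jobs):
    # Fair scheduling (simplified): every running job gets an equal share.
    share = total_containers / len(jobs)
    return {job: share for job in jobs}

def capacity_shares(total_containers, queue_capacity_percent):
    # Capacity scheduling (simplified): each queue is guaranteed its
    # configured percentage of the cluster.
    return {
        queue: total_containers * percent / 100
        for queue, percent in queue_capacity_percent.items()
    }

print(fair_shares(100, ["job1", "job2", "job3", "job4"]))
print(capacity_shares(100, {"prod": 70, "dev": 30}))
```

With fair sharing, each job's share shrinks as more jobs start. With capacity queues, the "prod" queue keeps its guaranteed 70 percent no matter how many jobs are submitted to "dev".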
(h) Data types used in MongoDB
MongoDB stores documents in BSON and supports data types such as String, 32-bit and 64-bit Integer, Double, Boolean, Date, Array, Object (embedded document), Null, and ObjectId, allowing flexible schema design.
(i) HiveQL and its key features
HiveQL is a SQL-like query language used in Apache Hive. It supports table creation, querying, partitioning, and integration with Hadoop, allowing users to analyze big data without writing complex MapReduce code.
(j) Data processing operators in Pig
Pig supports operators such as LOAD, FILTER, GROUP, JOIN, FOREACH, ORDER, DISTINCT, and STORE to perform data transformation and analysis.
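For readers more familiar with Python, the same style of data flow can be imitated as below. This is only an analogy for what the operators do, not Pig Latin, and the student records are made up.

```python
from collections import defaultdict

# LOAD: hypothetical in-memory records standing in for a file on HDFS.
records = [
    ("asha", "cse", 82),
    ("ravi", "cse", 45),
    ("meena", "it", 91),
    ("john", "it", 67),
]

# FILTER: keep only rows with marks of at least 50.
passed = [row for row in records if row[2] >= 50]

# GROUP: collect the marks of the remaining rows by branch.
grouped = defaultdict(list)
for name, branch, marks in passed:
    grouped[branch].append(marks)

# FOREACH ... GENERATE: produce one output row per group.
averages = [(branch, sum(m) / len(m)) for branch, m in grouped.items()]

# ORDER: sort by average marks, highest first.
ordered = sorted(averages, key=lambda row: row[1], reverse=True)

# STORE: here the result is just printed instead of written to HDFS.
print(ordered)
```

In Pig each of these steps is one statement, and Pig compiles the whole script into jobs that run on the cluster.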
SECTION B
Attempt any three (10 × 3 = 30 marks)
(a) Why Big Data is crucial for modern businesses and industries
Big Data helps organizations analyze customer behavior, optimize operations, improve decision-making, and gain competitive advantage. Industries use Big Data for fraud detection, predictive maintenance, personalized marketing, risk analysis, and innovation. Data-driven insights enable faster and more accurate business strategies.
(b) Hadoop Distributed File System (HDFS) and its working
HDFS is a distributed storage system designed for large datasets. Files are divided into blocks and stored across multiple DataNodes. The NameNode manages metadata, while DataNodes store actual data. Replication ensures fault tolerance and reliability even if a node fails.
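The block and replica idea can be sketched in Python. The defaults of 128 MB per block and 3 replicas match common Hadoop defaults, but the round-robin placement is an assumption made for simplicity; real HDFS uses a rack-aware placement policy. The function and node names are hypothetical.

```python
import math

def plan_blocks(file_size_mb, datanodes, block_size_mb=128, replication=3):
    """Return {block_index: [datanodes holding a replica]} (simplified)."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    copies = min(replication, len(datanodes))
    plan = {}
    for block in range(num_blocks):
        # Round-robin placement: an assumption, not the real HDFS policy.
        plan[block] = [datanodes[(block + r) % len(datanodes)] for r in range(copies)]
    return plan

def readable_after_failure(plan, failed_node):
    # The file is still readable if every block has a replica on another node.
    return all(any(node != failed_node for node in nodes) for nodes in plan.values())

plan = plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
print(plan)
print(readable_after_failure(plan, "dn2"))
```

A 300 MB file needs three blocks here, and because each block lives on three different DataNodes, losing any single node still leaves two copies of every block.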
(c) Benefits and challenges of using HDFS
HDFS provides scalability, fault tolerance, and cost-effective storage on commodity hardware, and it supports parallel processing of big data. However, it handles large numbers of small files poorly because the NameNode keeps all file metadata in memory, it is not designed for real-time or low-latency access, and it requires skilled administration.
(d) NoSQL databases vs traditional RDBMS
Traditional RDBMSs use fixed schemas and SQL queries, and they typically scale vertically by adding resources to a single server. NoSQL databases offer flexible schemas, horizontal scaling across many servers, and high availability. NoSQL systems are well suited to big data applications such as social media, IoT, and real-time analytics.
(e) Role of ZooKeeper in Hadoop cluster monitoring
ZooKeeper coordinates distributed applications by providing configuration management, synchronization, leader election, and fault detection. It ensures high availability and consistency across Hadoop clusters.
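Leader election can be sketched with the commonly used "lowest sequence number wins" rule. In ZooKeeper this is usually built from ephemeral sequential znodes; the sketch below only imitates the rule with a dictionary, and the member names are hypothetical.

```python
def elect_leader(candidates):
    """candidates maps each live member to its sequence number.
    The member with the lowest sequence number becomes the leader."""
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Each member registered itself and received an increasing sequence number.
members = {"node-a": 3, "node-b": 1, "node-c": 2}

first = elect_leader(members)
print(first)

# If the leader fails, its entry disappears and the next lowest takes over.
del members[first]
second = elect_leader(members)
print(second)
```

Because the failed leader's entry is removed automatically when its session ends, the remaining members can detect the failure and agree on a new leader without any manual step.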
SECTION C
3(a) Big Data analytics vs traditional data analytics
Traditional analytics deals with structured, small-scale data using centralized systems. Big Data analytics handles massive, diverse, and fast-moving data using distributed systems. Tools like Hadoop, Spark, Hive, and Pig are used in Big Data analytics, while traditional analytics relies on RDBMS and data warehouses.
3(b) Big Data features: Security, Protection, and Auditing
Big Data security includes authentication, authorization, encryption, and access control. Data protection ensures confidentiality and integrity, while auditing tracks user actions and data access to ensure compliance and accountability.