(SEM VIII) THEORY EXAMINATION 2022-2023 BIG DATA
SECTION A
(Attempt all | 2 × 10 = 20 Marks)
(a) Benefits of HDFS over NFS
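HDFS splits files into large blocks and replicates them across many nodes, so it tolerates node failures and scales out by adding machines. NFS serves files from a single server, which becomes both a capacity bottleneck and a single point of failure for large-scale processing. HDFS also lets computation run on the nodes that hold the data, which suits parallel big data analytics.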
(b) Structured, Semi-Structured & Unstructured Data
Structured data follows a fixed schema (tables, rows). Semi-structured data uses tags or markers (XML, JSON). Unstructured data has no predefined format, such as images, videos, and social media data.
(c) Sources of data in Big Data
Sources include social media, sensors and IoT devices, transaction logs, web clickstreams, mobile devices, multimedia content, and enterprise applications.
(d) Metadata in HDFS
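HDFS metadata describes the file system namespace: file and directory names, sizes, permissions, replication factor, and the mapping of each file to its blocks. The NameNode keeps this metadata in memory and persists the namespace in the fsimage and edit log; block locations are rebuilt from the block reports that DataNodes send.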
(e) Map vs Reduce
Map processes input data and converts it into key-value pairs. Reduce aggregates and processes these key-value pairs to produce final output.
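The split between the two phases can be sketched with a word count in plain Python. This is only an in-memory illustration, not Hadoop code; the names `map_phase`, `reduce_phase`, and the sample lines are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: combine all values that share the same key."""
    return word, sum(counts)

lines = ["big data big ideas", "data drives ideas"]

# Run the mapper over every input record.
pairs = [pair for line in lines for pair in map_phase(line)]

# Group values by key (the framework does this between Map and Reduce).
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Run the reducer once per distinct key.
result = dict(reduce_phase(word, counts) for word, counts in grouped.items())
print(result)
```

Map works on one record at a time and knows nothing about other records; Reduce sees every value for a key at once, which is what makes aggregation possible.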
(f) Indexing
Indexing is a technique used to improve data retrieval speed by creating a data structure that allows quick access to records.
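As a rough illustration, the sketch below compares a full scan with a lookup through a simple hash index on one field. The records, the field name `city`, and the helper names are hypothetical; real systems typically use B-trees or similar structures.

```python
from collections import defaultdict

records = [
    {"id": 101, "name": "Asha", "city": "Delhi"},
    {"id": 102, "name": "Ravi", "city": "Noida"},
    {"id": 103, "name": "Meera", "city": "Delhi"},
]

def scan(rows, city):
    """Without an index: inspect every record."""
    return [row for row in rows if row["city"] == city]

# Build the index once: city -> positions of matching records.
city_index = defaultdict(list)
for position, row in enumerate(records):
    city_index[row["city"]].append(position)

def lookup(city):
    """With an index: jump straight to the matching positions."""
    return [records[position] for position in city_index.get(city, [])]

print(lookup("Delhi"))
```

The index costs extra memory and must be updated on every write, which is the usual trade-off for faster reads.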
(g) Shuffle vs Sort
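Shuffle transfers the intermediate map outputs across the network to the reducers, with each key routed to exactly one reducer. Sort merges those outputs by key so that each reducer receives its keys in order, with all values for a key grouped together.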
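The two steps can be separated in a small simulation. The partitioner here (key length modulo the number of reducers) is a toy assumption chosen so the example is deterministic; Hadoop's default partitioner uses a hash of the key instead.

```python
NUM_REDUCERS = 2

# Intermediate output of two hypothetical map tasks.
map_outputs = [
    [("pear", 1), ("apple", 1)],
    [("fig", 1), ("apple", 1)],
]

def partition(key):
    """Toy partitioner: decides which reducer owns a key."""
    return len(key) % NUM_REDUCERS

# Shuffle: move every pair to the reducer that owns its key.
reducer_inputs = {r: [] for r in range(NUM_REDUCERS)}
for output in map_outputs:
    for key, value in output:
        reducer_inputs[partition(key)].append((key, value))

# Sort: order each reducer's input by key before reduce runs.
sorted_inputs = {r: sorted(pairs) for r, pairs in reducer_inputs.items()}
print(sorted_inputs)
```

After the shuffle, reducer 1 holds both `apple` pairs even though they came from different map tasks; the sort then places them next to each other.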
(h) TF-IDF
TF-IDF (Term Frequency–Inverse Document Frequency) measures the importance of a word in a document relative to a collection of documents.
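A minimal sketch of one common formulation, assuming tf = (count of term in document) / (words in document) and idf = ln(N / number of documents containing the term). Other variants (smoothing, log-scaled tf) exist; the three sample documents are made up for the example.

```python
import math

docs = {
    "d1": "hadoop stores big data".split(),
    "d2": "spark processes big data".split(),
    "d3": "hadoop runs mapreduce".split(),
}

def tf(term, doc_id):
    """Term frequency: share of the document's words that are `term`."""
    words = docs[doc_id]
    return words.count(term) / len(words)

def idf(term):
    """Inverse document frequency: rarer terms get a higher weight."""
    containing = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / containing) if containing else 0.0

def tfidf(term, doc_id):
    return tf(term, doc_id) * idf(term)

for term in ("stores", "big"):
    print(term, tfidf(term, "d1"))
```

In `d1`, "stores" scores higher than "big" because "big" also appears in `d2`, so it says less about what makes `d1` distinctive.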
(i) NameNode, DataNode, JobTracker, TaskTracker
The NameNode manages the HDFS metadata and the DataNodes store the data blocks. In Hadoop 1.x, the JobTracker schedules and monitors MapReduce jobs, and the TaskTrackers execute map and reduce tasks on the worker nodes.
(j) File name and block size
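Windows (NTFS): maximum file name 255 characters, default block (cluster) size 4 KB
Linux (ext4): maximum file name 255 bytes, default block size 4 KB
Hadoop (HDFS): default block size 128 MB (64 MB in Hadoop 1.x); the large block size reduces NameNode metadata and seek overhead for big files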
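The effect of block size can be checked with simple arithmetic. The sketch below counts how many blocks a file occupies; the 300 MB and 1 GB file sizes are hypothetical examples.

```python
KB = 1024
MB = 1024 * KB

def block_count(file_size, block_size):
    """Number of blocks needed; the last block may be only partly filled."""
    return -(-file_size // block_size)  # integer ceiling division

hdfs_blocks = block_count(300 * MB, 128 * MB)   # 128 MB + 128 MB + 44 MB
local_blocks = block_count(1024 * MB, 4 * KB)   # 1 GB file on 4 KB blocks
hdfs_big = block_count(1024 * MB, 128 * MB)     # same file on HDFS

print(hdfs_blocks, local_blocks, hdfs_big)
```

Fewer, larger blocks mean far fewer entries for the NameNode to track, which is one reason HDFS uses a much larger block size than a local file system.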
SECTION B
(Attempt any THREE | 10 × 3 = 30 Marks)
2(a) 5 Vs of Big Data and their importance
The 5 Vs are Volume (huge data size), Velocity (speed of data generation), Variety (multiple data types), Veracity (data quality), and Value (useful insights). These characteristics explain why traditional systems fail and why specialized Big Data tools are required.
2(b) History and evolution of Hadoop
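Hadoop grew out of the Apache Nutch search project, where Doug Cutting and Mike Cafarella implemented the ideas in Google's GFS (2003) and MapReduce (2004) papers. It became Apache Hadoop, providing open-source distributed storage (HDFS) and processing (MapReduce). Hadoop 2 added YARN for cluster resource management, and an ecosystem of projects such as Hive, HBase, and Spark grew around it.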
2(c) Data replication in HDFS
Replication stores multiple copies of data blocks across different nodes. Benefits include fault tolerance and high availability. Challenges include increased storage cost and network overhead.
2(d) Fair vs Capacity Scheduler in YARN
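Fair Scheduler shares cluster resources so that, over time, all running applications get a roughly equal share. Capacity Scheduler divides the cluster into queues, each with a guaranteed share of capacity, and lets a queue borrow idle capacity from others. Fair Scheduler suits mixed workloads with many small jobs; Capacity Scheduler suits large multi-tenant organizations that need per-queue guarantees.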
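The difference in allocation policy can be sketched with simple arithmetic. This is a deliberate simplification that ignores weights, preemption, and borrowing of idle capacity; the application names, queue names, and percentages are hypothetical.

```python
def fair_shares(total_containers, apps):
    """Fair policy: every running application gets an equal share."""
    share = total_containers // len(apps)
    return {app: share for app in apps}

def capacity_shares(total_containers, queue_percent):
    """Capacity policy: every queue gets its configured percentage."""
    return {
        queue: total_containers * percent // 100
        for queue, percent in queue_percent.items()
    }

fair = fair_shares(100, ["etl", "adhoc", "ml", "report"])
capacity = capacity_shares(100, {"prod": 70, "dev": 30})
print(fair)
print(capacity)
```

Under the fair policy the share shrinks as more applications start; under the capacity policy the `prod` queue keeps its 70 containers regardless of how busy `dev` is.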
2(e) Pig and its execution modes
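Apache Pig is a platform for analyzing large datasets using Pig Latin, a data flow scripting language.
Execution modes: Local Mode (single JVM, local file system) and MapReduce Mode (Hadoop cluster, HDFS).
Compared with traditional databases, which use declarative SQL over a fixed schema, Pig expresses processing as a step-by-step data flow and can apply a schema at read time.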
SECTION C
3(a) Security, compliance, auditing & protection in Big Data
Big Data security covers authentication, authorization, encryption, auditing, and compliance. Key features are data privacy, secure access control, regulatory compliance, and monitoring, using tools such as Kerberos for authentication and Apache Ranger for access policies and audit logs.
3(b) Challenges of conventional data systems
Traditional systems lack the scalability, flexibility, and performance needed for very large and varied datasets. Big Data platforms address these limits with distributed storage, parallel processing, and fault tolerance.
4(a) Hadoop Distributed File System
HDFS stores large data across clusters using replication and parallel access. It supports scalability, fault tolerance, and high-throughput processing.
4(b) Anatomy of a MapReduce job
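Input split → Map → Shuffle → Sort → Reduce → Output. In Hadoop 1.x the JobTracker coordinates the job while TaskTrackers execute the map and reduce tasks; in Hadoop 2 and later, YARN's ResourceManager, a per-job ApplicationMaster, and the NodeManagers take over these roles.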
5(a) Data ingestion methods: Flume & Sqoop
Flume ingests streaming data like logs, while Sqoop transfers structured data between RDBMS and Hadoop.
5(b) Hadoop I/O support
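Hadoop I/O provides compression codecs (such as gzip, bzip2, and Snappy), serialization through the Writable interface, Avro for schema-based serialization, and file-based data structures such as SequenceFile and MapFile. Columnar formats such as Parquet are also widely used on top of HDFS.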
6(a) NoSQL & MongoDB
MongoDB is a document-oriented NoSQL database that stores data as JSON-like (BSON) documents. It supports CRUD operations, a flexible schema, indexing, and horizontal scaling through sharding.
6(b) Scala features
Scala combines object-oriented and functional programming. Its features include classes, objects, closures, and pattern matching, and it is the language in which Apache Spark is written.
7(a) HBase vs RDBMS
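HBase is a distributed, column-oriented NoSQL store that scales horizontally across a cluster, while a typical RDBMS runs on a single server and enforces a fixed table schema. In HBase only the column families are fixed when a table is created; columns within a family can vary from row to row. Rows are stored sorted by row key, which acts as the only built-in index, whereas an RDBMS supports secondary indexes, joins, and multi-row ACID transactions.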
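The HBase data model can be pictured as a sorted map: row key → column family → column qualifier → value. The sketch below is a toy in-memory model, not the HBase client API; the table contents and the `scan` helper are assumptions made for illustration, with the stop row treated as exclusive.

```python
import bisect

# row key -> column family -> qualifier -> value
table = {
    "user#001": {"info": {"name": "Asha"}, "stats": {"logins": "12"}},
    "user#002": {"info": {"name": "Ravi", "email": "ravi@example.com"}},
    "user#010": {"info": {"name": "Meera"}},
}

# Rows are kept sorted by row key, so range scans are cheap.
row_keys = sorted(table)

def scan(start_row, stop_row):
    """Return row keys in [start_row, stop_row) using the sorted order."""
    low = bisect.bisect_left(row_keys, start_row)
    high = bisect.bisect_left(row_keys, stop_row)
    return row_keys[low:high]

print(scan("user#001", "user#010"))
print(table["user#002"]["info"].get("email"))
```

Rows in the same table carry different columns (`email` exists only for one user), and the only efficient access paths are a row-key lookup or a row-key range scan.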
7(b) Role of ZooKeeper
ZooKeeper manages configuration, synchronization, leader election, and monitoring in Hadoop clusters. It helps build reliable distributed applications.
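Leader election is commonly built on sequence numbers: each candidate registers and receives an increasing number, and the candidate holding the lowest number is the leader. The sketch below simulates that idea in memory; the `ElectionSim` class is hypothetical and does not use a ZooKeeper client.

```python
class ElectionSim:
    """In-memory stand-in for sequence-numbered candidate registrations."""

    def __init__(self):
        self._next_seq = 0
        self.candidates = {}  # sequence number -> candidate name

    def join(self, name):
        """Register a candidate and hand back its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self.candidates[seq] = name
        return seq

    def leave(self, seq):
        """Remove a candidate, as happens when its session ends."""
        self.candidates.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number leads."""
        if not self.candidates:
            return None
        return self.candidates[min(self.candidates)]

election = ElectionSim()
seq_a = election.join("node-a")
seq_b = election.join("node-b")
seq_c = election.join("node-c")

first_leader = election.leader()
election.leave(seq_a)              # node-a crashes or disconnects
second_leader = election.leader()
print(first_leader, second_leader)
```

When the leader disappears, the next-lowest candidate takes over without any negotiation between the remaining nodes.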