(SEM VIII) THEORY EXAMINATION 2021-22 BIG DATA
SECTION A
(Attempt all – 2 marks each)
(a) Apache Hadoop
Apache Hadoop is an open-source framework used to store and process large volumes of data across distributed computer systems using simple programming models.
(b) Big Data
Big Data refers to extremely large and complex datasets that cannot be processed efficiently using traditional data processing tools.
(c) Need of Hadoop
Hadoop is needed to store and process huge amounts of data in a cost-effective, fault-tolerant, and scalable manner using distributed computing.
(d) Digital Data
Digital data is information stored or transmitted in binary form (0s and 1s), such as text files, images, videos, and audio.
(e) Data replication in HDFS
Data replication in HDFS is the process of storing multiple copies of data blocks across different nodes to ensure fault tolerance and data availability.
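The idea can be illustrated with a small simulation. This is only a sketch: the 128 MB block size and replication factor of 3 are the usual HDFS defaults, the node names are hypothetical, and round-robin placement is a simplification standing in for HDFS's rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128      # usual HDFS default block size
REPLICATION_FACTOR = 3   # usual HDFS default replication factor


def place_replicas(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return {block_index: [node, ...]} using simple round-robin placement.

    Assumption: round-robin stands in for HDFS's real rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Each replica of a block goes to a different node.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one replica after a node fails."""
    return [b for b, holders in placement.items()
            if any(n != failed_node for n in holders)]


nodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical DataNode names
placement = place_replicas(300, nodes)          # a 300 MB file
survivors = readable_blocks(placement, "dn2")   # simulate losing one node
```

Because every block lives on three different nodes, losing any single node still leaves every block readable.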
(f) Serialization in HDFS
Serialization is the process of converting data objects into a byte stream so that they can be stored in HDFS or sent across the network. Hadoop provides the Writable interface for this purpose.
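A minimal sketch of the idea, assuming a record made of an integer id and a name. Hadoop itself does this in Java; the byte layout and the function names below are illustrative choices, not a Hadoop format.

```python
import struct


def serialize(student_id, name):
    """Encode (int, str) as: 4-byte big-endian id, 2-byte length, UTF-8 bytes."""
    encoded = name.encode("utf-8")
    return struct.pack(">iH", student_id, len(encoded)) + encoded


def deserialize(data):
    """Inverse of serialize: rebuild the (int, str) pair from bytes."""
    student_id, length = struct.unpack_from(">iH", data, 0)
    name = data[6:6 + length].decode("utf-8")
    return student_id, name


payload = serialize(42, "Asha")    # bytes ready to store or transmit
restored = deserialize(payload)    # the original values, rebuilt
```

The round trip (object to bytes and back) is the essential property: whatever is written must be readable again in the same field order.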
(g) Schedulers
Schedulers manage resource allocation and job execution in Hadoop. Common schedulers include FIFO, Fair Scheduler, and Capacity Scheduler.
(h) NameNode
NameNode is the master node in HDFS. It manages the file system metadata: the namespace (file and directory names), the mapping of blocks to DataNodes, and access permissions.
(i) ZooKeeper
ZooKeeper is a centralized coordination service that provides configuration management, naming, synchronization, and group services for distributed systems such as Hadoop.
(j) Execution modes of Pig
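Pig supports two main execution modes: Local mode, which runs on a single machine using the local file system, and MapReduce mode, which runs on a Hadoop cluster using HDFS. Newer Pig releases also offer Tez and Spark modes.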
SECTION B
(Attempt any three – 10 marks each)
2(a) Views in HIVE and Difference Between Internal and External Tables
Views in Hive are virtual tables defined by SELECT queries. They do not store data physically; they simplify complex queries by giving them a reusable name.
Internal (managed) tables: Hive manages both the data and the metadata. The data is stored in the Hive warehouse directory, and both data and metadata are deleted when the table is dropped.
External tables: Hive manages only the metadata. The data stays at its external location and remains there even after the table is dropped.
2(b) MapReduce Framework and Its Working
MapReduce is a programming model for processing large datasets in parallel. It consists of two main functions: Map and Reduce.
The Map function processes input records and produces intermediate key-value pairs. The Reduce function receives all the values for each key and aggregates them to generate the final output.
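The classic word-count example shows both functions at work. This is a single-process sketch of the programming model, not Hadoop code; the names map_fn, reduce_fn, and run_job are illustrative.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reduce: sum all counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    # Map phase: every line is processed independently.
    intermediate = []
    for line in lines:
        intermediate.extend(map_fn(line))

    # Group values by key (the framework does this between Map and Reduce).
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)

    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(k, v) for k, v in grouped.items())


counts = run_job(["big data needs hadoop", "hadoop stores big data"])
```

Since each line is mapped independently and each key is reduced independently, both phases can be spread across many machines.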
2(c) Structured, Semi-Structured, and Unstructured Data
Structured data is organized in a fixed format such as tables with rows and columns (e.g., relational databases).
Semi-structured data has a flexible, self-describing structure, such as XML and JSON.
Unstructured data has no predefined structure, such as videos, images, and social media posts.
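The difference shows up in how each kind of data is read. The sample values below are invented for illustration only.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
table = io.StringIO("id,name,marks\n1,Asha,82\n2,Ravi,75\n")
rows = list(csv.DictReader(table))

# Semi-structured: self-describing keys, fields may differ per record.
records = [json.loads(s) for s in (
    '{"id": 1, "name": "Asha", "tags": ["topper"]}',
    '{"id": 2, "name": "Ravi"}',
)]
tags_per_record = [r.get("tags", []) for r in records]

# Unstructured: free text with no schema; any structure must be extracted.
post = "Loved the Hadoop lecture today! #bigdata"
hashtags = [w for w in post.split() if w.startswith("#")]
```

Structured rows can be queried by column name directly, semi-structured records need a check for missing fields, and unstructured text needs extraction logic before it can be analyzed.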
2(d) Shuffle & Sort Phase and Reducer Phase
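The Shuffle & Sort phase transfers intermediate key-value pairs from the Mappers to the Reducers and sorts them by key, so that all values for a given key arrive at the same Reducer together.
The Reducer phase processes the sorted, grouped data and aggregates the values for each key to generate the final output.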
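A sketch of what happens between the two phases, under stated assumptions: the byte-sum partition function is a deterministic stand-in for Hadoop's hash-based partitioner, and the use of three reducers is arbitrary.

```python
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Pick a reducer for a key. Assumption: a byte-sum hash stands in for
    Hadoop's hash partitioner so the result is deterministic here."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route pairs to reducers, then sort and group each reducer's input."""
    buckets = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:              # one list per mapper
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))

    reducer_inputs = []
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))        # sort by key
        grouped = [(k, [v for _, v in group])
                   for k, group in groupby(bucket, key=itemgetter(0))]
        reducer_inputs.append(grouped)
    return reducer_inputs


mapper_outputs = [
    [("big", 1), ("data", 1)],
    [("data", 1), ("big", 1), ("hive", 1)],
]
reducer_inputs = shuffle_and_sort(mapper_outputs, num_reducers=3)
# Reducer phase: aggregate the grouped values for each key.
totals = {k: sum(vs) for part in reducer_inputs for k, vs in part}
```

Partitioning guarantees that a key is handled by exactly one reducer, and sorting lets that reducer see all values for a key in one contiguous group.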
2(e) Benefits and 5V’s of Big Data
Big Data helps in better decision-making, cost reduction, improved customer experience, and innovation.
The 5V’s are Volume, Velocity, Variety, Veracity, and Value.
SECTION C
3(a) Hadoop Ecosystem Frameworks and Joins & Subqueries
The Hadoop ecosystem includes tools such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, and ZooKeeper.
Joins combine data from multiple tables, while subqueries are queries within queries used for filtering or computation.
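Both constructs can be tried out quickly with an in-memory database. In this sketch SQLite stands in for Hive so the example runs anywhere; the student and dept tables and their rows are invented for illustration, and Hive's own dialect differs in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE dept (id INTEGER, title TEXT);
    INSERT INTO student VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meena', 10);
    INSERT INTO dept VALUES (10, 'CSE'), (20, 'ECE');
""")

# Join: combine rows from both tables on the matching department id.
joined = conn.execute("""
    SELECT s.name, d.title
    FROM student s JOIN dept d ON s.dept_id = d.id
    ORDER BY s.id
""").fetchall()

# Subquery: the inner query finds the department, the outer one filters.
cse_students = conn.execute("""
    SELECT name FROM student
    WHERE dept_id IN (SELECT id FROM dept WHERE title = 'CSE')
    ORDER BY id
""").fetchall()
conn.close()
```

The join returns columns from both tables side by side, while the subquery uses the second table only to decide which rows of the first to keep.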
3(b) Statement for Developing a MapReduce Application
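The steps are: write the Mapper class, write the Reducer class, write the Driver class that configures the job (mapper, reducer, and key/value types), set the input and output paths, and then package and run the MapReduce program on the cluster.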
4(a) Analytic Processes and Tools in Big Data
Analytic processes include data acquisition, storage, processing, analysis, and visualization.
Tools include Hadoop, Spark, Hive, Pig, and NoSQL databases such as HBase.
4(b) Cluster Specification and Hadoop Cluster Setup
Cluster specification defines the hardware, software, number of nodes, memory, and storage requirements of the cluster.
Hadoop cluster setup involves configuring the HDFS daemons (NameNode and DataNodes) and the YARN daemons (ResourceManager and NodeManagers).
5(a) Master-Slave and Peer-Peer Replication
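Master-slave replication has one master that accepts all data updates and propagates them to the slaves, which serve read requests.
Peer-to-peer replication gives all nodes equal responsibility: any node can accept writes and propagate them to the other nodes.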
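The contrast can be sketched with two toy classes. This is a simplified model only: updates are copied synchronously and never conflict, whereas real systems must handle replication lag and conflicting writes. The class and method names are illustrative.

```python
class MasterSlave:
    """One master accepts writes and copies them to read-only slaves."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value
        for slave in self.slaves:       # master pushes the update outward
            slave[key] = value

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].get(key)


class PeerToPeer:
    """Every node accepts writes and forwards them to all other peers."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def write(self, node_index, key, value):
        self.nodes[node_index][key] = value        # accept the write locally
        for i, node in enumerate(self.nodes):
            if i != node_index:                    # then propagate to peers
                node[key] = value

    def read(self, node_index, key):
        return self.nodes[node_index].get(key)


ms = MasterSlave(num_slaves=2)
ms.write("roll_no", 101)                 # only the master takes writes

p2p = PeerToPeer(num_nodes=3)
p2p.write(0, "roll_no", 101)             # any peer may take a write
p2p.write(2, "branch", "CSE")
```

In the first model the master is a single point for writes; in the second, write capacity grows with the number of nodes at the cost of more complex consistency handling.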
5(b) HBase Concepts and ZooKeeper Role
HBase is a column-oriented NoSQL database built on HDFS.
ZooKeeper helps in coordination, leader election, and monitoring HBase clusters.
6(a) Anatomy of MapReduce Job Run
A MapReduce job involves job submission, input splitting, mapping, shuffling, reducing, and output generation.
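The input-splitting step decides how many map tasks run. A rough sketch, assuming the split size equals a 128 MB block and ignoring the special handling Hadoop applies to a small final split:

```python
def input_splits(file_size, split_size):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
splits = input_splits(300 * MB, 128 * MB)   # a 300 MB input file
num_map_tasks = len(splits)                 # one map task per split
```

Each split is handed to its own map task, so larger inputs automatically produce more parallel work.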
6(b) Analysis vs Reporting
Analysis focuses on discovering insights and patterns, while reporting presents historical data in structured formats.
7(a) Compression and Serialization in Hadoop I/O
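Compression reduces data size, which saves storage space and speeds up disk and network I/O. Hadoop supports codecs such as gzip, bzip2, Snappy, and LZO.
Serialization converts objects into byte streams for storage and for data transfer between nodes.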
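The effect of compression is easy to observe. This sketch uses Python's zlib module (the DEFLATE algorithm that gzip is built on) as a stand-in for a Hadoop codec; the sample records are invented.

```python
import zlib

# Repetitive records, typical of log-style data, compress well.
records = b"2021-22,BIG DATA,SEM VIII\n" * 1000

compressed = zlib.compress(records)
restored = zlib.decompress(compressed)

ratio = len(compressed) / len(records)   # fraction of the original size
```

Decompression restores the data exactly, so the only costs are the CPU time spent compressing and decompressing.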
7(b) HBase Storage Mechanism and Table Creation Query
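HBase stores data in tables made up of rows identified by a row key. Columns are grouped into column families, and each cell (row, column family:qualifier) holds a timestamped value.
HBase shell command to create a table named student with one column family, info: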
create 'student','info'
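The storage layout behind that table can be modelled with nested dictionaries. This is an in-memory toy, not the HBase client API; the ToyTable class and its put and get methods are hypothetical names chosen to echo the shell commands.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)
        # row key -> "family:qualifier" -> list of (timestamp, value)
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None."""
        versions = self.rows.get(row_key, {}).get(column, [])
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]


student = ToyTable("student", "info")             # mirrors the create command
student.put("r1", "info:name", "Asha", timestamp=1)
student.put("r1", "info:name", "Asha K", timestamp=2)
latest = student.get("r1", "info:name")
```

Column families are fixed when the table is created, while qualifiers such as name can be added freely with each write, and older versions of a cell are kept alongside the newest one.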