(SEM VIII) THEORY EXAMINATION 2023-24 BIG DATA
SECTION A
(Attempt all | 2 × 10 = 20 Marks)
a. List any five Big Data platforms
Apache Hadoop, Apache Spark, Apache Flink, Apache Storm, Google BigQuery.
b. Importance of Hadoop technology in Big Data Analytics
Hadoop enables distributed storage and parallel processing of large datasets at low cost, providing scalability, fault tolerance, and high availability.
c. Three benefits of MapReduce
MapReduce offers scalability, fault tolerance, and parallel processing of large data across clusters.
d. Define heartbeat in HDFS
Heartbeat is a periodic signal sent by DataNodes to NameNode to indicate that they are active and functioning properly.
e. List any five Big Data platforms
Apache Hadoop, Apache Spark, Cassandra, MongoDB, Amazon EMR.
f. Define data replication in HDFS
Data replication is the process of storing multiple copies of data blocks on different DataNodes to ensure fault tolerance and data availability.
g. Name any two data ingestion tools in Hadoop
Apache Flume and Apache Sqoop.
h. Compare NoSQL and Relational Databases
Relational databases use structured schema and SQL, while NoSQL databases support flexible schema and handle large-scale unstructured data.
i. Advantages of Scala over Java
Scala supports functional programming, concise syntax, and immutability by default. Apache Spark itself is written in Scala, so Scala programs use Spark's native API directly.
j. Differentiate between Pig and Hive
Pig uses procedural language (Pig Latin) for data flow, while Hive uses declarative SQL-like language (HiveQL) for querying data.
SECTION B
(Attempt any THREE | 3 × 10 = 30 Marks)
2(a) Structured, Semi-Structured & Unstructured Data
Structured data is organized in rows and columns such as databases and spreadsheets.
Semi-structured data has tags or markers like XML and JSON files.
Unstructured data has no predefined format, such as videos, images, emails, and social media posts.
Big Data technologies handle all three types efficiently.
2(b) Anatomy of a MapReduce Job Run
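A MapReduce job begins with the input being divided into input splits, one per map task. The Map phase processes each record into key-value pairs. The Shuffle and Sort phase groups all values that share the same key. The Reduce phase aggregates each group and writes the output to HDFS. In Hadoop 1, the JobTracker coordinates tasks while TaskTrackers execute them; in Hadoop 2 and later (YARN), the ResourceManager, a per-job ApplicationMaster, and NodeManagers take over these roles.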
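The phases above can be imitated in a single Python process. This is only a sketch of the data flow, not Hadoop code: the records, the function names, and the choice of "maximum temperature per year" as the job are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: one "year,temperature" record per line.
RECORDS = ["2021,31", "2022,35", "2021,38", "2022,29", "2023,33"]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: parse one record and emit a (year, temperature) pair."""
    year, temp = line.split(",")
    yield year, int(temp)


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle and Sort: group values by key, then order the keys."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values of one key (here, take the maximum)."""
    return key, max(values)


def run_job(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Chain the three phases the way a job run does, but in one process."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


print(run_job(RECORDS))  # [('2021', 38), ('2022', 35), ('2023', 33)]
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data over the network; the sketch keeps only the order of the phases.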
2(c) Design and Concept of HDFS
HDFS follows a master-slave architecture with NameNode managing metadata and DataNodes storing data blocks. Data is stored in large blocks with replication. HDFS provides high fault tolerance, scalability, and is optimized for batch processing.
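A short calculation shows what "large blocks with replication" means for storage. The sketch assumes a block size of 128 MB and a replication factor of 3, which are the usual defaults in Hadoop 2 and later but are not stated in this answer; the 500 MB file is a made-up example.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: default block size in Hadoop 2 and later
REPLICATION = 3      # assumption: default replication factor


def block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks a file is split into (the last block may be partly filled)."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb: float, replication: int = REPLICATION) -> float:
    """Disk space used across the cluster once every block is replicated."""
    return file_size_mb * replication


size = 500  # hypothetical file size in MB
print(block_count(size))                # 4
print(block_count(size) * REPLICATION)  # 12 block replicas in total
print(raw_storage_mb(size))             # 1500
```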
2(d) CRUD operations in MongoDB
CRUD stands for Create, Read, Update, and Delete. Create inserts documents using insertOne() or insertMany().
Read retrieves documents using find(). Update modifies documents using updateOne() or updateMany().
Delete removes documents using deleteOne() or deleteMany(). MongoDB stores data in flexible JSON-like (BSON) documents.
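One CRUD cycle can be sketched in Python with the method names of the PyMongo driver (insert_one, find_one, update_one, delete_one). The function name crud_roundtrip, the students collection, and the sample document are hypothetical; running it against a real database assumes PyMongo is installed and a MongoDB server is reachable.

```python
from typing import Any, Dict, Optional


def crud_roundtrip(students: Any) -> Dict[str, Optional[dict]]:
    """Run one Create, Read, Update, Delete cycle on a collection object."""
    # Create: insert a single document.
    students.insert_one({"name": "Asha", "branch": "CSE", "sem": 7})

    # Read: fetch the document back with a filter.
    created = students.find_one({"name": "Asha"})

    # Update: change one field with the $set operator.
    students.update_one({"name": "Asha"}, {"$set": {"sem": 8}})
    updated = students.find_one({"name": "Asha"})

    # Delete: remove the document; a later read finds nothing.
    students.delete_one({"name": "Asha"})
    deleted = students.find_one({"name": "Asha"})

    return {"created": created, "updated": updated, "deleted": deleted}
```

With PyMongo, a collection can be obtained as MongoClient()["college"]["students"] (database and collection names are placeholders) and passed to the function.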
2(e) Role of ZooKeeper in HBase
ZooKeeper manages coordination, synchronization, configuration, and leader election among HBase components. It ensures reliability, fault tolerance, and consistency in distributed environments.
SECTION C
3(a) 5 Vs of Big Data and their implications
The 5 Vs are:
Volume: Huge amount of data
Velocity: Speed of data generation
Variety: Different data formats
Veracity: Data quality and accuracy
Value: Useful insights from data
These characteristics require specialized tools for storage, processing, and analytics.
3(b) Components of Big Data Architecture
Components include data sources, data ingestion layer, storage layer (HDFS/NoSQL), processing layer (MapReduce/Spark), analytics layer, and visualization tools. Together they enable end-to-end Big Data processing.
4(a) HDFS Architecture & Fault Tolerance
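HDFS uses a NameNode, DataNodes, and a Secondary NameNode. The Secondary NameNode periodically merges the edit log into the filesystem image; it is a checkpointing helper, not a standby. Fault tolerance is achieved using data replication, heartbeat monitoring, and automatic re-replication of blocks whose replicas were lost on a failed DataNode.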
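How heartbeat monitoring leads to re-replication can be sketched as follows. This is a simplified model, not NameNode code: the 30-second timeout, the node and block names, and the timestamps are illustrative assumptions (a real cluster uses configurable intervals and a much longer dead-node timeout).

```python
from typing import Dict, Set

HEARTBEAT_TIMEOUT_S = 30.0  # hypothetical timeout chosen for the example


def live_nodes(last_heartbeat: Dict[str, float], now: float,
               timeout: float = HEARTBEAT_TIMEOUT_S) -> Set[str]:
    """DataNodes whose most recent heartbeat arrived within the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen <= timeout}


def under_replicated(block_locations: Dict[str, Set[str]], live: Set[str],
                     replication: int = 3) -> Dict[str, int]:
    """Map each block to the number of new replicas it needs, if any."""
    missing: Dict[str, int] = {}
    for block, nodes in block_locations.items():
        alive = len(nodes & live)  # replicas that sit on live DataNodes
        if alive < replication:
            missing[block] = replication - alive
    return missing


# Hypothetical cluster state: dn3 has been silent for 61 seconds.
last_seen = {"dn1": 100.0, "dn2": 98.0, "dn3": 40.0, "dn4": 99.0}
blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn1", "dn2", "dn4"}}

live = live_nodes(last_seen, now=101.0)
print(sorted(live))                    # ['dn1', 'dn2', 'dn4']
print(under_replicated(blocks, live))  # {'blk_1': 1}
```

Here blk_1 lost one replica with dn3, so the NameNode would schedule one new copy on a live DataNode.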
4(b) Hadoop Streaming and Pipes
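Hadoop Streaming allows MapReduce programs to be written in any language that can read standard input and write standard output, such as Python or Perl. Hadoop Pipes is the C++ interface to MapReduce; it communicates with the task process over sockets rather than standard input/output.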
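A Streaming mapper and reducer exchange tab-separated key-value lines. The sketch below writes both as plain Python generators so they can be tried locally; the word-count task and the function names are assumptions, and the built-in sorted() call stands in for the sort that Hadoop performs between the two phases.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of each word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local dry run: sorted() plays the role of Hadoop's shuffle and sort.
sample = ["big data big", "Data node"]
for line in reducer(sorted(mapper(sample))):
    print(line)
```

In an actual Streaming job the mapper and reducer would be separate scripts that read sys.stdin and print each result line.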
5(a) Client Read and Write Operations in HDFS
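For a write operation, the client asks the NameNode to create the file and allocate blocks, then writes the data to DataNodes in a pipeline, where each DataNode forwards the data to the next replica.
For a read operation, the client fetches the block locations from the NameNode and reads each block directly from the nearest DataNode that holds it.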
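The read path can be sketched as a lookup followed by a nearest-replica choice. The namespace table, the distance values, and the function names below are hypothetical stand-ins for NameNode metadata and network topology, not HDFS APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical NameNode metadata: file path -> ordered (block id, replica DataNodes).
NAMESPACE: Dict[str, List[Tuple[str, List[str]]]] = {
    "/logs/app.log": [
        ("blk_1", ["dn1", "dn3", "dn4"]),
        ("blk_2", ["dn2", "dn3", "dn5"]),
    ],
}

# Hypothetical network distance from the client to each DataNode (lower is closer).
DISTANCE: Dict[str, int] = {"dn1": 4, "dn2": 0, "dn3": 2, "dn4": 4, "dn5": 6}


def get_block_locations(path: str) -> List[Tuple[str, List[str]]]:
    """Step 1: the client asks the NameNode for metadata only, never for data."""
    return NAMESPACE[path]


def plan_read(path: str) -> List[Tuple[str, str]]:
    """Step 2: for each block, pick the closest DataNode to read from directly."""
    plan = []
    for block_id, replicas in get_block_locations(path):
        nearest = min(replicas, key=DISTANCE.__getitem__)
        plan.append((block_id, nearest))
    return plan


print(plan_read("/logs/app.log"))  # [('blk_1', 'dn3'), ('blk_2', 'dn2')]
```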
5(b) Cluster Specification & Hadoop Cluster Setup
Cluster specification includes number of nodes, CPU, RAM, storage, and network bandwidth.
Setting up Hadoop cluster involves installing Hadoop, configuring core-site.xml, hdfs-site.xml, yarn-site.xml, formatting NameNode, and starting Hadoop services.
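The three site files share one XML layout, so a small helper can generate them. The property names fs.defaultFS, dfs.replication, and yarn.resourcemanager.hostname are standard Hadoop settings; the host name, port, and values are placeholders assumed for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_site_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in the <configuration> layout of Hadoop site files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values: replace the host, port, and replication factor as needed.
SITE_FILES: Dict[str, Dict[str, str]] = {
    "core-site.xml": {"fs.defaultFS": "hdfs://namenode.example.com:9000"},
    "hdfs-site.xml": {"dfs.replication": "3"},
    "yarn-site.xml": {"yarn.resourcemanager.hostname": "namenode.example.com"},
}

for filename, props in SITE_FILES.items():
    print(filename)
    print(hadoop_site_xml(props))
```

After the files are in place, the NameNode is typically formatted with hdfs namenode -format and the services started with start-dfs.sh and start-yarn.sh.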
6(a) Features of Apache Spark & Integration with Hadoop
Spark provides in-memory processing, high speed, fault tolerance, and supports batch, streaming, ML, and graph processing.
Spark can read from and write to HDFS, run on YARN as its cluster manager, and coexist with Hadoop MapReduce jobs on the same cluster.
6(b) NameNode High Availability & HDFS Federation
High Availability removes the NameNode as a single point of failure by running an Active NameNode and a Standby NameNode that takes over on failure.
HDFS Federation allows multiple NameNodes to manage separate namespaces for scalability.
7(a) Need of Pig & Execution Modes
Pig simplifies complex data processing using Pig Latin.
Execution modes are:
Local mode
MapReduce mode
Tez mode
7(b) Apache Hive Architecture
Hive architecture includes the UI, Driver, Compiler, Metastore, Execution Engine, and HDFS as the underlying storage. Hive converts HiveQL queries into MapReduce, Tez, or Spark jobs for execution.