(SEM VI) THEORY EXAMINATION 2021-22 BIG DATA


BIG DATA (KCS-061)

B.Tech Semester VI – Theory Examination (2021–22) 




Big Data is an interdisciplinary subject that deals with the storage, processing, analysis, and management of extremely large and complex data sets that cannot be efficiently handled using traditional data processing techniques. With the exponential growth of data generated from social media, sensors, mobile devices, transactions, and online services, organizations require scalable and distributed systems to extract meaningful insights from data. Big Data technologies such as Hadoop, HDFS, Map-Reduce, Hive, Pig, and NoSQL databases like MongoDB provide cost-effective and fault-tolerant solutions for handling massive volumes of structured, semi-structured, and unstructured data.


The uploaded question paper clearly indicates that the examination focuses on Big Data fundamentals, Hadoop ecosystem, HDFS architecture, Map-Reduce processing, Hive and Pig frameworks, NoSQL databases, and MongoDB operations. To score well, answers must be written in clear, logically connected paragraphs, with conceptual explanations, architecture discussion, and suitable examples wherever required.

 

SECTION A – FUNDAMENTAL BIG DATA CONCEPTS

(Based on Section A on Page-1 of the paper) 

 

Big Data platforms refer to software frameworks that enable distributed storage and processing of large data sets. Examples include Hadoop, Spark, Cassandra, HBase, and Flink, all of which support scalability and fault tolerance.
 

Big Data finds extensive application across industries. In healthcare, it is used for patient data analysis and disease prediction, while in e-commerce it supports recommendation systems, customer behavior analysis, and demand forecasting.
 

In Map-Reduce, Sort and Shuffle play a critical role between the Map and Reduce phases. After the Map phase generates intermediate key-value pairs, the framework automatically sorts the data by keys and shuffles it across the network so that all values corresponding to the same key reach the same reducer. This process ensures correctness and efficiency of parallel processing.
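
The Sort and Shuffle step described above can be sketched in a single Python process. This is only an illustration, assuming a word-count job; the function names are hypothetical and are not Hadoop APIs, and a real cluster moves the grouped data across the network rather than through one dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group all values by key, then order the groups by key.

    This stands in for the framework step that routes every value of a
    given key to the same reducer.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


lines = ["big data needs hadoop", "hadoop stores big data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_and_sort(intermediate)
word_counts = dict(reduce_phase(key, values) for key, values in grouped)
print(word_counts)
```

Because grouping happens before reduction, each reducer call sees the complete list of values for its key, which is what makes the parallel result correct.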
 

The full form of HDFS is Hadoop Distributed File System, which is designed to store very large files across clusters of commodity hardware while providing high throughput access.
 

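The default block size of HDFS is 128 MB in Hadoop 2.x and later (it was 64 MB in Hadoop 1.x), which is far larger than the few-kilobyte blocks of traditional file systems. This large block size reduces the number of disk seeks, keeps the block metadata held by the NameNode small, and improves performance for large sequential reads.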
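
The effect of the 128 MB default can be checked with simple arithmetic. The sketch below is illustrative only; the helper names are hypothetical, sizes are given in megabytes, and the 4 KB figure is an assumed block size for a traditional file system.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size stated above


def hdfs_block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file of the given size."""
    if file_size_mb <= 0:
        raise ValueError("file size must be positive")
    return math.ceil(file_size_mb / block_size_mb)


def last_block_size_mb(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Size of the final block; it holds only the data that is left over."""
    if file_size_mb <= 0:
        raise ValueError("file size must be positive")
    remainder = file_size_mb % block_size_mb
    return remainder or block_size_mb


print(hdfs_block_count(1024))                           # 1 GB file -> 8 blocks
print(last_block_size_mb(1000))                         # 1000 MB file -> last block is 104 MB
print(hdfs_block_count(1024, block_size_mb=4 / 1024))   # same file in 4 KB blocks -> 262144
```

Fewer blocks per file means fewer entries for the NameNode to track and fewer seeks per megabyte read.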

HDFS has two main types of nodes: the NameNode and the DataNodes. The NameNode is the master that manages metadata and namespace information, while DataNodes are the workers that store the actual data blocks.
 

NoSQL databases differ from relational databases in terms of schema flexibility, scalability, and consistency models. While relational databases follow a fixed schema and ACID properties, NoSQL databases offer schema-less design, horizontal scalability, and are optimized for distributed environments.

MongoDB guarantees atomicity for operations on a single document. Multi-document ACID transactions were added in version 4.0 for replica sets and extended to sharded clusters in version 4.2, but they carry a performance cost, so data that must change together is usually modelled within a single document to preserve scalability and performance.
 

A schema defines the logical structure of data, including fields, data types, and relationships. In Big Data systems, schema can be enforced at write time or read time, providing flexibility in handling diverse data.
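
The difference between enforcing a schema at write time and at read time can be sketched as follows. This is a simplified illustration, assuming a hypothetical two-field schema and JSON input; real systems such as Hive apply far richer type handling.

```python
import json

SCHEMA = {"id": int, "name": str}  # hypothetical schema used only for this sketch


def write_with_schema(store, record):
    """Schema-on-write: reject a record before it is stored if it does not fit."""
    for field, field_type in SCHEMA.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"field {field!r} does not match the schema")
    store.append(record)


def read_with_schema(raw_lines):
    """Schema-on-read: accept any stored text and apply the schema when querying.

    Values that do not fit the declared type come back as None, and fields
    outside the schema are ignored.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({
            field: record.get(field) if isinstance(record.get(field), field_type) else None
            for field, field_type in SCHEMA.items()
        })
    return rows


table = []
write_with_schema(table, {"id": 1, "name": "asha"})
try:
    write_with_schema(table, {"id": "two", "name": "ravi"})
    rejected = False
except ValueError:
    rejected = True  # the bad record never reaches storage

raw = ['{"id": 1, "name": "asha"}', '{"id": "two", "name": "ravi", "city": "Pune"}']
rows = read_with_schema(raw)
print(rejected, rows)
```

Schema-on-write keeps stored data clean at the cost of rejecting input, while schema-on-read accepts everything and defers interpretation to query time.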

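Hive is designed mainly for structured data stored in tables, and it can also handle semi-structured data such as JSON or log files; raw text can be queried once a SerDe (serializer/deserializer) imposes a structure on it. Because Hive uses schema-on-read, the table definition is applied when the data is queried rather than when it is loaded, making it suitable for data warehousing on Hadoop.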
 

SECTION B – BIG DATA ARCHITECTURE AND PROCESSING

(Based on Section B on Page-1) 

 

The three dimensions of Big Data are Volume, Velocity, and Variety. Volume refers to massive data sizes, velocity refers to the speed at which data is generated and processed, and variety refers to different data formats such as text, images, videos, and logs.
 

The Map-Reduce architecture consists of a client, a master daemon, worker daemons, and HDFS. In Hadoop 1.x the master is the JobTracker and the workers are TaskTrackers; in Hadoop 2.x and later (YARN) these roles are taken by the ResourceManager and NodeManagers. The client submits a job, which is divided into map and reduce tasks. These tasks are executed in parallel across the cluster, enabling efficient large-scale processing.
 

In HDFS, when a client reads data, it first contacts the NameNode to obtain metadata and block locations. The actual data is then read directly from DataNodes. During write operations, data is split into blocks and replicated across multiple DataNodes to ensure fault tolerance.
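
The read and write paths above can be imitated with a toy in-memory model. Everything here is an assumption made for illustration: the class and function names are hypothetical, the block size is four bytes so the example stays small, and replicas are placed round-robin, whereas real HDFS uses rack-aware placement.

```python
import itertools


class ToyNameNode:
    """Holds metadata only: which blocks form a file and where replicas live."""

    def __init__(self, node_names, replication=3):
        self.replication = replication
        self.block_map = {}  # path -> list of (block_id, [DataNode names])
        self._ring = itertools.cycle(sorted(node_names))

    def allocate_block(self, path):
        """Pick target DataNodes for the next block of a file."""
        blocks = self.block_map.setdefault(path, [])
        block_id = f"{path}#{len(blocks)}"
        targets = [next(self._ring) for _ in range(self.replication)]
        blocks.append((block_id, targets))
        return block_id, targets

    def locate(self, path):
        """Return block locations; the NameNode never serves file data."""
        return list(self.block_map[path])


def write_file(namenode, datanodes, path, data, block_size=4):
    """Client write: split into blocks and copy each block to its replicas."""
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        for node in targets:
            datanodes[node][block_id] = data[offset:offset + block_size]


def read_file(namenode, datanodes, path, failed=frozenset()):
    """Client read: get locations from the NameNode, fetch from live DataNodes."""
    chunks = []
    for block_id, targets in namenode.locate(path):
        live = [node for node in targets if node not in failed]
        if not live:
            raise IOError(f"all replicas of {block_id} are unavailable")
        chunks.append(datanodes[live[0]][block_id])
    return b"".join(chunks)


datanodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
namenode = ToyNameNode(datanodes)
write_file(namenode, datanodes, "/logs/a.txt", b"hello hdfs!")
print(read_file(namenode, datanodes, "/logs/a.txt", failed={"dn1", "dn2"}))
```

Even with two of the four DataNodes marked as failed, every block still has a surviving replica, so the read succeeds.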
 

CRUD operations in MongoDB are the create (insert), read (find), update, and delete operations on documents stored in collections. For example, inserting a document adds a JSON-like object to a collection, and because documents in the same collection need not share the same fields, the schema stays flexible.
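
The four operations can be illustrated without a running database. The class below is a hypothetical in-memory stand-in, not the MongoDB driver: its method names are borrowed from PyMongo for familiarity, but it supports only equality filters and a plain field overwrite on update.

```python
class ToyCollection:
    """In-memory stand-in for a document collection (illustration only)."""

    def __init__(self):
        self._docs = []
        self._next_id = 1

    @staticmethod
    def _matches(doc, query):
        return all(doc.get(field) == value for field, value in query.items())

    def insert_one(self, doc):
        """Create: store a document and assign it an _id."""
        stored = {"_id": self._next_id, **doc}
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query=None):
        """Read: return copies of all documents matching the filter."""
        query = query or {}
        return [dict(doc) for doc in self._docs if self._matches(doc, query)]

    def update_one(self, query, new_fields):
        """Update: overwrite or add fields on the first matching document."""
        for doc in self._docs:
            if self._matches(doc, query):
                doc.update(new_fields)
                return 1
        return 0

    def delete_one(self, query):
        """Delete: remove the first matching document."""
        for position, doc in enumerate(self._docs):
            if self._matches(doc, query):
                del self._docs[position]
                return 1
        return 0


students = ToyCollection()
students.insert_one({"name": "Asha", "branch": "CSE"})
students.insert_one({"name": "Ravi", "branch": "IT", "backlogs": 1})  # extra field is allowed
students.update_one({"name": "Asha"}, {"semester": 6})
students.delete_one({"name": "Ravi"})
remaining = students.find()
print(remaining)
```

The second insert carries a field the first document lacks, which is the flexible-schema behaviour the paragraph describes.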
 

Map-Reduce, Pig, and Hive differ in abstraction level. Map-Reduce is a low-level programming model, Pig provides a scripting language for data flow, and Hive offers SQL-like querying through HiveQL, making it more user-friendly.
 

SECTION C – BIG DATA STORAGE AND ANALYTICS

(Based on Section C and subsequent questions) 

 

Big Data exists in multiple forms, including structured data like relational tables, semi-structured data like XML and JSON, and unstructured data like images, audio, and videos. Each form requires different processing approaches.
 

A typical Big Data architecture consists of data sources, a data ingestion layer, a storage layer such as HDFS, a processing layer such as Map-Reduce or Spark, and an analytics and visualization layer.
 

The detailed architecture of Map-Reduce includes job submission, input splitting, mapping, sorting, shuffling, reducing, and output generation. This pipeline enables efficient distributed computation.

Scale-up involves increasing resources on a single machine, whereas scale-out involves adding more machines to a cluster. Hadoop uses scale-out architecture by distributing data and computation across multiple nodes, improving performance and fault tolerance.
 

HDFS is designed with a master-slave architecture, where the NameNode manages metadata and DataNodes store data blocks. Replication, with three copies of each block by default, ensures data reliability even if nodes fail.
 

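Benefits of HDFS include scalability, fault tolerance, and cost effectiveness. Challenges include inefficient handling of many small files (every file and block consumes NameNode memory), high latency for random or interactive access, and the NameNode being a single point of failure in Hadoop 1.x, which Hadoop 2.x mitigates with a standby NameNode in high-availability mode.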
 

NoSQL databases are classified into key-value stores, document stores, column-family stores, and graph databases. MongoDB falls under document-oriented NoSQL databases.
 

Indexing in MongoDB improves query performance by allowing faster data retrieval. For example, indexing a frequently queried field reduces search time significantly.
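
The saving can be made concrete by counting how many documents each approach has to examine. This is a simplified sketch with hypothetical helper names and made-up sample data; it uses a Python dictionary as the index, whereas MongoDB's default indexes are B-tree structures.

```python
from collections import defaultdict


def scan(docs, field, value):
    """Collection scan: examine every document to find the matches."""
    examined = 0
    hits = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            hits.append(doc)
    return hits, examined


def build_index(docs, field):
    """Index: map each field value to the positions of documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index


def indexed_lookup(docs, index, value):
    """Indexed query: touch only the documents the index points to."""
    positions = index.get(value, [])
    return [docs[position] for position in positions], len(positions)


# Sample data: 1000 student records, 10 of them from Pune.
docs = [{"roll": n, "city": "Pune" if n % 100 == 0 else "Delhi"} for n in range(1, 1001)]
city_index = build_index(docs, "city")

scan_hits, scan_examined = scan(docs, "city", "Pune")
index_hits, index_examined = indexed_lookup(docs, city_index, "Pune")
print(scan_examined, index_examined)  # 1000 10
```

Both approaches return the same ten documents, but the indexed query examines 10 documents instead of 1000. The trade-off is that the index must be stored and kept up to date on every write.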
 

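Pig has two main execution modes: local mode, which runs on a single machine against the local file system, and Map-Reduce mode, in which Pig Latin scripts are translated into Map-Reduce jobs that run on the Hadoop cluster.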
 

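The Hive architecture includes the Metastore, which stores table schemas and data locations; the Driver, which manages the lifecycle of a HiveQL query; the Compiler, which turns the query into an execution plan; and the Execution Engine, which runs that plan as jobs on the cluster. Together these components enable SQL-like querying on HDFS data.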
 

HOW TO WRITE BIG DATA ANSWERS IN THE EXAM
 

In Big Data, never write answers in short bullet points. Always start with a clear definition, followed by detailed explanation of architecture, working principles, and examples. Use correct terminology such as HDFS, Map-Reduce, schema-on-read, NoSQL, HiveQL, and scalability. Examiners give maximum weightage to conceptual clarity, architecture explanation, and practical understanding of Hadoop ecosystem.

Uploader
SuGanta International