(SEM VI) THEORY EXAMINATION 2017-18 BIG DATA


Big Data (NIT-067)

Complete Section-Wise Explanation – B.Tech Semester VI


Introduction to the Subject


Big Data as a subject focuses on how extremely large, fast-growing, and complex data sets are stored, processed, and analyzed to extract meaningful insights. Traditional single-machine databases cannot handle such data efficiently, which is why distributed frameworks such as Hadoop (HDFS, YARN, and MapReduce), tools built on it such as Hive and HBase, and NoSQL databases are used.


This paper tests:

Conceptual understanding of Big Data fundamentals
Knowledge of Hadoop ecosystem
Data storage and processing models
MapReduce working and design
NoSQL and graph databases
Real-world Big Data applications


The paper is divided into three sections: A, B, and C, and students must attempt questions as instructed.


SECTION A – Basic Concepts & Definitions


Pattern:
Attempt all questions
10 questions × 2 marks = 20 marks

Nature of Section A


Section A checks whether your basic concepts are clear. Answers should be short but meaningful. Even though the questions are brief, clarity is extremely important.

Explanation of Section A Topics


What is Big Data and why do we analyze it?
Big Data refers to datasets so large, fast-changing, or varied that they cannot be stored or processed efficiently with traditional single-machine tools. We analyze Big Data to discover patterns, trends, user behavior, and insights that help in decision-making, prediction, and optimization.


Data Locality Optimization
Data locality means moving computation closer to where the data is stored instead of moving data across the network. This improves performance and reduces network congestion in distributed systems like Hadoop.
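
As a rough illustration of the idea, the sketch below picks a node for a task by preferring one that already stores the block. The block map, node names, and the pick_node helper are all hypothetical; a real Hadoop scheduler also considers rack locality and available resources.

```python
# Hypothetical block-to-node map: which nodes hold a replica of each block.
BLOCK_LOCATIONS = {
    "block-1": {"node-a", "node-b"},
    "block-2": {"node-b", "node-c"},
    "block-3": {"node-a", "node-c"},
}


def pick_node(block_id, free_nodes):
    """Prefer a free node that already stores the block; otherwise fall back
    to any free node, which means the block must travel over the network."""
    local = sorted(BLOCK_LOCATIONS[block_id] & set(free_nodes))
    if local:
        return local[0], "data-local"
    return sorted(free_nodes)[0], "remote"


print(pick_node("block-1", ["node-b", "node-c"]))  # node-b stores block-1
print(pick_node("block-1", ["node-c"]))            # no free node stores block-1
```

The second call shows the case locality tries to avoid: the task still runs, but only after the block is copied across the network.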

 

Tools related to Hadoop
The Hadoop ecosystem includes tools such as HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, and Oozie, each serving a specific purpose in data storage, processing, or management.


Purpose of Hadoop Pipes
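Hadoop Pipes is the C++ interface to Hadoop MapReduce. It lets developers write map and reduce functions in C++ instead of Java, with the C++ process communicating with the Java task over a socket, which improves flexibility.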


Map Reducing (MapReduce)
MapReduce is a programming model that processes large data sets by dividing tasks into Map and Reduce phases, enabling parallel computation.
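
The two phases can be sketched with the classic word-count example. This is a minimal single-process illustration in plain Python, not Hadoop's API; the function names are chosen here for clarity, and on a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```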


Operational vs Analytical Systems
Operational systems handle daily transactions like banking or billing, while analytical systems process historical data for reporting, analysis, and decision-making.


Hadoop Distributed File System (HDFS)
HDFS is a distributed file system that stores data across multiple nodes with replication to ensure fault tolerance and high availability.


Industry Examples of Big Data
Big Data is used in healthcare, banking, e-commerce, social media, telecom, and transportation sectors.


Entities of YARN
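YARN includes the ResourceManager, NodeManagers, ApplicationMasters, and containers. The ResourceManager arbitrates cluster resources, a NodeManager runs on each worker node, an ApplicationMaster negotiates resources for one application, and a container is the unit of allocated resources in which a task runs.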


Hadoop Architecture
Hadoop architecture consists of HDFS for storage, YARN for resource management, and MapReduce for data processing.


SECTION B – Conceptual & System-Level Understanding


Pattern:
Attempt any three questions
3 questions × 10 marks = 30 marks

Nature of Section B

This section requires descriptive answers written in paragraphs. Students must explain concepts clearly with examples and proper flow.

Explanation of Major Section B Topics


Crowd Sourcing Analytics
Crowd sourcing analytics involves collecting and analyzing data generated by a large number of people through social media, surveys, mobile apps, and online platforms. It helps organizations understand public opinion, trends, and collective behavior at scale.


Relationship Between Cloud and Big Data
Cloud computing provides scalable infrastructure for storing and processing Big Data. Big Data applications rely on cloud resources for elasticity, cost efficiency, and high availability, while cloud platforms benefit from Big Data-driven insights.


Design of HDFS
HDFS follows a master-slave architecture. The NameNode manages metadata, while DataNodes store actual data blocks. Data is divided into blocks and replicated across nodes to ensure reliability and fault tolerance.
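
The block-and-replica idea can be sketched as follows. The 128 MB block size and replication factor of 3 are the usual HDFS defaults; the round-robin placement, the node names, and the plan_blocks helper are simplifying assumptions, since real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the usual HDFS default
REPLICATION = 3                  # the usual HDFS default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a NameNode-style mapping: block index -> DataNodes holding a replica.

    Placement here is simple round-robin (an assumption for illustration).
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = math.ceil(file_size / block_size)
    plan = {}
    for block in range(num_blocks):
        plan[block] = [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
    return plan


nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = plan_blocks(300 * 1024 * 1024, nodes)   # a 300 MB file needs 3 blocks
for block, replicas in plan.items():
    print(block, replicas)
```

The returned dictionary plays the role of the NameNode's metadata: it records where each block lives, while the DataNodes would hold the bytes themselves.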


Hive Data Definition Queries
Hive uses HiveQL, which is similar to SQL. Data definition queries include CREATE, DROP, ALTER, and DESCRIBE, allowing structured access to large datasets stored in HDFS.


HBase and Pig Data Models
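HBase uses a column-family-oriented data model: a table is a sparse, sorted map in which each cell is addressed by row key, column family, column qualifier, and timestamp, which suits real-time random read/write access. Pig's data model is built from atoms (single values), tuples (ordered fields), bags (collections of tuples), and maps (key-value pairs). Its high-level scripting language, Pig Latin, operates on these types and is compiled into MapReduce jobs, which simplifies MapReduce programming.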
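
One way to picture the HBase model is as a nested map from row key to column family to column qualifier to timestamped versions. The sketch below uses ordinary Python dictionaries; the put and get helpers and the sample row are hypothetical and are not the HBase client API.

```python
from collections import defaultdict

# table[row_key][column_family][qualifier] -> {timestamp: value}
table = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(row, family, qualifier, value, ts):
    """Store one cell version; older versions are kept under their timestamps."""
    table[row][family][qualifier][ts] = value


def get(row, family, qualifier):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table[row][family][qualifier]
    return versions[max(versions)] if versions else None


put("user#42", "info", "name", "Asha", ts=1)
put("user#42", "info", "city", "Pune", ts=1)
put("user#42", "info", "city", "Delhi", ts=2)   # a newer version of the same cell

print(get("user#42", "info", "city"))
```

Rows need not share the same qualifiers, which is why the table is described as sparse.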


SECTION C – Advanced Analysis & Architecture

Pattern:
Attempt one part from each question
5 questions × 10 marks = 50 marks

This section carries the maximum marks and determines overall performance.


Question 3

How Hadoop Analyzes Data
Hadoop analyzes data by breaking it into smaller chunks stored in HDFS and processing them in parallel using MapReduce. The Map phase processes data blocks, while the Reduce phase aggregates results.


Cassandra Data Model
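Cassandra uses a peer-to-peer distributed architecture in which every node plays the same role. Data is organized into keyspaces and tables, and each row is identified by a primary key whose partition key decides which nodes store it. The design is optimized for high availability and scalability without a single point of failure.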
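
A simplified picture of how a key is mapped to nodes in a peer-to-peer ring is sketched below. It assumes one token per node, MD5 as the hash, and replicas placed on the next nodes clockwise; Cassandra's default partitioner and its replication strategies differ in detail, and the Ring class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Hash a key to an integer token. MD5 is used only for illustration."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Each node owns one position on the ring, derived from its name.
        self.positions = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.positions]

    def replicas(self, partition_key):
        """Walk clockwise from the key's token and collect the next nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.positions))
        return [self.positions[(start + i) % len(self.positions)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))
```

Because every node can compute the same answer from the key alone, no central coordinator is needed to locate data.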


Question 4

Anatomy of a MapReduce Job Run
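A MapReduce job run involves job submission, input splitting, mapping, shuffling and sorting, reducing, and final output generation. On Hadoop 2 and later, YARN allocates the containers in which tasks run, while the job's ApplicationMaster schedules and tracks the individual map and reduce tasks.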
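
The stages of a job run can be traced in a small single-process sketch. The run_job helper, the CRC32-based partitioner, and the temperature readings are assumptions made for illustration; a real job runs its tasks in separate containers across the cluster.

```python
import zlib
from itertools import groupby


def partition(key, num_reducers):
    """Choose a reducer for a key (CRC32 stands in for the real hash partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(records, mapper, reducer, num_splits=2, num_reducers=2):
    # 1. Input splitting: each split would be handled by one map task.
    splits = [records[i::num_splits] for i in range(num_splits)]

    # 2. Map, then route each emitted pair to its reducer (the shuffle).
    buckets = [[] for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in mapper(record):
                buckets[partition(key, num_reducers)].append((key, value))

    # 3. Sort each reducer's input by key, then 4. reduce one key group at a time.
    output = []
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])
        for key, group in groupby(bucket, key=lambda kv: kv[0]):
            output.append(reducer(key, [v for _, v in group]))
    return output


# Hypothetical input: (year, temperature) readings; find the maximum per year.
readings = [("2016", 31), ("2017", 35), ("2016", 38), ("2017", 29)]
result = run_job(readings, lambda r: [r], lambda year, temps: (year, max(temps)))
print(sorted(result))
```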


Types and Formats of MapReduce
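In general form, the map function takes (K1, V1) and emits a list of (K2, V2) pairs, and the reduce function takes (K2, list(V2)) and emits a list of (K3, V3) pairs. Input and output formats determine how these pairs are read and written: common choices are text formats (for example TextInputFormat, which treats each line as a record) and binary formats such as sequence files (SequenceFileInputFormat). The format should match the data being processed.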


Question 5

Data Model: Aggregations and Relations
Aggregations summarize large datasets, while relations define connections between data entities. These concepts are crucial for analytics and reporting.


Composing MapReduce Calculations
Complex MapReduce jobs can be composed by chaining multiple jobs where the output of one becomes the input of another.
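
Chaining can be sketched with two small jobs in plain Python. The run_mapreduce helper and the sample lines are assumptions for illustration; on Hadoop, the first job would write its output to HDFS and the second job would read it from there.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """A tiny in-memory MapReduce: map every record, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count how often each word occurs.
def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return word, sum(ones)


# Job 2: consumes Job 1's output and groups words by their frequency.
def freq_mapper(word_count):
    word, count = word_count
    return [(count, word)]


def freq_reducer(count, words):
    return count, sorted(words)


lines = ["hive pig hive", "pig hbase hive"]
job1_output = run_mapreduce(lines, count_mapper, count_reducer)
job2_output = run_mapreduce(job1_output, freq_mapper, freq_reducer)
print(job2_output)
```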


Question 6

Master-Slave and Peer-to-Peer Replication
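In master-slave replication, one master node accepts all writes and propagates them to slave nodes, which serve reads. This scales reads well, but the master is a bottleneck and a single point of failure for writes. In peer-to-peer replication, every node accepts both reads and writes and there is no master, which improves fault tolerance at the cost of possible write conflicts between nodes.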
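
The difference in write paths can be sketched with two toy classes. Both are hypothetical and replicate synchronously for simplicity; real systems usually replicate asynchronously, which is where stale reads and write conflicts come from.

```python
class MasterSlaveCluster:
    """All writes go to the master, which copies them to every slave."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value
        for slave in self.slaves:          # replication flows one way only
            slave[key] = value

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].get(key)


class PeerToPeerCluster:
    """Any node accepts a write and forwards it to its peers."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def write(self, key, value, via_node):
        self.nodes[via_node][key] = value  # the receiving node applies it first
        for index, node in enumerate(self.nodes):
            if index != via_node:
                node[key] = value          # then forwards it to every peer

    def read(self, key, node_index):
        return self.nodes[node_index].get(key)


ms = MasterSlaveCluster(num_slaves=2)
ms.write("balance", 100)
p2p = PeerToPeerCluster(num_nodes=3)
p2p.write("balance", 100, via_node=2)
print(ms.read("balance", slave_index=1), p2p.read("balance", node_index=0))
```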


Three Dimensions of Big Data
The three dimensions are Volume, Velocity, and Variety, representing data size, speed, and diversity.


Question 7

Graph Databases and Schema-less Databases
Graph databases store data as nodes and edges, ideal for relationship-based data. Schema-less databases offer flexibility by not enforcing fixed data structures.
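
A minimal in-memory sketch of both ideas follows: nodes carry free-form properties (no fixed schema), and relationships are labelled edges that can be traversed directly. The Graph class, the FRIEND label, and the sample people are hypothetical and do not correspond to any particular graph database's API.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.properties = {}                 # node id -> free-form property dict
        self.edges = defaultdict(list)       # node id -> [(label, target id)]

    def add_node(self, node_id, **properties):
        # No fixed schema: each node may carry different properties.
        self.properties[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbours(self, node_id, label):
        return [target for lbl, target in self.edges[node_id] if lbl == label]

    def friends_of_friends(self, node_id):
        direct = set(self.neighbours(node_id, "FRIEND"))
        second = {f2 for f in direct for f2 in self.neighbours(f, "FRIEND")}
        return second - direct - {node_id}


g = Graph()
g.add_node("asha", city="Pune")
g.add_node("ravi", age=30, employer="Acme")   # different properties, same graph
g.add_node("meena")
g.add_edge("asha", "FRIEND", "ravi")
g.add_edge("ravi", "FRIEND", "asha")
g.add_edge("ravi", "FRIEND", "meena")
print(g.friends_of_friends("asha"))
```

The friends-of-friends query follows edges directly, which is the kind of relationship-heavy lookup that would need repeated joins in a relational model.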


Graph Mapping Schemas and Replication Rate
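In the graph model of a MapReduce problem, inputs and outputs are nodes, and an edge connects each output to the inputs it needs. A mapping schema assigns inputs to reducers so that no reducer receives more than the reducer size q inputs and every output has at least one reducer that receives all the inputs it needs. The replication rate is the average number of reducers to which each input is sent, so it measures communication cost. A lower bound on the replication rate is the minimum replication that any mapping schema must incur for a given reducer size.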
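
As a hedged worked example, consider the all-pairs problem, where every pair of p inputs must meet at some reducer. The sketch below builds one possible mapping schema (split the inputs into groups, with one reducer per pair of groups) and measures its replication rate, taken here as the average number of reducers each input is sent to. The helper names and the values p = 12 and 4 groups are assumptions. For this problem a counting argument gives the lower bound r ≥ (p − 1)/q, because a reducer holding q inputs can cover at most q²/2 of the p(p − 1)/2 pairs.

```python
from itertools import combinations


def all_pairs_schema(num_inputs, num_groups):
    """Assign inputs to reducers so that every pair of inputs meets somewhere.

    Inputs are split into groups; one reducer is created per pair of groups
    and receives all inputs of both groups.
    """
    group_of = {i: i % num_groups for i in range(num_inputs)}
    reducers = {}
    for g1, g2 in combinations(range(num_groups), 2):
        reducers[(g1, g2)] = {i for i in range(num_inputs) if group_of[i] in (g1, g2)}
    return reducers


def replication_rate(reducers, num_inputs):
    """Average number of reducers each input is sent to."""
    return sum(len(inputs) for inputs in reducers.values()) / num_inputs


p, num_groups = 12, 4
schema = all_pairs_schema(p, num_groups)
q = max(len(inputs) for inputs in schema.values())   # reducer size
r = replication_rate(schema, p)

# Every pair of inputs must share at least one reducer.
covered = all(
    any(a in inputs and b in inputs for inputs in schema.values())
    for a, b in combinations(range(p), 2)
)
print(q, r, covered, (p - 1) / q)
```

Using fewer, larger groups lowers the replication rate but raises the reducer size, which is the trade-off the lower bound makes precise.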
