(SEM VIII) THEORY EXAMINATION 2022-2023 BIG DATA


SECTION A

(Attempt all | 2 × 10 = 20 Marks)

 

(a) Benefits of HDFS over NFS

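HDFS splits files into blocks and replicates them across many nodes, which gives it fault tolerance and horizontal scalability. NFS serves files from a single server, so that server becomes both a bottleneck and a single point of failure at large scale. HDFS is also optimized for high-throughput, parallel processing of big data, because computation can run on the nodes that hold the data.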

 

(b) Structured, Semi-Structured & Unstructured Data

Structured data follows a fixed schema (tables, rows). Semi-structured data uses tags or markers (XML, JSON). Unstructured data has no predefined format, such as images, videos, and social media data.

 

(c) Sources of data in Big Data

Sources include social media, sensors and IoT devices, transaction logs, web clickstreams, mobile devices, multimedia content, and enterprise applications.

 

(d) Metadata in HDFS

Metadata stores information about files such as file name, size, permissions, block locations, and replication details. It is maintained by the NameNode.

 

(e) Map vs Reduce

Map processes input data and converts it into key-value pairs. Reduce aggregates and processes these key-value pairs to produce final output.
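
The split between the two phases can be illustrated with a small word-count sketch in plain Python. This is not the Hadoop API: the function names map_phase, reduce_phase, and word_count are hypothetical, and the in-memory dictionary stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit one (word, 1) key-value pair per word in the input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Reduce: combine all values that share a key into one result."""
    return word, sum(counts)


def word_count(lines):
    # Group values by key; in Hadoop the framework does this between phases.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            grouped[word].append(one)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


counts = word_count(["big data big clusters", "data nodes"])
print(counts)
```

Map sees one record at a time and knows nothing about other records; Reduce sees every value for a single key at once.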

 

(f) Indexing

Indexing is a technique used to improve data retrieval speed by creating a data structure that allows quick access to records.
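
As a rough illustration, the sketch below builds a simple hash index over a list of records so that a lookup no longer scans every row. The records and the helper names build_index and lookup are made up for this example; real databases usually use B-trees or similar structures rather than a plain dictionary.

```python
records = [
    {"id": 101, "city": "Delhi"},
    {"id": 102, "city": "Pune"},
    {"id": 103, "city": "Delhi"},
]


def build_index(rows, field):
    """Map each value of `field` to the positions of the rows that hold it."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[field], []).append(position)
    return index


def lookup(rows, index, value):
    """Fetch matching rows through the index instead of scanning all rows."""
    return [rows[position] for position in index.get(value, [])]


city_index = build_index(records, "city")
print(lookup(records, city_index, "Delhi"))
```

The index costs extra memory and must be updated on every write, which is the usual trade-off for faster reads.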

 

(g) Shuffle vs Sort

Shuffle transfers intermediate map outputs to reducers. Sort arranges the data by keys before reduction.
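
The two steps can be sketched separately in plain Python. This is a simplified model, not Hadoop code: the names partition, shuffle, and sort_bucket are hypothetical, and a CRC32 hash is used here only as a deterministic stand-in for whatever hash partitioning the framework applies.

```python
import zlib


def partition(key, num_reducers):
    """Pick the reducer for a key; the same key always goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Shuffle: route every intermediate (key, value) pair to its reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in map_outputs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def sort_bucket(bucket):
    """Sort: order one reducer's input by key so equal keys are adjacent."""
    return sorted(bucket, key=lambda pair: pair[0])


pairs = [("data", 1), ("big", 1), ("data", 1), ("big", 1), ("hdfs", 1)]
for reducer_id, bucket in enumerate(shuffle(pairs, 2)):
    print(reducer_id, sort_bucket(bucket))
```

Shuffle decides where each pair goes; sort decides the order in which a reducer reads the pairs it received.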

 

(h) TF-IDF

TF-IDF (Term Frequency–Inverse Document Frequency) measures the importance of a word in a document relative to a collection of documents.
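
A worked example helps here. The sketch below assumes one common variant of the formula, tf = (occurrences of the term in the document) / (words in the document) and idf = ln(N / number of documents containing the term); other variants add smoothing or use a different log base. The three tiny documents are invented for illustration.

```python
import math


def tf(term, doc):
    """Term frequency: share of the document's words that are `term`."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Inverse document frequency: rarer terms across the collection score higher."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing) if containing else 0.0


def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)


docs = [
    ["hadoop", "stores", "big", "data"],
    ["spark", "processes", "big", "data"],
    ["hadoop", "uses", "hdfs"],
]

# "hdfs" appears in one document, "hadoop" in two, so "hdfs" is more distinctive.
print(tf_idf("hdfs", docs[2], docs), tf_idf("hadoop", docs[2], docs))
```

A term that occurs in every document gets idf = ln(1) = 0, so it contributes nothing no matter how often it appears.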

 

(i) NameNode, DataNode, JobTracker, TaskTracker

NameNode manages metadata, DataNode stores data blocks, JobTracker manages MapReduce jobs, and TaskTracker executes tasks on slave nodes.

 

(j) File name and block size

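Windows (NTFS): maximum file name length 255 characters; default cluster (block) size 4 KB

Linux (ext4): maximum file name length 255 bytes; default block size 4 KB

Hadoop (HDFS): default block size 128 MB in Hadoop 2.x and later (64 MB in Hadoop 1.x); the large block size reduces NameNode metadata and seek overhead when processing big files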
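
The effect of the block size can be checked with a little arithmetic. The sketch below uses the 128 MB and 4 KB figures above; the helper name block_layout and the sample file sizes are assumptions made for illustration.

```python
MB = 1024 * 1024
HDFS_BLOCK = 128 * MB   # default HDFS block size quoted above
LOCAL_BLOCK = 4 * 1024  # typical local file system block size quoted above


def block_layout(file_size, block_size):
    """Return (number of blocks, bytes held by the last block)."""
    if file_size == 0:
        return 0, 0
    blocks = -(-file_size // block_size)  # ceiling division on integers
    last = file_size - (blocks - 1) * block_size
    return blocks, last


print(block_layout(1024 * MB, HDFS_BLOCK))   # 1 GB file in 128 MB blocks
print(block_layout(300 * MB, HDFS_BLOCK))    # last block is only partly filled
print(block_layout(1024 * MB, LOCAL_BLOCK))  # same 1 GB file in 4 KB blocks
```

A 1 GB file needs 8 block entries at 128 MB but 262,144 at 4 KB, which is why a large block size keeps the NameNode's metadata small.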

 

SECTION B

(Attempt any THREE | 10 × 3 = 30 Marks)

 

2(a) 5 Vs of Big Data and their importance

The 5 Vs are Volume (huge data size), Velocity (speed of data generation), Variety (multiple data types), Veracity (data quality), and Value (useful insights). These characteristics explain why traditional systems fail and why specialized Big Data tools are required.

 

2(b) History and evolution of Hadoop

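Hadoop grew out of the Apache Nutch web-crawler project, where Doug Cutting and Mike Cafarella implemented ideas from Google’s GFS (2003) and MapReduce (2004) papers. It was split out as Hadoop in 2006 and became a top-level Apache project in 2008, providing open-source distributed storage (HDFS) and processing (MapReduce). Hadoop 2 added YARN for resource management, and an ecosystem of tools such as Hive, HBase, and Spark developed around it.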

 

2(c) Data replication in HDFS

Replication stores multiple copies of data blocks across different nodes. Benefits include fault tolerance and high availability. Challenges include increased storage cost and network overhead.
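
The storage cost and the fault tolerance both follow directly from the replication factor. The sketch below assumes a replication factor of 3, which is the usual HDFS default but is not stated in the answer above; the function names are hypothetical.

```python
def raw_storage_needed(logical_tb, replication=3):
    """Disk space consumed when every block is stored `replication` times."""
    return logical_tb * replication


def usable_capacity(raw_tb, replication=3):
    """Logical data that fits in a cluster with `raw_tb` of raw disk."""
    return raw_tb / replication


def replicas_that_can_be_lost(replication=3):
    """A block stays readable as long as at least one replica survives."""
    return replication - 1


print(raw_storage_needed(100))      # 100 TB of data occupies 300 TB of disk
print(usable_capacity(600))         # 600 TB of disk holds 200 TB of data
print(replicas_that_can_be_lost())  # 2 replicas of a block can fail
```

Raising the replication factor improves availability and read parallelism but multiplies both disk usage and the network traffic of every write.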

 

2(d) Fair vs Capacity Scheduler in YARN

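Fair Scheduler shares cluster resources so that, over time, all running applications receive a roughly equal share. Capacity Scheduler divides the cluster into queues, each with a guaranteed minimum capacity, and lets other queues borrow capacity that is idle. Fair Scheduler suits clusters with mixed workloads of varying size; Capacity Scheduler suits large, multi-tenant organizations that need predictable per-queue guarantees.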

 

2(e) Pig and its execution modes

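Apache Pig is a high-level platform whose scripting language, Pig Latin, expresses data processing as a sequence of transformations.
Execution modes:
Local Mode: runs in a single JVM against the local file system; suited to testing on small data.
MapReduce Mode: translates the script into MapReduce jobs that run on a Hadoop cluster over HDFS; this is the default mode.

Unlike a traditional database, which is queried declaratively with SQL over a fixed schema, Pig describes a step-by-step data flow and can work with data whose schema is loose or supplied at load time.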

 

SECTION C

 

3(a) Security, compliance, auditing & protection in Big Data

Big Data security includes authentication, authorization, encryption, auditing, and compliance. Key features are data privacy, secure access control, regulatory compliance, and monitoring, using tools such as Kerberos (authentication) and Apache Ranger (authorization policies and audit logs).

 

3(b) Challenges of conventional data systems

Traditional systems lack scalability, flexibility, and performance for large datasets. Big Data solves these using distributed storage, parallel processing, and fault tolerance.

 

4(a) Hadoop Distributed File System

HDFS stores large data across clusters using replication and parallel access. It supports scalability, fault tolerance, and high-throughput processing.

 

4(b) Anatomy of a MapReduce job

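Input split → Map → Shuffle → Sort → Reduce → Output. In Hadoop 1.x the JobTracker coordinates tasks while TaskTrackers execute them; in Hadoop 2 and later, YARN's ResourceManager, a per-job ApplicationMaster, and NodeManagers take over these roles.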

 

5(a) Data ingestion methods: Flume & Sqoop

Flume ingests streaming data like logs, while Sqoop transfers structured data between RDBMS and Hadoop.

 

5(b) Hadoop I/O support

Hadoop I/O covers compression codecs (such as gzip and Snappy), serialization through the Writable interface, Avro for schema-based serialization, and file formats such as SequenceFile and Parquet.

 

6(a) NoSQL & MongoDB

MongoDB stores data as documents using JSON-like format. It supports CRUD operations, flexible schema, indexing, and high scalability.

 

6(b) Scala features

Scala combines object-oriented and functional programming and offers classes, objects, closures, and pattern matching. Spark itself is written in Scala, so the two are tightly integrated.

 

7(a) HBase vs RDBMS

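HBase is a distributed, column-family-oriented store with a flexible schema (only the column families are fixed in advance) that scales horizontally, while an RDBMS typically runs on a single server with a fixed table schema. HBase indexes rows by row key only and has no built-in secondary indexes or joins, both of which an RDBMS provides.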

 

7(b) Role of ZooKeeper

ZooKeeper manages configuration, synchronization, leader election, and monitoring in Hadoop clusters. It helps build reliable distributed applications.

 

Uploader
SuGanta International