THEORY EXAMINATION (SEM–VI) 2016-17 BIG DATA

BIG DATA (NIT067)

Time: 3 Hours  Max Marks: 100


SECTION – A (Short Answer Questions)

(10 × 2 = 20 Marks)


(a) Characteristics of Big Data

The main characteristics of Big Data are known as the 5 V’s:

Volume: Huge amount of data

Velocity: Speed of data generation and processing

Variety: Structured, semi-structured, and unstructured data

Veracity: Data quality and uncertainty

Value: Useful insights derived from data


(b) Calculation of risk in marketing

Risk in marketing is calculated by analyzing customer behavior, purchase patterns, probability of loss, and uncertainty using statistical and predictive analytics techniques.


(c) Use of inferential statistics in Big Data

Inferential statistics is used to draw conclusions about a population from sampled data, helping in prediction, decision-making, and hypothesis testing.
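
As an illustration of drawing a conclusion about a population from a sample, the sketch below estimates a population mean with an approximate 95% confidence interval. The sample values are hypothetical, and the 1.96 critical value is a normal approximation; a t-distribution would be more appropriate for a sample this small.

```python
import math
import statistics

# Hypothetical sample of order values drawn from a much larger population.
sample = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0, 138.0, 129.0]

mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n).
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval using the normal critical value 1.96.
low, high = mean - 1.96 * std_err, mean + 1.96 * std_err
print(f"estimated mean = {mean}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval, rather than the single sample mean, is what supports a statement about the whole population.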


(d) Sharding

Sharding is the process of splitting large datasets into smaller parts (shards) and distributing them across multiple servers to improve performance and scalability.
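
A minimal sketch of hash-based sharding, assuming four shards and string keys. The names `NUM_SHARDS` and `shard_for` are illustrative only, and real systems may use range-based partitioning or consistent hashing instead.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of servers (hypothetical)

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Route some hypothetical user IDs to their shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["u1001", "u1002", "u1003", "u1004", "u1005"]:
    shards[shard_for(user_id)].append(user_id)
```

Because the hash is stable, the same key always lands on the same shard, so a read can go directly to the server that holds the record.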


(e) Usage of Hadoop Pipes

Hadoop Pipes is the C++ interface to Hadoop MapReduce. It allows developers to write map and reduce functions in C++ instead of Java.


(f) Master-Slave vs Peer-to-Peer architecture in NoSQL

Master-Slave                     | Peer-to-Peer
Central master controls slaves   | No central controller
Single point of failure          | High fault tolerance
Used in HDFS                     | Used in Cassandra

(g) Purpose of Bloom filter

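A Bloom filter is a space-efficient probabilistic data structure used to quickly test whether an element may be present in a dataset. It can return false positives but never false negatives, so a negative answer lets the system skip an unnecessary disk access.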
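
A minimal sketch of a Bloom filter, assuming a 1024-bit array and three hash positions derived by salting SHA-256. The class and method names are illustrative, not taken from any particular library.

```python
import hashlib

class BloomFilter:
    """Bit array plus k hash positions per item."""

    def __init__(self, size: int = 1024, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("row-42")
bf.add("row-77")
print(bf.might_contain("row-42"))  # an added item is always reported as possibly present
```

A store can consult the filter before touching disk: only when `might_contain` returns True is the more expensive lookup needed.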

(h) Classic MapReduce vs YARN

Classic MapReduce            | YARN
JobTracker + TaskTracker     | ResourceManager + NodeManager
Limited scalability          | Better scalability
Single processing model      | Supports multiple models

(i) Usage of Grunt

Grunt is an interactive shell for Apache Pig, used for writing, testing, and debugging Pig scripts.


(j) Date and Time data types in Hive

Hive uses DATE, TIMESTAMP, and STRING data types to store and manipulate date and time-based data for querying and analytics.


(k) Why Hive is preferred over Pig Latin

Hive is preferred because it uses SQL-like syntax (HiveQL), making it easier for users with a database background.


SECTION – B (Long Answer Questions)

(Attempt any FIVE – 5 × 10 = 50 Marks)


2(a) Relationship between crowdsourcing and Big Data

Crowdsourcing involves collecting data from a large number of users through platforms like social media, surveys, and mobile apps.
This data is:

High in volume

Generated continuously

Diverse in nature

Hence, crowdsourcing is a major source of Big Data.


Example:
User reviews on e-commerce platforms help companies analyze customer sentiment.


2(b) Aggregate Data Model

The aggregate data model groups related data into aggregates which are accessed together.


Features:

Reduces join operations

Improves performance

Used in NoSQL databases

Example:
An order aggregate contains order details, customer details, and item list.
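
The order aggregate described above can be sketched as a single nested record. All field names and values here are hypothetical and serve only to show that the data needed together is stored together.

```python
# Hypothetical order aggregate: everything needed to serve the order lives in one unit.
order = {
    "order_id": "O-1001",
    "customer": {"customer_id": "C-17", "name": "A. Kumar"},
    "items": [
        {"sku": "BOOK-1", "qty": 2, "unit_price": 250.0},
        {"sku": "PEN-9", "qty": 5, "unit_price": 10.0},
    ],
}

def order_total(aggregate: dict) -> float:
    """Compute the order total from the aggregate alone, without any join."""
    return sum(item["qty"] * item["unit_price"] for item in aggregate["items"])

total = order_total(order)  # 2 * 250.0 + 5 * 10.0
print(total)
```

In a relational design the same total would need a join between order, customer, and item tables; here one read of the aggregate is enough.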


2(c) Scale-up vs Scale-out and Hadoop

Scale-up: Adding more power (CPU, RAM) to a single machine

Scale-out: Adding more machines to distribute workload

Hadoop uses scale-out architecture by distributing data across multiple nodes using HDFS, improving fault tolerance and performance.


2(d) Building blocks of Hadoop

Main components:

HDFS (Hadoop Distributed File System) – Storage

MapReduce – Data processing

YARN – Resource management

Hadoop Common – Libraries and utilities

Together, they enable distributed storage and parallel processing.


2(e) MapReduce workflows

MapReduce workflow consists of:

Input splitting

Map phase (key-value generation)

Shuffle and sort

Reduce phase (aggregation)

Output generation

This workflow enables large-scale parallel processing.
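
The phases of the workflow can be sketched as a single-process word count. This is only a simulation in plain Python, not the Hadoop API; the function names and input lines are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(key: str, values: list):
    """Reduce: aggregate all values for one key."""
    return key, sum(values)

# Input splitting: each line stands in for one input split.
splits = ["big data is big", "data is everywhere"]

mapped = [pair for line in splits for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle_and_sort(mapped))
print(result)
```

In a real cluster each split is mapped on a different node and the shuffle moves data over the network, but the data flow between phases is the same.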


2(f) HBase data model

HBase is a column-oriented NoSQL database.

Data model includes:

Table

Row key

Column family

Column qualifier

Timestamp

Cell value

It supports real-time read/write access.
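
One way to picture this data model is as a nested, versioned map. The sketch below is an in-memory illustration only, not the HBase client API; the helper names and sample values are assumptions.

```python
from collections import defaultdict

def make_table():
    """Table -> row key -> column family -> column qualifier -> {timestamp: value}."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

def put(table, row_key, family, qualifier, timestamp, value):
    """Write one cell version, addressed by row key, family, qualifier, and timestamp."""
    table[row_key][family][qualifier][timestamp] = value

def get_latest(table, row_key, family, qualifier):
    """Return the cell value with the highest timestamp, or None if absent."""
    versions = table[row_key][family][qualifier]
    if not versions:
        return None
    return versions[max(versions)]

students = make_table()
put(students, "row1", "info", "name", 1, "Asha")
put(students, "row1", "info", "city", 1, "Pune")
put(students, "row1", "info", "city", 2, "Delhi")  # newer version of the same cell
print(get_latest(students, "row1", "info", "city"))
```

Each cell is identified by the full coordinate (row key, column family, column qualifier, timestamp), which is why several versions of the same cell can coexist.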


2(g) Data modeling rules in Cassandra

Rules:

Design based on queries

Avoid joins

Use denormalization

Prefer wide rows

Relationships are handled using partition keys and clustering keys instead of joins.
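
A rough sketch of query-first modeling with a partition key and a clustering key. It is an in-memory illustration, not CQL or a Cassandra driver; the query, table name, and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical query to serve: "all orders of a customer, oldest first".
# Partition key: customer_id; clustering key: order_date.
orders_by_customer = defaultdict(list)

def insert_order(customer_id: str, order_date: str, item: str) -> None:
    """Store a denormalized row in the customer's partition."""
    partition = orders_by_customer[customer_id]
    partition.append((order_date, item))
    partition.sort()  # rows within a partition are kept ordered by the clustering key

def orders_for(customer_id: str):
    """Answer the query from a single partition, with no join."""
    return orders_by_customer.get(customer_id, [])

insert_order("C-17", "2017-03-02", "pen")
insert_order("C-17", "2017-01-15", "book")
insert_order("C-42", "2017-02-10", "bag")
print(orders_for("C-17"))
```

The partition key decides which node holds the rows, and the clustering key decides their order inside the partition, so the query is served by reading one partition sequentially.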


2(h) Hive queries for joins

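Natural Join:

HiveQL has no NATURAL JOIN keyword, so a natural join is written as an inner equi-join on the common column (assumed here to be id):

SELECT * FROM A JOIN B ON A.id = B.id;

Outer Join:

SELECT * FROM A LEFT OUTER JOIN B ON A.id = B.id;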

Used for combining data from multiple tables.


SECTION – C (Very Long Answer Questions)

(Attempt any TWO – 2 × 15 = 30 Marks)


3(i) Hadoop job processing

Steps:

Client submits job

ResourceManager allocates resources

Map tasks process input splits

Shuffle and sort phase

Reduce tasks generate output

Results stored in HDFS

This ensures fault-tolerant and parallel execution.


3(ii) Hadoop cluster modes and local mode installation

Modes of Hadoop:

Standalone (Local) mode

Pseudo-distributed mode

Fully distributed mode


Standalone mode:

Single JVM

No HDFS

Used for testing and learning

Configuration involves setting environment variables such as JAVA_HOME and running Hadoop commands locally against the local file system.


4(i) Pig scripts

Given data: Name, District, Age, Gender

Female students:

A = LOAD 'st.txt' USING PigStorage(',')
    AS (name:chararray, district:chararray, age:int, gender:chararray);
B = FILTER A BY gender == 'Female';
DUMP B;

Number of students from a specific district:

C = FILTER A BY district == 'XXXX';
D = GROUP C ALL;
E = FOREACH D GENERATE COUNT(C);
DUMP E;

District-wise male count:

M = FILTER A BY gender == 'Male';
G = GROUP M BY district;
H = FOREACH G GENERATE group, COUNT(M);
DUMP H;


4(ii) Pig operators

Data access: LOAD, STORE

Transformations: FILTER, GROUP, JOIN, FOREACH

Debugging: DUMP, DESCRIBE, EXPLAIN

These operators simplify large-scale data processing.


5(i) Version stamps

Ways:

Auto-generated timestamp

User-defined timestamp

Pros: Data versioning, consistency
Cons: Storage overhead, complexity
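
To show how a version stamp supports consistency, the sketch below rejects a write that was based on a stale read. It uses a simple integer counter as the stamp instead of a timestamp, purely to keep the example deterministic; the class and method names are hypothetical.

```python
class VersionedStore:
    """Key-value store where every value carries a version stamp."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Apply the write only if the caller saw the latest version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # stale read: the record changed since it was read
        self._data[key] = (current_version + 1, value)
        return True

store = VersionedStore()
v, _ = store.read("stock")
first = store.write("stock", 10, expected_version=v)  # accepted, version becomes 1
stale = store.write("stock", 7, expected_version=v)   # rejected, v is out of date
print(first, stale, store.read("stock"))
```

The extra stamp stored with every record is the storage overhead listed under the cons, and the retry logic a client needs after a rejected write is part of the added complexity.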


5(ii) Three dimensions of Big Data

Volume: Size of data

Velocity: Speed of data generation

Variety: Different data formats

These dimensions define Big Data complexity and challenges.
