(SEM VI) THEORY EXAMINATION 2022-23 BIG DATA AND ANALYTICS
BIG DATA AND ANALYTICS – KDS-601
Section-wise Important Questions & Ready Answers
SECTION A
(Attempt all questions – 2 marks each)
(a) Different Kinds of Digital Data
Digital data can be classified as structured, semi-structured, and unstructured data. Structured data is organized in rows and columns, as in relational database tables; semi-structured data includes XML and JSON files; unstructured data includes images, videos, audio files, emails, and social media content.
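The three kinds of data can be illustrated with a minimal Python sketch. The sample records, field names, and email text below are made up for illustration only.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,marks\n1,Asha,82\n2,Ravi,75\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, and fields may differ per record.
json_text = '[{"id": 1, "name": "Asha", "tags": ["ml"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined fields.
email_body = "Hi team, the cluster was slow yesterday. Please check."

structured_fields = sorted(rows[0].keys())
semi_fields = [sorted(r.keys()) for r in records]
word_count = len(email_body.split())

print(structured_fields)  # same columns for every row
print(semi_fields)        # the second record has no "tags" field
print(word_count)         # only generic measures apply to raw text
```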
(b) Drivers of Big Data
The major drivers of Big Data include the rapid growth of social media, mobile devices, IoT sensors, cloud computing, digital transactions, and the need for real-time analytics. These factors generate massive volumes of diverse and fast-moving data.
(c) Importance of Hadoop Data Format
The choice of data format in Hadoop is important because it determines how efficiently large datasets are stored and processed. Hadoop supports formats such as plain text, SequenceFile, Avro, and Parquet; the binary and columnar formats improve compression, read performance, and compatibility with MapReduce and other ecosystem tools.
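One reason format matters can be sketched in Python: a row-oriented layout (as in text, SequenceFile, or Avro) keeps each record together, while a column-oriented layout (as in Parquet) keeps each column together, so a query on one column reads fewer values. This is a simplified in-memory model with made-up data, not how the real file formats are encoded.

```python
# Hypothetical sample records for illustration.
rows = [
    {"user": "asha", "city": "Delhi", "amount": 120},
    {"user": "ravi", "city": "Pune", "amount": 80},
    {"user": "meena", "city": "Delhi", "amount": 200},
]

# Column-oriented view: all values of one column are stored together.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Summing "amount" from the row layout walks through every record.
total_from_rows = sum(r["amount"] for r in rows)
values_read_rows = len(rows) * len(rows[0])

# The same sum from the column layout touches only one column.
total_from_columns = sum(columns["amount"])
values_read_columns = len(columns["amount"])

print(total_from_rows, values_read_rows)
print(total_from_columns, values_read_columns)
```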
(d) Distributed File System
A distributed file system stores data across multiple machines while appearing as a single logical system to users. It provides scalability, fault tolerance, and high availability by distributing data blocks across nodes.
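A minimal Python sketch of the idea: a file is split into fixed-size blocks, the blocks are spread over nodes, and a read reassembles them so the user sees one file. The node names, the tiny block size, and the round-robin placement are assumptions for illustration; real systems such as HDFS use much larger blocks and a more careful placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut the file content into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks, nodes):
    """Round-robin placement: block i goes to node i mod number of nodes."""
    return {i: nodes[i % len(nodes)] for i in range(num_blocks)}


data = b"abcdefghij"
nodes = ["node1", "node2"]  # hypothetical node names

blocks = split_into_blocks(data, block_size=4)
placement = assign_blocks(len(blocks), nodes)

# Each node stores only the blocks assigned to it.
storage = {node: {} for node in nodes}
for block_id, block in enumerate(blocks):
    storage[placement[block_id]][block_id] = block

# A read looks up each block's node and joins the blocks in order,
# so the client sees a single logical file.
reassembled = b"".join(storage[placement[i]][i] for i in range(len(blocks)))
print(placement)
print(reassembled)
```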
(e) Working of File System
A file system manages how data is stored, retrieved, and organized on storage devices. It handles file naming, access control, data allocation, and metadata management to ensure efficient and secure data access.
(f) Use of Data Replication
Data replication creates multiple copies of data across different nodes. It improves fault tolerance, data availability, and reliability by ensuring data remains accessible even if a node fails.
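The effect of replication on availability can be sketched in Python. The placement rule below (consecutive nodes in a ring) and the node and block names are simplifying assumptions, not the actual HDFS placement policy.

```python
def place_replicas(block_ids, nodes, replication):
    """Give each block `replication` copies on distinct, consecutive nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(block_ids)
    }


def readable_blocks(placement, failed_nodes):
    """A block stays readable while at least one holder is still alive."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
blocks = ["b0", "b1", "b2"]

with_replication = place_replicas(blocks, nodes, replication=2)
without_replication = place_replicas(blocks, nodes, replication=1)

# Simulate the failure of node n2.
print(readable_blocks(with_replication, {"n2"}))     # all blocks survive
print(readable_blocks(without_replication, {"n2"}))  # b1 is lost
```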
(g) Need of Scheduler in Hadoop
A scheduler is required in Hadoop to allocate cluster resources efficiently among multiple jobs. It ensures fairness, optimal resource utilization, and balanced workload execution across nodes. Hadoop provides the FIFO, Capacity, and Fair schedulers.
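The fairness idea can be sketched in Python as a simple fair-share allocation of containers among jobs. This is an illustrative model only, with hypothetical job names and demands; it is not the algorithm used by any specific Hadoop scheduler.

```python
def fair_share(capacity, demands):
    """Split `capacity` containers among jobs, never exceeding a job's demand.

    Each round divides the remaining capacity equally among jobs that still
    want more; capacity a small job does not need flows to the others.
    """
    allocation = {job: 0 for job in demands}
    remaining = capacity
    active = [job for job, need in demands.items() if need > 0]
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        for job in list(active):
            if remaining == 0:
                break
            give = min(share, demands[job] - allocation[job], remaining)
            allocation[job] += give
            remaining -= give
            if allocation[job] == demands[job]:
                active.remove(job)  # this job is fully satisfied
    return allocation


# Hypothetical cluster with 10 containers and three competing jobs.
result = fair_share(10, {"job_a": 2, "job_b": 8, "job_c": 8})
print(result)
```

Here the small job gets everything it asks for, and the two large jobs split the rest equally instead of one of them starving the other.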
(h) Data Types Used in MongoDB
MongoDB supports data types such as String, Integer (32-bit and 64-bit), Double, Boolean, Array, Object (embedded document), Date, ObjectId, Null, and Binary data, enabling flexible, schema-less storage.
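As a rough illustration, a document can be modelled as a Python dictionary whose values map onto these types. The field names and values are made up, and only the standard library is used; with a real driver the `_id` field would normally hold an ObjectId generated automatically, which is left out here.

```python
from datetime import datetime, timezone

# Hypothetical student document; "_id" (an ObjectId) is omitted because
# it is normally generated when the document is inserted.
document = {
    "name": "Asha",                                         # String
    "age": 21,                                              # Integer
    "cgpa": 8.4,                                            # Double
    "active": True,                                         # Boolean
    "subjects": ["Big Data", "Compiler Design"],            # Array
    "address": {"city": "Lucknow", "pin": "226001"},        # Object (embedded document)
    "joined": datetime(2020, 8, 1, tzinfo=timezone.utc),    # Date
    "photo": b"\x89PNG",                                    # Binary data
    "middle_name": None,                                    # Null
}

# Another document in the same collection may have different fields,
# which is what "schema-less" refers to.
other_document = {"name": "Ravi", "hostel": "H2"}

type_names = {key: type(value).__name__ for key, value in document.items()}
print(type_names)
```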
(i) Applications of Big Data Using Pig
Pig is used for data cleansing, transformation, aggregation, and analysis of large datasets. It is widely applied in log analysis, ETL processes, customer behavior analysis, and recommendation systems.
(j) Data Processing Operators Used in Pig
Pig operators include LOAD, FILTER, GROUP, FOREACH, JOIN, ORDER, DISTINCT, UNION, and STORE, which help in performing complex data transformations easily.
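To show what these operators do, here is a plain-Python analogue of a small LOAD, FILTER, GROUP, FOREACH, ORDER pipeline. It is not Pig Latin and does not run on Hadoop; the log records and field layout are hypothetical.

```python
from collections import defaultdict

# LOAD: read records as (user, page, status) tuples.
logs = [
    ("alice", "/home", 200),
    ("bob", "/cart", 500),
    ("alice", "/cart", 200),
    ("bob", "/home", 200),
    ("alice", "/pay", 500),
]

# FILTER: keep only successful requests.
ok = [record for record in logs if record[2] == 200]

# GROUP: collect records by user.
groups = defaultdict(list)
for record in ok:
    groups[record[0]].append(record)

# FOREACH ... GENERATE: emit (user, number of successful requests).
counts = [(user, len(records)) for user, records in groups.items()]

# ORDER: sort by count in descending order, then by user name.
ordered = sorted(counts, key=lambda pair: (-pair[1], pair[0]))

# STORE: here the result is simply printed.
print(ordered)
```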
SECTION B
(Attempt any three – 10 marks each)
2(a) Overcoming Challenges of Conventional Data Analysis Systems
Conventional systems fail due to limited scalability, high cost, and inability to process unstructured data. These challenges are overcome using distributed computing, parallel processing, cloud infrastructure, and Big Data frameworks like Hadoop and Spark, which enable scalable and cost-effective analytics.
2(b) Hadoop Ecosystem – Concept and Architecture
The Hadoop ecosystem consists of HDFS for storage, MapReduce for processing, YARN for resource management, and tools like Hive, Pig, HBase, Sqoop, Flume, and Oozie. Together, they support data ingestion, storage, processing, and analysis of large datasets.
(In the exam, a neat, labeled architecture diagram is expected.)
2(c) HDFS Monitoring and Maintenance Process
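HDFS monitoring involves checking disk usage, DataNode health, and block replication using the NameNode web UI, logs, and commands such as hdfs dfsadmin -report and hdfs fsck. Maintenance includes rebalancing data across DataNodes with the HDFS balancer, replacing failed nodes, repairing corrupted or under-replicated blocks, and keeping the replication factor at the required level for reliability.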
2(d) New Features in Hadoop 2.0
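Hadoop 2.0 introduced YARN, which separates resource management from data processing and allows processing models other than MapReduce to run on the same cluster. It also added NameNode High Availability, which removes the single point of failure of Hadoop 1.x, and HDFS Federation, which improves scalability by allowing multiple NameNodes.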
2(e) Apache Hive Installation and Architecture
Hive is installed on top of Hadoop to enable SQL-like querying using HiveQL. Its architecture includes the user interface, driver, compiler, optimizer, execution engine, and metastore, which stores table schemas and other metadata. Hive translates queries into jobs (classically MapReduce jobs) that run on the cluster over data stored in HDFS.
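How a query becomes a MapReduce job can be sketched in Python for a simple aggregation such as SELECT dept, COUNT(*) FROM emp GROUP BY dept. The table, column names, and function names are hypothetical, and the code only imitates the map, shuffle, and reduce phases in memory; it is not Hive's actual generated plan.

```python
from collections import defaultdict

# Hypothetical "emp" table.
emp = [
    {"name": "Asha", "dept": "CS"},
    {"name": "Ravi", "dept": "IT"},
    {"name": "Meena", "dept": "CS"},
    {"name": "John", "dept": "ME"},
    {"name": "Sara", "dept": "CS"},
]


def map_phase(row):
    """Map: emit (group key, 1) for every input row."""
    yield row["dept"], 1


def shuffle(pairs):
    """Shuffle: bring all values for the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the values of one key (COUNT(*) becomes a sum of 1s)."""
    return key, sum(values)


mapped = [pair for row in emp for pair in map_phase(row)]
grouped = shuffle(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```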
SECTION C
3(a) Big Data Architecture and Characteristics
Big Data architecture includes data sources, data ingestion, storage layer, processing layer, analytics layer, and visualization. Its key characteristics are Volume, Velocity, Variety, Veracity, and Value, which define the nature and complexity of Big Data systems.
3(b) Big Data Security, Protection, and Auditing Features
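Big Data security includes authentication, authorization, encryption, data masking, and auditing. In the Hadoop ecosystem, Kerberos provides authentication, Apache Ranger provides centralized authorization policies and audit logs, and Apache Knox acts as a secure gateway to cluster services. Together they ensure secure access, data protection, and compliance monitoring.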
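A minimal Python sketch of three of these features (authorization, auditing, and data masking) is given below. The roles, permission table, and function names are hypothetical and do not correspond to the APIs of Kerberos, Ranger, or Knox.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission table.
PERMISSIONS = {"analyst": {"read"}, "admin": {"read", "write"}}

audit_log = []


def access(user, role, action, resource):
    """Authorization check that records every attempt in the audit log."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed


def mask(value, visible=4):
    """Data masking: hide all but the last `visible` characters."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]


denied = access("asha", "analyst", "write", "sales_table")
granted = access("root", "admin", "write", "sales_table")
masked_phone = mask("9876543210")

print(denied, granted, masked_phone)
print(len(audit_log))  # both the denied and the granted attempt are logged
```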