Overview
The Big Data Hadoop developer course is designed to impart in-depth knowledge of Big Data processing using Hadoop, Spark, and related data science techniques.
Learning Outcomes
- Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
- Gain knowledge of the Hadoop Distributed File System (HDFS) and YARN, including their architecture, and learn how to work with them for storage and resource management
- Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
- Get an overview of Sqoop and Flume and describe how to ingest data using them
- Understand different file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
- Get to know HBase, its architecture and data storage, and how to work with it. You will also understand the differences between HBase and an RDBMS
- Gain a working knowledge of Pig and its components
- Do functional programming in Spark
- Understand resilient distributed datasets (RDDs) in detail
- Implement and build Spark applications
- Understand the common use cases of Spark and the various iterative algorithms
- Learn Spark SQL, including creating, transforming, and querying DataFrames
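To give a flavor of the MapReduce model covered above, here is a minimal word-count sketch in plain Python. It is a conceptual illustration only, not Hadoop or Spark code: the mapper emits (word, 1) pairs and the reducer aggregates them, mirroring the map and reduce phases the course explains. All function names here are illustrative.

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    # Mapper: emit a (word, 1) pair for each word in the line
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(counts, pair):
    # Reducer: sum the counts for each word key
    word, n = pair
    counts[word] += n
    return counts

lines = ["Hadoop stores data", "Spark processes data"]
pairs = [p for line in lines for p in map_phase(line)]
totals = reduce(reduce_phase, pairs, Counter())
print(totals["data"])  # → 2
```

In a real cluster the mapper and reducer run distributed across nodes, with a shuffle phase grouping pairs by key in between; the course covers that architecture in the MapReduce module.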
Duration: 4 days Workshop + Post Workshop Support
Modules
- Introduction
- Introduction to Big data and Hadoop Ecosystem
- HDFS and YARN
- MapReduce and Sqoop
- Basics of Hive and Impala
- Types of Data Formats
- Advanced Hive Concepts and Data File Partitioning
- Apache Flume and HBase
- Pig/Tableau & QlikView
- Basics of Apache Spark
- RDDs in Spark
- Implementation of Spark Applications
- Spark Parallel Processing
- Spark RDD Optimization Techniques
- Spark Algorithms
Deliverables
- 2 days of instructor-led classroom training by a certified senior trainer
- Course materials (soft copy) and practice exercises
- Big Data course completion certificate
Date: 11/10/2023
Language: Japanese