Big Data & Hadoop Developer Course

Master HDFS, YARN, MapReduce, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Spark, and Storm with our Big Data Hadoop Certification Training program

Hadoop professionals command top pay packages, thanks to a worldwide shortage of these skills.

 

What is Big Data and Hadoop?

Massive amounts of data are generated by sources such as smartphones, Twitter, Facebook, and other platforms. Big data is a collection of data volumes so large that they can’t be processed using traditional database management systems. According to various surveys, 90 percent of the world’s data was generated in the last two years.

Hadoop is a software framework for storing and processing big data. To address the limitations of traditional database management systems, Google Labs came up with an algorithm to split large amounts of data into smaller chunks, map them onto many computers, and, when the calculations are done, bring the results back for consolidation. The Hadoop framework has many components, such as HDFS, MapReduce, HBase, Hive, Pig, Sqoop, and ZooKeeper, to analyse structured and unstructured data using commodity hardware.
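
To make the split-map-consolidate idea concrete, here is a minimal plain-Java sketch of the word-count pattern. It runs on a single machine and uses no Hadoop libraries; the hard-coded input chunks simply stand in for data blocks spread across cluster nodes.

    import java.util.*;

    // Plain-Java illustration of split -> map -> shuffle -> reduce.
    // No Hadoop involved; the data is made up for the example.
    public class MiniMapReduce {
        public static void main(String[] args) {
            // "Split": the input arrives as independent chunks.
            List<String> chunks = Arrays.asList("to be or", "not to be");

            // "Map" each chunk into (word, 1) pairs and "consolidate":
            // counts for the same word are merged, as a reducer would do.
            Map<String, Integer> counts = new TreeMap<>();
            for (String chunk : chunks) {
                for (String word : chunk.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);  // reduce step: sum per key
                }
            }
            System.out.println(counts);  // {be=2, not=1, or=1, to=2}
        }
    }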

Pre-requisites for Big Data and Hadoop Certification Course:

* There are no prerequisites for the Big Data and Hadoop course. Basic knowledge of Core Java and SQL will be beneficial, but is certainly not mandatory.

* As part of the Big Data and Hadoop Certification course, Manumedisoft Training Services can provide a complimentary self-paced course on Core Java.

Who should go for the Big Data and Hadoop Course:

* Graduates and professionals aspiring to build a career in Big Data and Hadoop

* Software developers and engineers

* Analysts, data analysts, Java architects, DBAs, and other database professionals

* Project leads, architects, and project managers

Manumedisoft's Online Big Data Hadoop Training has helped thousands of Big Data Hadoop professionals around the globe bag top jobs in the industry. Our Online Big Data Hadoop Certification course includes lifetime access, 24x7 support, and class recordings.

As a part of the course, you will be required to execute real-life, industry-based projects. The projects span domains such as banking, telecommunications, social media, insurance, and e-commerce. This Big Data course covers fundamental and up-to-date modules alike, including HDFS, MapReduce, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Spark, and Storm. At the end of the program, aspirants are awarded the Big Data & Hadoop Certification. You will also work on a project as part of your training, which will prepare you to take up assignments on Big Data.

Objectives of the Course

After completion of the Big Data and Hadoop Course from Manumedisoft, you will be able to:

* Completely understand the Apache Hadoop framework

* Understand HDFS and learn how MapReduce processes data

* Master Hadoop development and implementation

* Get an overview of Sqoop and Flume and describe how to ingest data using them

* Understand how YARN manages compute resources in a cluster

* Create databases and tables in Hive and Impala, understand HBase, and use Hive and Impala

* Design, build, install, and configure applications involving the Big Data and Hadoop ecosystem

* Maintain security and data privacy

Who can become a Big Data and Hadoop Professional?

Hadoop Certification Training can help you land a Big Data Hadoop job if you are ready to build a career in the Big Data domain. There are no predefined prerequisites for learning Hadoop.

The following professionals are ideal applicants for the Big Data and Hadoop training: software developers, architects, analysts, DBAs, data analysts, business analysts, Big Data professionals, or anyone considering building a career in Big Data and Hadoop. It is a misconception that only professionals with a Java programming background are suited to learning Big Data Hadoop or pursuing a career in this domain. Elementary knowledge of any programming language such as Java, C++, or Python, along with Linux, is always an added advantage.

Key Features

      • High-quality training hours
      • Trainers who are industry experts and working professionals
      • Comprehensive, up-to-date content
      • Exercises and hands-on assignments
      • Course completion certificate

Group Discount

    • 10% discount for 3 or more registrations

Agenda

Module 1: Introduction for Big Data Hadoop Spark Developers

  • What is Big Data?
  • The Rise of Bytes
  • Data Explosion and its Sources
  • Types of Data – Structured, Semi-structured, Unstructured data
  • Why did Big Data suddenly become so prominent
  • Data – The most valuable resource
  • Characteristics of Big Data – IBM’s Definition
  • Limitations of Traditional Large-Scale Systems
  • Various Use Cases for Big Data
  • Challenges of Big Data
  • Hadoop Introduction – What is Hadoop? Why Hadoop?
  • Is Hadoop a fad or here to stay? – Hadoop Job Trends
  • History and Milestones of Hadoop
  • Hadoop Core Components – MapReduce & HDFS
  • Why HDFS?
  • Comparing SQL Database with Hadoop
  • Understanding the big picture – Hadoop Eco-Systems
  • Commercial Distributions of Hadoop – Cloudera, Hortonworks, MapR, IBM BigInsights; Cloud Computing – Amazon Web Services, Microsoft Azure HDInsight
  • Supported Operating Systems
  • Organizations using Hadoop
  • Hands on with Linux File System
  • Hadoop Documentation and Resources

Module 2: Getting Started with Hadoop Setup

  • Deployment Modes – Standalone, Pseudo-Distributed Single node, Multinode
  • Demo Pseudo-Distributed Virtual Machine Setup on Windows
  • Virtual Box – Introduction
  • Install Virtual Box
  • Open a VM in Virtual Box
  • Hadoop Configuration overview
  • Configuration parameters and values
  • HDFS parameters
  • MapReduce parameters
  • YARN parameters
  • Hadoop environment setup
  • Environment variables
  • Hadoop Core Services – Daemon Process Status using JPS
  • Overview of Hadoop WebUI
  • Firefox Bookmarks
  • Web Ports
  • Eclipse development environment setup

Module 3: Hadoop Architecture and HDFS

  • Introduction to Hadoop Distributed File System
  • Regular File System v/s HDFS
  • HDFS Architecture
  • Components of HDFS – NameNode, DataNode, Secondary NameNode
  • HDFS Features – Fault Tolerance, Horizontal Scaling
  • Data Replication, Rack Awareness
  • Setting up HDFS Block Size
  • HDFS 2.0 – High Availability, Federation
  • Hands on with Hadoop HDFS, WebUI, and Linux Terminal Commands
  • HDFS File System Operations
  • NameNode Metadata, File System Namespace, NameNode Operations
  • Data Block Split, Benefits of Data Block Approach, HDFS – Block Replication Architecture, Block placement, Replication Method, Data Replication Topology, Network Topology, Data Replication Representation
  • Anatomy of Read and Write data on HDFS
  • Failure and Recovery in Read/Write Operation
  • Hadoop Component failures and recoveries
  • HDFS Programming Basics – Java API
  • Java API Introduction
  • Hadoop Configuration API
  • HDFS API Overview
  • Accessing HDFS Programmatically
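
As a taste of the Java API topics listed above, the sketch below writes a file to HDFS and reads it back through the org.apache.hadoop.fs.FileSystem API. The fs.defaultFS URI and the file path are placeholders for your own cluster, not values prescribed by the course.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // placeholder NameNode
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");      // hypothetical path

            // Write a small file to HDFS (overwrite if it exists).
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read the first line back.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
            fs.close();
        }
    }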

Module 4: MapReduce Framework

  • What is MapReduce and Why it is popular
  • MapReduce Framework – Introduction, Driver, Mapper, Reducer, Combiner, Split, Shuffle & Sort
  • Example: Word Count – the "Hello World" of MapReduce
  • Use cases of MapReduce
  • MapReduce Logical Data Flow – with multiple/single reduce task
  • MapReduce Framework revisited
  • Steps to write a MapReduce Program
  • Packaging MapReduce Jobs in a JAR
  • MapReduce CLASSPATH
  • Different ways of running MapReduce job
  • Run on Eclipse – local v/s HDFS
  • Run M/R job using YARN
  • Writing and Viewing Log Files and Web UI
  • Input Splits in MapReduce
  • Relation between Input Splits and HDFS Blocks
  • Hands on with MapReduce Programming (see the Word Count sketch below)
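
For reference, here is a compact version of that Word Count program, following the standard mapper/reducer/driver structure from the Hadoop MapReduce tutorial; input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Mapper: emits (word, 1) for every token in its input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word after shuffle & sort.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: wires mapper, combiner, and reducer into a Job.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a JAR, it runs as: hadoop jar wordcount.jar WordCount /input /output.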

Module 5: MapReduce Advanced

  • MapReduce Architecture
  • Responsibility of JobTracker, TaskTracker in classic MapReduce v1
  • Anatomy of MapReduce Jobs Execution in classic MRv1(JT, TT)
  • Hadoop 2.0, YARN, MRv2
  • Hadoop 1.0 Limitations
  • MapReduce Limitations
  • YARN Architecture
  • Classic vs YARN
  • MapReduce CLASSPATH
  • YARN multitenancy
  • MapReduce and YARN command line tools
  • Run M/R job using YARN
  • Anatomy of MapReduce Jobs Execution MRv2 - YARN(RM, AM, NM)
  • How status updates work in MRv1 and YARN
  • Speculative Execution
  • Data Locality
  • Reducer – Shuffle, Sort and Partitioner
  • How Partitioners and Reducers Work Together
  • Determining the optimal number of Reducers for a Job
  • Setting Mapper Counts and Reducer Counts
  • Writing Custom Partitioners
  • Strategies for Debugging MapReduce Code
  • Counters - Retrieving Job Information
  • Logging in Hadoop
  • MapReduce unit tests with JUnit and the MRUnit framework
  • MapReduce I/O Format
  • Understanding Data Types of Keys and Values
  • Understanding Input/output Format, Sequence Input/output format
  • Creating Custom Writable and WritableComparable Implementations
  • Implementing Custom InputFormats and OutputFormats
  • Saving Binary Data Using SequenceFile and Avro Data Files
  • Map-Side Join, Reduce-Side Join, Cartesian Product
  • Creating Map-Only MapReduce Jobs – Example: DistCp command
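
To illustrate the "Writing Custom Partitioners" topic above, here is a hypothetical partitioner that routes words starting with a-m to reducer 0 and everything else to reducer 1; the Text/IntWritable types match the Word Count job sketched in Module 4.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical example: partition words by their first letter.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String word = key.toString();
            if (word.isEmpty() || numPartitions < 2) return 0;
            return Character.toLowerCase(word.charAt(0)) <= 'm' ? 0 : 1;
        }
    }

It would be registered in the driver with job.setPartitionerClass(FirstLetterPartitioner.class) alongside job.setNumReduceTasks(2).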

Module 6: Data Warehousing - Pig

  • Pig Data Flow Language – MapReduce using Scripting
  • Challenges Of MapReduce Development Using Java
  • Need for High Level Languages - Pig
  • PIG vs MapReduce
  • What Pig is (and isn't) – Pig Latin, Grunt Shell
  • Where to (and where not to) use Pig?
  • Pig Installation and Configuration
  • Architecture: The Big Picture, Pig Components
  • Execution Environments - Local, MapReduce
  • Different ways of Invoking Pig – Interactive, Batch
  • Pig Example: Data Analysis in Pig Latin
  • Quickstart and Interoperability
  • Data Model
  • Expression in Pig Latin
  • Pig Data Types
  • Nulls in Pig Latin
  • Pig Operation
  • Core Relational Operators – Load, Store, Filter, Transform, Join, Group, CoGroup, Union, Foreach, Sort/Order, Combine/Split, Distinct, Limit, Describe, Explain, Illustrate
  • Group v/s CoGroup v/s Join
  • Pig Latin: File Loaders & built-in UDF (Python, Java) usage
  • PIG v/s SQL
  • Implementation & Usage of Pig UDF

Module 7: Data Warehousing - Hive and HiveQL

  • Limitations of MapReduce
  • Need for High Level Languages
  • Analytical OLAP - Data Warehousing with Apache Hive and Apache Pig
  • HiveQL- SQL like interface for MapReduce
  • What is Hive, Background, Hive QL
  • Where to use Hive? Why use Hive when Pig is here?
  • Pig v/s Hive
  • Hive Installation, Configuration Files
  • Hive Components, Architecture and Metastore
  • Metastore – configuration
  • Driver, Query Compiler, Optimizer and Execution Engine
  • Hive Server and Client components
  • Hive Data Types
  • Hive Data Model
  • File Formats
  • Hive Example
  • Hive DDL
  • CREATE, ALTER, DROP, TRUNCATE
  • Create/Show Database
  • Create/Show/Drop Tables
  • Hive DML
  • SELECT, INSERT, OVERWRITE, EXPLAIN
  • Load Files & Insert Data into Tables
  • Managed Tables v/s External Tables – Loading Data
  • Hive QL - Select, Filter, Join, Group By, Having, Cubes-Fact/Dimension(Star Schema)
  • Implementation & Usage of Hive UDF, UDTF and SerDe
  • Partitioned Table - loading data
  • Bucketing
  • Multi-Table Inserts
  • Joins
  • Hands on with Hive – CRUD operations
  • Limitations of Hive
  • SQL v/s Hive
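
Beyond the Hive shell, HiveQL can also be issued from Java over JDBC against HiveServer2. A minimal sketch, assuming HiveServer2 on its default port 10000 and a hypothetical employees table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }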

Module 8: NoSQL Databases - HBase

  • NoSQL Introduction
  • RDBMS (SQL) v/s HBase (NoSQL)
  • RDBMS – Benefits, ACID, Demerits
  • CAP Theorem and Eventual consistency
  • Row Oriented v/s Column Oriented Storage
  • NoSQL stores: Column (HBase, Cassandra), Document (MongoDB, CouchDB, MarkLogic), Graph (Neo4j), Key-Value (Memcached, Riak, Redis, DynamoDB)
  • What is HBase?
  • HBase to the rescue
  • Synopsis of how a typical RDBMS scaling story unfolds
  • HBase – Hadoop Database
  • HBase Introduction, Installation, Configuration
  • HBase Overview: part of Hadoop Ecosystem
  • Problems with Batch Processing like MR
  • HBase v/s HDFS
  • Batch vs. Real Time Data Processing
  • Use-cases for Real Time Data Read/Write
  • HBase Storage Architecture
  • Write Path, Read Path
  • HBase components - HMaster, HRegionServer
  • ZooKeeper
  • Replication
  • HBase Data Model
  • Column Families
  • Column Value & Key Pair
  • HBase Operation - Memstore / HFile / WAL
  • HBase Client - HBase Shell
  • CRUD Operations
  • Create via Put method
  • Read via Get method
  • Update via Put method
  • Delete via Delete method
  • Creating table, table properties, versioning, compression
  • Bulk Loading HBase
  • Accessing HBase using Java Client v/s Admin API
  • Introduction to Java API
  • Read / Write path
  • CRUD Operations – Create, Read, Update, Delete
  • Drop
  • Scans
  • Scan Caching
  • Batch Caching
  • MapReduce Integration
  • Filters
  • Counters
  • Co-processors
  • Secondary Index
  • Compaction – major, minor
  • Splits
  • Bloom Filters
  • Caches
  • Apache Phoenix
  • When Would I Use Apache HBase?
  • Companies Using HBase
  • When/Why to use HBase/Cassandra/MongoDB/Neo4J?
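
The sketch below walks through the Java client CRUD flow listed above, assuming an existing table named users with a column family info and a ZooKeeper quorum on localhost; all of these names are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrud {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost"); // placeholder quorum
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Create/update: a Put writes cells for a row key.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                              Bytes.toBytes("Alice"));
                table.put(put);

                // Read: a Get fetches the cells of a single row.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

                // Delete: remove the row again.
                table.delete(new Delete(Bytes.toBytes("row1")));
            }
        }
    }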

Module 9: Import/Export Data - Sqoop, Flume

  • Setup MySQL RDBMS
  • Sqoop - Import/Export Structured Data to/from HDFS from/to RDBMS
  • Introduction to Sqoop
  • Installing Sqoop, Configuration
  • Why Sqoop
  • Benefits of Sqoop
  • Sqoop Processing
  • How Sqoop works
  • Sqoop Architecture
  • Importing Data – to HDFS, Hive, HBase
  • Exporting Data – to MySQL
  • Sqoop Connectors
  • Sqoop Commands
  • Flume – Import Semi-Structured Data (e.g., log messages) to HDFS
  • Why Flume
  • Flume - Introduction
  • Flume Model
  • Scalability In Flume
  • How Flume works
  • Flume Complex Flow - Multiplexing
  • Hands on with Sqoop, Flume
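
Sqoop is normally invoked from the command line, but the same import can also be launched from Java via Sqoop.runTool. A sketch, with every connection detail a placeholder for your own MySQL setup:

    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent of: sqoop import --connect ... --table ... --target-dir ...
            String[] sqoopArgs = {
                "import",
                "--connect", "jdbc:mysql://localhost:3306/shop", // placeholder DB
                "--username", "demo",
                "--password", "secret",
                "--table", "orders",                             // placeholder table
                "--target-dir", "/user/demo/orders",             // HDFS destination
                "--num-mappers", "1"
            };
            System.exit(Sqoop.runTool(sqoopArgs));
        }
    }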

Module 10: Workflows using Oozie

  • MapReduce Workflows
  • Workflows Introduction
  • Oozie - Simple/Complex MapReduce Workflow
  • Introduction to Oozie
  • Oozie Workflows
  • Oozie Service/Scheduler
  • Oozie use-cases
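
As a flavour of the Oozie client API, the sketch below submits a workflow and polls it to completion; the server URL and HDFS application path are placeholders, and the app path is assumed to contain a workflow.xml.

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class OozieSubmit {
        public static void main(String[] args) throws Exception {
            // Placeholder URL for the Oozie server of your cluster.
            OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

            // Job properties; APP_PATH must point at a deployed workflow.xml.
            Properties conf = oozie.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH,
                             "hdfs://localhost:9000/user/demo/wordcount-wf");
            conf.setProperty("queueName", "default");

            // Run the workflow, then poll until it leaves the RUNNING state.
            String jobId = oozie.run(conf);
            while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10_000);
            }
            System.out.println("Workflow finished: " + oozie.getJobInfo(jobId).getStatus());
        }
    }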

Module 11: Administering Hadoop

  • Open a VM using Oracle VirtualBox
  • Hadoop Cluster Configuration overview
  • Configuration parameters and values
  • HDFS parameters
  • MapReduce parameters
  • Hadoop environment setup
  • Include and Exclude configuration files
  • Site v/s Default conf files
  • Environment Variables
  • Scripts
  • Hadoop Multi-node Installation
  • Passwordless SSH setup
  • Configuration Files of Hadoop Cluster
  • Safe Mode
  • Dfs Admin
  • Hadoop Ports
  • Hadoop Quotas
  • Security - Kerberos
  • ZooKeeper – What is ZooKeeper? An Introduction
  • Challenges Faced in Distributed Applications
  • ZooKeeper Coordination, Architecture
  • Hue, Cloudera Manager
  • Hadoop Cluster Performance Management
  • Important Hadoop tuning parameters
  • Hadoop Cluster Benchmarking Jobs – How to run the jobs
  • Counters
  • HDFS Benchmarking
  • Debugging, Troubleshooting
  • Reference
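
To give the ZooKeeper topics above some shape, here is a minimal Java client session that creates, reads, and deletes a znode. The connect string is a placeholder, and for brevity the watcher treats the first event as "connected".

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkExample {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // Placeholder connect string; 3000 ms session timeout.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000,
                    event -> connected.countDown());
            connected.await();

            // Create a znode, read it back, then clean up.
            String path = zk.create("/demo-config", "v1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData(path, false, null);
            System.out.println(path + " = " + new String(data));
            zk.delete(path, -1);   // -1 skips the version check
            zk.close();
        }
    }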

Module 12: Apache Spark

  • Spark Concepts, Installation and Architecture
  • Spark Modes
  • Spark web UI
  • Spark shell
  • RDD Operations / transformations
  • Key-Value pair RDDs
  • MapReduce on RDD
  • Submitting the first program to Spark
  • Spark SQL
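
To close, here is a compact Spark word count in the Java RDD API, tying together the RDD-transformation and job-submission topics above; the input and output paths are placeholders.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            // local[*] runs Spark inside this JVM; pass a master URL on a cluster.
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> lines = sc.textFile("input.txt");  // placeholder path
                JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);
                counts.saveAsTextFile("word_counts");  // one part file per partition
            }
        }
    }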

FAQ

Who are the instructors?

We believe in quality and follow a rigorous process in selecting our trainers. All our trainers are industry experts and professionals with experience in delivering training.

Whom do I contact, if I have further clarifications?

You can call us on +1-630-974-5490 Option:4 or 630-225-1019 or email at training@manumedisoft.com

What if I miss the online class?

You will get the recording of the session, and you can also attend the session with the next batch. You can contact support for next-batch details.

What are the pre-requisites for the course?

There are no pre-requisites for the Big Data and Hadoop course. Basic knowledge of Core Java and SQL will be beneficial, but is certainly not mandatory.

As part of the Big Data and Hadoop Certification course, ManuMediSoft Training Services can provide a complimentary self-paced course on Core Java.

Do you provide placement assistance?

YES, we provide placement assistance. We also assist you with resume building and share important interview questions once you are done with the training.


Testimonials

  • I have attended many training courses in my time, with presenters ranging from appalling to good to superb. Please convey my thanks to Jeff, because he is in the superb bracket, a master in fact; one of the best.

    Mcmillan Raz
    SAP BI Consultant And Freelancer At Global Training
  • Training with MMS has been a great experience. Their consultant Mark has always been available to assist and bring as much value as possible to our training. Plus, we’ve received rave reviews from both business and IT users who went through the previous sessions. This is what made us join MMS Training.

    Mahendra S Das
    SAP FICO Trainer