Big Data Hadoop Developer Training Certification Course, Austin

  4.5 Ratings
  17899 Learners

The big data industry is growing at a rate of 10% each year and affects nearly every business worldwide. Become a part of this big data revolution. Take a step ahead with AcadGild’s big data training certification and gain proficiency in working with Hadoop, MapReduce, Apache Pig, Hive, and more.

Featured In
AcadGild is ranked as one of the Top 10 Worldwide Technology Boot Camps.
Course Overview
Introduction to Big Data
Get introduced to big data and the challenges associated with handling it. Understand the different ways to manage the big data problem and how Hadoop fits in this role.
Introduction to Hadoop Framework
Master the Hadoop framework, Hadoop federation, and the features of Hadoop that make it an unparalleled framework for processing big data.
MapReduce
Understand MapReduce through detailed discussions of the various MapReduce phases and of data processing for various file formats, along with real-world examples.
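As a rough illustration of the map, shuffle, and reduce phases, here is a minimal word count sketch in plain Python. It does not use Hadoop itself. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, which a MapReduce
    framework does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs Hadoop", "Hadoop processes big data"]
    print(word_count(sample))
```

In a real cluster the mapper and reducer would run in parallel on different machines, and the framework would handle the shuffle. This sketch runs everything in one process to keep the phases easy to follow.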
Apache Pig
Understand Apache Pig by contrasting it with MapReduce. Sift through various data types and explore data processing techniques using Pig. Learn to deal with exceptional scenarios using UDFs and by optimizing Pig queries.
Hive
Get introduced to Hive and its similarity to SQL. Understand the architecture of Hive, create databases and tables, and perform various operations using Hive.
HBase
Learn about NoSQL databases and the differences between HBase and relational databases. Explore the features of NoSQL databases, the CAP theorem, and the HBase architecture. Understand the data model and perform various operations.
Sqoop and Flume
Import and export data between Hadoop and traditional SQL databases, such as Oracle, using Sqoop. Master streaming data into Hadoop using Apache Flume.
Oozie
Learn about Oozie and implement it in the workflow to schedule a Hadoop job.
Highly Experienced Mentors
Develop 2 Real-Time Projects
Lifetime Access to Dashboard
24x7 Support
Free Job Preparation Week
Course Syllabus
  • Introduction to Big Data
  • Sources of Data Flood
  • Exploding Data Problem
  • Solution for Data Explosion - Hadoop
  • Evolutionary Features of Hadoop
  • Hadoop Timeline
  • Who is Using Big Data?
  • Job Trends in Big Data
  • Solving Big Data Problem
  • Hadoop Cluster - Introduction & Concepts
  • Hadoop 1.x Architecture
  • Progression from Hadoop 1.x to Hadoop 2.x
  • Introduction to YARN Application
  • Anatomy of a YARN Application
  • Hadoop Distributed File System - HDFS
  • HDFS Data Flow
  • Blocks in HDFS
  • HDFS High Level Architecture
  • Processing Feature
  • Relation between Hadoop and Split
  • HDFS File-Write
  • Data Model: File Read
  • Hadoop Installation
  • HDFS Commands
  • Demo of HDFS Commands
  • Hadoop Configuration Files
  • Hadoop Eco System: Key Components
  • HDFS & YARN
  • Data Access
  • Governance & Integration
  • Security
  • Operations
  • MapReduce Definition & Overview
  • Real Life Examples
  • Building Principles
  • Mapper-Reducer Functions
  • MapReduce Example: Word Count
  • Demo to Build a MR Application (Word Count)
  • Hands-On: Word Count Problem
  • Hands-On: Weather Data Problem
  • Hands-On: Call Data Records
  • Introduction
  • Adages
  • Advantages
  • Why Pig?
  • Pig Deployment
  • Pig Terminology
  • Data Types & Handling
  • Apache Pig Architecture
  • Installation
  • Execution - Running Modes
  • Running Pig
  • Relation Operators
  • Hands-On: Pig Latin Commands
  • Hands-On: Use Case with YouTube Data
  • Hands-On: Writing Pig UDF
  • Hands-On: Execution of XML File Using Pig
  • Hands-On: Advanced Joins Using Pig
  • Real World Use Cases using Pig
  • Mentees can select a project from a predefined set of AcadGild projects or come up with their own project ideas
  • Distributed cache
  • Reduce side Join in MR
  • Counters in MapReduce
  • MR unit testing
  • XML processing Using MapReduce
  • Custom input format
  • Custom output format
  • Sequence file input format in MR
  • Introduction
  • Function
  • Hive architecture
  • Data storage
  • Introduction to HQL
  • Hive query lifecycle on Hadoop
  • Basic operations in Hive
  • Create table and load data
  • Altering & dropping tables
  • Joins and union
  • Partitioning and bucketing
  • Why UDF?
  • Real world Use cases using Hive
  • UDF demo using Hive
  • Thrift server demo
  • NoSQL databases
  • HBase vs. RDBMS
  • CAP theorem
  • HBase: column family and HBase data model
  • HMaster and slave
  • HBase components
  • ZooKeeper
  • HBase work flow
  • Put and Get
  • Scan
  • Filters in HBase
  • Delete
  • Data loading techniques
  • What is HBase thrift server
  • Integrating HBase with your application
  • Example for sending request and response from Thrift server
  • HBase rest server
  • Import and export of structured data on Hadoop
  • Introduction to Apache Sqoop
  • Sqoop architecture
  • Import and export in Sqoop
  • Sqoop commands
  • Sqoop installation
  • Ingesting unstructured data into Hadoop
  • Apache Flume introduction
  • Flume architecture
  • Flume components
  • Introduction to Oozie
  • Oozie co-ordinator
  • Oozie workflows
  • Oozie scheduler hands-on
  • Mentees can select a project from a predefined set of AcadGild projects or come up with their own project ideas
  • Why is Data So Important?
  • Pre-requisite – Data Scale
  • What is Big Data?
  • Big Bank: Big Challenge
  • Customer Churn Analysis
  • Point-of-Sale Transaction Analysis
  • Common Problems
  • 3 Vs of Big Data
  • Defining Big Data
  • Sources of Data Flood
  • Exploding Data Problem
  • Redefining the Challenges of Big Data
  • Possible Solutions
  • Scaling Up Vs. Scaling Out
  • Challenges of Scaling Out
  • Solution for Data Explosion-Hadoop
  • Hadoop: Introduction
  • Hadoop in Layman's Term
  • Hadoop Ecosystem
  • Evolutionary Features of Hadoop
  • Big Data Benchmarks
  • Hadoop Timeline
  • Why Learn Big Data Technologies?
  • Who is Using Big Data?
  • Yearly Salaries in Big Data World
  • Job Trends in Big Data
  • HDFS: Introduction
  • Design of HDFS
  • Why Hadoop Cluster?
  • HDFS Blocks
  • Components of Hadoop 1.x
  • NameNode and Hadoop Cluster
  • Arrangement of Racks
  • Arrangement of Machines and Racks
  • Local FS and HDFS
  • NameNode
  • Checkpointing
  • Replica Placement
  • Benefits-Replica Placement and Rack Awareness
  • URI
  • URL and URN
  • HDFS Commands
  • Problems with HDFS in Hadoop 1.x
  • HDFS Federation (Included in Hadoop 2.x)
  • HDFS Federation
  • High Availability
  • Configuration Files in Hadoop
  • HDFS Configurations
  • Core Configurations
  • Configuration Files in Hadoop
  • Java API to Read HDFS File
  • Java API to Write HDFS File
  • Java API - Listing of File in HDFS
  • Important Java Classes to Read From HDFS
  • Anatomy of File Read From HDFS
  • Data Read Steps
  • Checksum and Data Integrity
  • Data Read from HDFS: Additional Points
  • Important Java Classes to Write Data to HDFS
  • Anatomy of File Write to HDFS
  • Writing File to HDFS: Steps
  • Handling Failures During Writing a File
  • Building Principles
  • Introduction to MapReduce
  • Some More Real-World Examples
  • Broad Steps
  • Finding Out Maximum Temperature
  • Pseudo Code
  • Mapper Class
  • Reducer Class
  • Driver Code
  • Exploring Methods of Mapper
  • Exploring Methods of Reducer
  • InputSplit
  • InputSplit and Data Blocks – Difference
  • Why Is The Block Size 128 MB?
  • RecordReader
  • InputFormat
  • Default InputFormat: TextInputFormat
  • MapReduce Example
  • OutputFormat
  • Using a Different OutputFormat
  • Important Points
  • Data Locality
  • JobTracker and TaskTracker
  • Speculative Execution
  • Combiner
  • Using Combiner
  • Partitioner
  • Using Partitioner
  • Map Only Job
  • Flow of Operations in MapReduce
  • Serialization in MapReduce
  • Custom Writable in MapReduce
  • Custom Writable in MapReduce
  • Custom WritableComparable in MapReduce
  • Overview
  • Schedulers in YARN
  • FIFO Scheduler
  • Capacity Scheduler
  • Fair Scheduler
  • Differences between Hadoop 1.x and Hadoop 2.x
  • Introduction
  • Pig vs SQL
  • Adages/Philosophy of Pig
  • Some Use-Cases
  • Why Pig?
  • Apache Pig Architecture
  • Simple Data Types
  • Complex Data Types
  • Samples
  • Execution
  • Operators
  • Installation
  • Nested Foreach: Getting Count of Distinct Names
  • Our DataSets
  • Pig Operators: UNION
  • Pig Operators: COGROUP
  • Pig Operators: FLATTEN
  • Pig Operators: PARALLEL
  • Parameter Substitution
  • Macros
  • Anatomy of Reduce-side-Join
  • Job Optimizations in Pig
  • Evaluate UDF in Pig
  • Working with DEFINE
  • Filter UDF in Pig
  • Execution of XML Files in Pig
  • Execution of CSV Files in Pig
  • Non-Linear Data Flows and Multiquery
  • Optimisations in Pig
  • Project 1 Discussion contd.
  • Python: Download and Installation
  • Eclipse
  • Support for Python
  • Why Python?
  • Python: Introduction
  • Python: Working Interactively
  • Python: Data Types
  • Python Numbers
  • Python Strings
  • Python Lists
  • Split()
  • Python Tuples
  • Tuple Vs List Operations
  • Type Conversion
  • Conditional Statements
  • While Loops
  • For Loops
  • Lambda Functions
  • Map Functions
  • Filter Function
  • Reduce Function
  • File Handling
  • Classes and Objects
  • Modules
  • os Module
  • Flume: Introduction
  • Installation
  • Flume Architecture
  • Example Description
  • Demo: Working_With_Flume_example
  • Demo: exec_source
  • Demo: spool_dir
  • Transactions
  • Batching
  • Exec Source
  • Spooling Directory Source
  • File Channel
  • Memory Channel
  • Logger Sink
  • HDFS Sink
  • Partitioning
  • Interceptor
  • Demo: interceptor.conf
  • Demo: partition.conf
  • Binary File Format
  • Demo: sequencefile.conf
  • Fan Out
  • Demo: fanout.conf
  • Selector in Fan Out
  • Running Hadoop in Local Mode
  • Demo: HadoopLocal
  • MRUnit Testing
  • Demo: MRUnitTesting
  • Java Static Classes
  • Passing Configurations to MapReduce Programs
  • Demo: StaticConfigurations
  • Fetching Logs of MapReduce Jobs
  • Dynamic Configurations
  • Demo: DynamicConfigurations
  • Counters
  • Demo: Counters
  • SequenceFileFormat
  • Demo: SequenceFiles
  • Custom Input Format
  • Small File Problem in Hadoop
  • Demo: FilesPacking
  • DBInputFormat
  • Demo: DBInputFormat
  • DBOutputFormat
  • Demo: DBOutputFormat
  • NLineInputFormat
  • Demo: NLineInputFormat
  • MultipleOutputs
  • Demo: MultipleOutput
  • MultipleInputs
  • Reduce Side Join
  • Example for REDUCE-SIDE JOIN Using MapReduce
  • Anatomy of Reduce-side Join
  • Demo: ReduceSideJoin
  • Distributed Cache
  • Map Side Join
  • Map Side Join Process
  • Demo: MapSideJoin
  • Secondary Sort
  • Demo: SecondarySort
  • Total Order Sort Using Multiple Reducers
  • Demo: TotalOrder
  • Introduction
  • Hive DDL
  • Demo: Databases.ddl
  • Demo: Tables.ddl
  • Hive Views
  • Demo: Views.ddl
  • Architecture
  • Primary Data Types
  • Data Load
  • Demo: ImportExport.dml
  • Demo: HiveQueries.dml
  • Demo: Explain.hql
  • Table Types
  • Demo: ExternalTable.ddl
  • Complex Data Types
  • Demo: Working with Complex Datatypes
  • Hive Variables
  • Demo: Working with Hive Variables
  • Hive Variables and Execution Customisation
  • Demo: Working with Hive Execution
  • A Walkthrough of Hive Components
  • Architecture
  • Execution Engines of Hive
  • The Metastore
  • Overview of Hive Internal
  • Advantages & Limitations
  • Hive Clients
  • Services and Clients
  • Installing Hive
  • Working with Arrays
  • Demo: Arrays
  • Sort By and Order By
  • Demo: Order_By_and_Sort_By
  • Distribute By and Cluster By
  • Demo: Distribute_By_and_Cluster_By
  • Partitioning
  • Static and Dynamic Partitioning
  • Demo: Partitioning
  • Bucketing
  • Bucketing Vs Partitioning
  • Demo: Bucketing
  • Sampling
  • Demo: Sampling
  • Joins and Types
  • Bucket-Map Join
  • Sort-Merge-Bucket-Map Join
  • Left Semi Join
  • Demo: Join Optimisations
  • Input Formats in Hive
  • Sequence Files in Hive
  • RC File in Hive
  • File Formats in Hive
  • ORC Files in Hive
  • Inline Index in ORC Files
  • ORC File Configurations in Hive
  • Input Formats in Hive
  • Demo: File Formats
  • SerDe in Hive
  • Demo: CSVSerDe
  • JSONSerDe
  • RegexSerDe
  • Analytic and Windowing in Hive
  • Demo: Analytics.hql
  • HCatalog in Hive
  • Demo: Using_HCatalog
  • Accessing Hive with JDBC
  • Demo: HiveQueries.java
  • HiveServer2 and Beeline
  • Demo: beeline
  • UDF in Hive
  • Demo: ToUpper.java and Working_with_UDF
  • Optimizations in Hive
  • Demo: Optimizations
  • Challenges with traditional RDBMS
  • Features of NoSQL databases
  • NoSQL Database Types
  • CAP Theorem
  • What is HBase Regions
  • HBase HMaster ZooKeeper
  • HBase First Read
  • HBase Meta Table
  • Region Server Components
  • HBase Write Steps
  • HBase MemStore
  • HBase Region Flush
  • HBase HFile
  • HBase Read Merge
  • Read Amplification
  • HBase Minor Compaction
  • HBase Major Compaction
  • Region Split
  • HDFS Data Replication
  • Data Recovery
  • Apache HBase Architecture Benefits
  • HBase Vs. RDBMS
  • Shell Commands
  • Java Classes for DDL
  • HBaseConfiguration
  • Java Classes for DML
  • Put Method
  • KeyValue Class
  • Client Side Write Buffer
  • List of Puts
  • Handling Failure in Put
  • Atomic compare-and-set (CAS)
  • Get Method
  • getRowOrBefore
  • Delete Method
  • Effect of setting timestamps
  • Atomic compare-and-delete (CAD)
  • Scan Operation
  • Caching
  • Batching
  • Batch Operations
  • HBase Filters
  • Types of HBase Filters
  • Performances with HBase Filters
  • HBase Filters with Command Line
  • HBase Counters
  • Other clients of HBase
  • Apache Thrift and REST
  • HBase REST Java API
  • Bulk Load Techniques: Custom MapReduce
  • Hive Integration with HBase
  • Pig Integration with HBase
  • Performance Considerations
  • Introduction to Oozie
  • Oozie Architecture
  • Oozie Workflow Nodes
  • Oozie Server
  • Oozie Workflow
  • Sqoop Architecture
  • Sqoop Features
  • Sqoop Hands On
  • Major Project Discussion
  • Getting started with Spark - Part 1
  • Major Project Discussion contd.
  • Getting started with Spark - Part 2
  • Major Project Discussion contd.
  • Final discussion on implementation of projects.
  • Why Is Data So Important?
  • Pre-Requisite – Data Scale
  • What Is Big Data?
  • Big Bank: Big Challenge
  • Customer Churn Analysis
  • Point-Of-Sale Transaction Analysis
  • Common Problems
  • 3 Vs Of Big Data
  • Defining Big Data
  • Sources Of Data Flood
  • Exploding Data Problem
  • Redefining The Challenges Of Big Data
  • Possible Solutions: Scaling Up Vs. Scaling Out
  • Challenges Of Scaling Out
  • Solution For Data Explosion-Hadoop
  • Hadoop: Introduction
  • Hadoop In Layman's Term
  • Hadoop Ecosystem
  • Evolutionary Features Of Hadoop
  • Big Data Benchmarks
  • Hadoop Timeline
  • Why Learn Big Data Technologies?
  • Who Is Using Big Data?
  • Yearly Salaries In Big Data World
  • Job Trends In Big Data
  • HDFS: Introduction
  • Design Of HDFS
  • Why Hadoop Cluster?
  • HDFS Blocks
  • Components Of Hadoop 1.x
  • NameNode And Hadoop Cluster
  • Arrangement Of Racks
  • Arrangement Of Machines And Racks
  • Local FS And HDFS
  • NameNode
  • Checkpointing
  • Replica Placement
  • Benefits-Replica Placement And Rack Awareness
  • URI
  • URL And URN
  • HDFS Commands
  • Problems With HDFS In Hadoop 1.x
  • HDFS Federation (Included In Hadoop 2.x)
  • HDFS Federation
  • High Availability
  • Configuration Files In Hadoop
  • HDFS Configurations
  • Core Configurations
  • Configuration Files In Hadoop
  • Java API To Read HDFS File
  • Java API To Write HDFS File
  • Java API - Listing Of File In HDFS
  • Important Java Classes To Read From HDFS
  • Anatomy Of File Read From HDFS
  • Data Read Steps
  • Checksum And Data Integrity
  • Data Read From HDFS: Additional Points
  • Important Java Classes To Write Data To HDFS
  • Anatomy Of File Write To HDFS
  • Writing File To HDFS: Steps
  • Handling Failures During Writing A File
  • Building Principles
  • Introduction To MapReduce
  • Some More Real-World Examples
  • Broad Steps
  • Finding Out Maximum Temperature
  • Pseudo Code
  • Mapper Class
  • Reducer Class
  • Driver Code
  • Exploring Methods Of Mapper
  • Exploring Methods Of Reducer
  • InputSplit
  • InputSplit And Data Blocks – Difference
  • Why Is The Block Size 128 MB?
  • RecordReader
  • InputFormat
  • Default InputFormat: TextInputFormat
  • MapReduce Example
  • OutputFormat
  • Using A Different OutputFormat
  • Important Points
  • Data Locality
  • JobTracker And TaskTracker
  • Speculative Execution
  • Combiner
  • Using Combiner
  • Partitioner
  • Using Partitioner
  • Map Only Job
  • Flow Of Operations In MapReduce
  • Serialization In MapReduce
  • Custom Writable In MapReduce
  • Custom Writable In MapReduce
  • Custom WritableComparable In MapReduce
  • Overview
  • Schedulers In YARN
  • FIFO Scheduler
  • Capacity Scheduler
  • Fair Scheduler
  • Differences Between Hadoop 1.x And Hadoop 2.x
  • Introduction
  • Pig Vs SQL
  • Adages/Philosophy Of Pig
  • Some Use-Cases
  • Why Pig?
  • Apache Pig Architecture
  • Simple Data Types
  • Complex Data Types
  • Samples
  • Execution
  • Operators
  • Installation
  • Nested Foreach: Getting Count Of Distinct Names
  • Our DataSets
  • Pig Operators: UNION
  • Pig Operators: COGROUP
  • Pig Operators: FLATTEN
  • Pig Operators: PARALLEL
  • Parameter Substitution
  • Macros
  • Anatomy Of Reduce-Side-Join
  • Job Optimizations In Pig
  • Evaluate UDF In Pig
  • Working With DEFINE
  • Filter UDF In Pig
  • Execution Of XML Files In Pig
  • Execution Of CSV Files In Pig
  • Non-Linear Data Flows And Multiquery
  • Optimizations In Pig
  • Project Discussion
  • Case Study Discussion
  • General Discussion
  • Flume: Introduction
  • Installation
  • Flume Architecture
  • Example Description
  • Demo: Working_With_Flume_example
  • Demo: exec_source
  • Demo: spool_dir
  • Transactions
  • Batching
  • Exec Source
  • Spooling Directory Source
  • File Channel
  • Memory Channel
  • Logger Sink
  • HDFS Sink
  • Partitioning
  • Interceptor
  • Demo: interceptor.conf
  • Demo: partition.conf
  • Binary File Format
  • Demo: sequencefile.conf
  • Fan Out
  • Demo: fanout.conf
  • Selector In Fan Out
  • Map Reduce Limitations And Motivation Towards Spark
  • What Is Spark?
  • Features
  • Spark Unified Platform
  • Spark In Hadoop Ecosystem
  • Why In-Memory Processing?
  • Terasort Winning
  • Most Active Project In Apache Spark Survey
  • Industries Using Spark
  • Popular Use Cases Across The Industry
  • Spark Components - Driver
  • Executor
  • Worker
  • Spark Master
  • Significance Of Spark Context
  • Spark APIs Overview
  • Resilient Distributed Datasets
  • Properties Of RDD
  • Creating RDDs
  • Transformations In RDD
  • Actions In RDD
  • Saving Data Through RDD
  • Key-Value Pair RDD
  • Introduction
  • Hive DDL
  • Demo: Databases.ddl
  • Demo: Tables.ddl
  • Hive Views
  • Demo: Views.ddl
  • Architecture
  • Primary Data Types
  • Data Load
  • Demo: ImportExport.dml
  • Demo: HiveQueries.dml
  • Demo: Explain.hql
  • Table Types
  • Demo: ExternalTable.ddl
  • Complex Data Types
  • Demo: Working With Complex Datatypes
  • Hive Variables
  • Demo: Working With Hive Variables
  • Hive Variables And Execution Customisation
  • Demo: Working With Hive Execution
  • A Walkthrough Of Hive Components Architecture
  • Execution Engines Of Hive
  • The Metastore
  • Overview Of Hive Internal
  • Advantages & Limitations
  • Hive Clients
  • Services And Clients
  • Installing Hive
  • Working With Arrays
  • Sort By And Order By
  • Distribute By And Cluster By
  • Partitioning
  • Static And Dynamic Partitioning
  • Bucketing Vs Partitioning
  • Joins And Types
  • Bucket-Map Join
  • Sort-Merge-Bucket-Map Join
  • Left Semi Join
  • Demo: Join Optimisations
  • Input Formats In Hive
  • Sequence Files In Hive
  • RC File In Hive
  • File Formats In Hive
  • ORC Files In Hive
  • Inline Index In ORC Files
  • ORC File Configurations In Hive
  • SerDe In Hive
  • Demo: CSVSerDe
  • JSONSerDe
  • RegexSerDe
  • Analytic And Windowing In Hive
  • Demo: Analytics.hql
  • HCatalog In Hive
  • Demo: Using_HCatalog
  • Accessing Hive With JDBC
  • Demo: HiveQueries.java
  • HiveServer2 And Beeline
  • Demo: beeline
  • UDF In Hive
  • Demo: ToUpper.java And Working_with_UDF
  • Optimizations In Hive
  • Demo: Optimizations
  • Challenges With Traditional RDBMS
  • Features Of NoSQL Databases
  • NoSQL Database Types
  • CAP Theorem
  • What Is HBase Regions
  • HBase HMaster ZooKeeper
  • HBase First Read
  • HBase Meta Table
  • Region Server Components
  • HBase Write Steps
  • HBase MemStore
  • HBase Region Flush
  • HBase HFile
  • HBase Read Merge
  • Read Amplification
  • HBase Minor Compaction
  • HBase Major Compaction
  • Region Split
  • HDFS Data Replication
  • Data Recovery
  • Apache HBase Architecture Benefits
  • HBase Vs. RDBMS
  • Shell Commands
  • Java Classes For DDL
  • HBaseConfiguration
  • Java Classes For DML
  • Put Method
  • KeyValue Class
  • Client Side Write Buffer
  • List Of Puts
  • Handling Failure In Put
  • Atomic Compare-And-Set (CAS)
  • Get Method
  • getRowOrBefore
  • Delete Method
  • Effect Of Setting Timestamps
  • Atomic Compare-And-Delete (CAD)
  • Scan Operation
  • Caching
  • Batching
  • Batch Operations
  • HBase Filters
  • Types Of HBase Filters
  • Performances With HBase Filters
  • HBase Filters With Command Line
  • HBase Counters
  • Other Clients Of HBase
  • Apache Thrift And REST
  • HBase REST Java API
  • Bulk Load Techniques: Custom MapReduce
  • Hive Integration With HBase
  • Pig Integration With HBase
  • Performance Considerations
  • Introduction To Oozie
  • Oozie Architecture
  • Oozie Workflow Nodes
  • Oozie Server
  • Oozie Workflow
  • Sqoop Architecture
  • Sqoop Features
  • Sqoop Hands On
  • Major Project Introduction
  • Case Study Discussion
  • Major Project Discussion
  • Case Study Discussion
  • Major Project Discussion
  • Case Study Discussion
  • Get your doubts cleared with a mentor.
Projects Which Students Will Develop
State-Wise Development Analysis In India
The aim of this project is to analyze how various state governments have performed in different developmental schemes. This analysis will help in finding out how successful each government has been in implementing various projects.
USA Crime Analysis
The aim of this project is to analyze which areas of the USA are most crime-prone and which types of crime are most prominent in different areas. This analysis will help in understanding the efficiency of US police in solving criminal cases.
Twitter Sentiment Analysis
The aim of this project is to perform sentiment analysis on Twitter data to analyze the sentiments related to a particular aspect.
USA Consumer Forum Data Analysis
The aim of this project is to analyze the performance of various companies on aspects such as customer query resolution time and customer satisfaction rate, and to determine which of them are more customer-centric.
Titanic Data Analysis
Aim of this project is to analyze the casualty details like average age of the passenger who survived and died, number of females survived, details of passengers travelling in different classes etc.
Job Preparation Week
After you complete your course, our unique job preparation solution makes sure you can check off all the essentials on your job preparation checklist, from your resume to your interview skills.
In-depth Mock Interviews
With two in-depth mock interviews, you gain an edge over other candidates.
Resume Building And Interview Questions
Your resume creates the first impression, and we help you build one that stands out.
Online Reputation Building
We help you build a strong online presence on LinkedIn, Git, Stack Overflow, and more.
Resume Sharing With Top Employers
Your resume is shared with top employers, so that you find your dream job.
30+ Offers Made To Students
2500+ Hours Spent Coding
100+ Recommendations Given By Clients
500+ Projects Completed By Students
Places You Could Land
Customer Feedback
FAQs
Hadoop is an open-source software framework used for storing and processing big data. This course focuses on improving data-processing performance, with an emphasis on implementing real-time case studies within a stipulated duration. After successfully completing the course, trainees will be able to take on real-time big data projects.
Any graduate aiming to build a successful career around big data can take this course. The course will be beneficial for:
  • Software Developers and Architects
  • Professionals with an analytics or data management profile
  • Business Intelligence Professionals
  • Project Managers
  • Data Scientists
  • Professionals with a business intelligence, ETL, or data warehousing background
  • Professionals from testing and mainframe backgrounds
After your training, you will be equipped with the skills needed to handle big data projects in any sector. We provide real-time case studies, projects, and assignments that span around 200 hours. Extra assistance, such as mock interview sessions, resume building, and career guidance, is included in the job preparation week.
Big data and Hadoop have many components, such as Pig, Hive, and HBase, for which Java is not a prerequisite. People from various domains with no prior knowledge of Java have trained successfully with us and now work in the big data industry. That said, knowledge of core Java is an added advantage, because Java is central to Hadoop (MapReduce is implemented in Java).
Data scientists handle business needs as and when requirements arise; they also prepare plans to implement analytics projects.
A big data developer, on the other hand, is responsible for designing and implementing applications that analyze data from various sources to uncover insights and inform business decisions.
A basic knowledge of Java and SQL will be helpful; however, it is not mandatory.
Big data is a huge collection of data that can be treated as an asset. It can include many different types of data in different formats. Hadoop is essentially a programming framework that stores and processes such large data sets. It is basically a tool for handling big data to extract business insights.
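To give a feel for the processing side, here is a minimal in-memory sketch of the map, shuffle, and reduce pattern that MapReduce is built around, using word counting as the example. It is plain Python and does not use Hadoop's actual API; the function names and sample input are made up for illustration, and a real job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data needs big tools", "Hadoop handles big data"])
print(counts)
```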
Extensive training will be given on MapReduce, Pig, Hive, HBase, Oozie, Sqoop, Flume, and Spark.
Mentors are qualified big data professionals with at least 5 years of experience. A love for coding and a passion for teaching are essential prerequisites for all our mentors.
Absolutely! We strongly encourage students who come up with their own ideas.
You need a Mac or Windows machine and an Internet connection with a minimum speed of 500 kbps.
Besides the classes, spending around 3 hours on revision and self-study every day will be enough.
  • If you decide to leave within the first week after the class starts, we refund the fees completely.
  • If you decide to leave before the class starts, 50% of the total paid fee would be deducted and the remaining amount will be refunded to the user.
  • The refund policy would be applied if the total amount paid is more than 50% of the course fees.
  • If a user is opting for a complementary course, then the refund policy would be applied only on the 1st course.
The classes are held on weekends as well as on weekdays. You can enroll in a batch that suits your personal schedule.
If you want to learn more about the courses offered by AcadGild, mail us at enquiry@acadgild.com with your mobile number and we will reach out to you.

Upcoming batches of Big Data Hadoop Training

Course Name               Location        Start Date
Big Data Hadoop Training  Bangalore       06 May
Big Data Hadoop Training  Chennai         06 May
Big Data Hadoop Training  Pune            06 May
Big Data Hadoop Training  Hyderabad       06 May
Big Data Hadoop Training  Delhi           06 May
Big Data Hadoop Training  Mumbai          06 May
Big Data Hadoop Training  Kolkata         06 May
Big Data Hadoop Training  Bhubaneswar     06 May
Big Data Hadoop Training  San Diego       06 May
Big Data Hadoop Training  San Francisco   06 May
Big Data Hadoop Training  Charlotte       06 May
Big Data Hadoop Training  Boston          06 May
Big Data Hadoop Training  Philadelphia    06 May
Big Data Hadoop Training  Washington      06 May
Big Data Hadoop Training  New York        06 May