Designed with Inputs from Top Industry Professionals
Industry professionals from top tech companies guide our industry-aligned curriculum, which is continually refreshed with the latest big data technologies so that you never fall behind.
Introduction to Big Data
From data-driven strategy to decision making, modern organizations rely heavily on big data technologies. In this course, you will learn the core concepts behind big data problems, applications, and systems, understand the different ways to manage big data, and see where Hadoop fits in.
Introduction to the Hadoop Framework
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. Master the Hadoop framework's features and learn how to install Hadoop and run programs on it!
MapReduce
MapReduce is the application framework that you use to develop and run batch applications on Hadoop. We discuss the MapReduce phases and data processing methods for various file formats, along with real-world examples.
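The three phases can be sketched in plain Python with the classic word-count example. This is a conceptual stand-in to show the map, shuffle, and reduce steps, not the Hadoop API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle/sort: group all emitted values by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # 2
```

On a real cluster, the map and reduce functions run in parallel across nodes and the framework performs the shuffle; the logic, however, is exactly this shape.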
Apache Spark
Spark usage has grown to 21%, up from last year's modest 3%, as it is increasingly viewed as the next-generation Hadoop. Spark is an open-source data processing framework that runs up to 100 times faster than MapReduce, supports real-time stream processing, provides high-level APIs in Java, Python, Scala, and R, and ships with a built-in machine-learning library, MLlib.
We cover Spark's programming model in detail and teach you how to install, run, and interact with Spark to analyze real-world datasets.
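The heart of that programming model is lazy evaluation: transformations such as map and filter are only recorded, and nothing executes until an action is called. The toy class below is an illustration of that idea in plain Python, not the actual PySpark API:

```python
class MiniRDD:
    """Toy stand-in for a Spark RDD: transformations are lazy,
    and only an action triggers evaluation (illustration only)."""

    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []          # recorded, not-yet-run transformations

    def map(self, fn):
        # Transformation: returns a new dataset description, runs nothing.
        return MiniRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return MiniRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):
        # Action: the recorded pipeline finally executes here.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = (map if kind == "map" else filter)(fn, items)
        return list(items)

rdd = MiniRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16, 36, 64]
```

Deferring work like this is what lets the real Spark engine plan a whole pipeline at once and keep intermediate data in memory.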
Pig
Pig is a dataflow language built on top of Hadoop that makes it easier to process, clean, and analyze big data. We teach you how to execute Pig jobs on both the MapReduce and Spark engines.
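To give a feel for the dataflow style, the classic word count might look like this in Pig Latin (file paths are placeholders):

```
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO 'wordcounts';
```

Each statement names an intermediate relation, and Pig compiles the whole flow down to jobs on the underlying engine.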
Kafka
Kafka is a distributed streaming platform for building real-time streaming applications and data pipelines, combining messaging, storage, and stream processing. We teach you how to transform data as it arrives.
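Kafka's core abstraction is a partitioned, append-only commit log from which each consumer reads at its own offset. The toy model below sketches that idea in plain Python; it is an illustration of the concept, not the Kafka client API:

```python
class MiniTopic:
    """Toy model of one Kafka topic partition: an append-only log;
    each consumer tracks its own read offset (illustration only)."""

    def __init__(self):
        self._log = []                     # the append-only record log

    def produce(self, record):
        self._log.append(record)           # producers only ever append
        return len(self._log) - 1          # offset of the new record

    def consume(self, offset):
        # Records are not removed on read, so independent consumers
        # can each replay the full stream from their own offset.
        return self._log[offset:]

topic = MiniTopic()
for event in ("page_view", "click", "purchase"):
    topic.produce(event)

offset = 0
batch = topic.consume(offset)
offset += len(batch)                       # commit the new offset
print(batch)  # ['page_view', 'click', 'purchase']
```

Because the log is durable and replayable, the same stream can feed a real-time transformation and a batch job at once, which is what makes Kafka-based pipelines flexible.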
Hive
Get introduced to Hive and its similarity to SQL, and learn to write Hive Query Language (HQL) statements. Understand Hive's architecture, create databases and tables, and perform various operations on them.
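The SQL similarity is easy to see in HQL itself. The fragment below is illustrative only (table name, columns, and paths are placeholders) and would run against a Hive cluster:

```sql
-- Create a database and a delimited-text table.
CREATE DATABASE IF NOT EXISTS sales;
USE sales;

CREATE TABLE IF NOT EXISTS orders (
  order_id INT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load data already sitting in HDFS into the table.
LOAD DATA INPATH '/data/orders.csv' INTO TABLE orders;

-- Familiar SQL-style aggregation, executed as a distributed job.
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer;
```

The difference from a traditional database is under the hood: Hive stores table data as files on HDFS and compiles queries into distributed jobs.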
HBase
HBase is a non-relational (NoSQL) database that runs on top of HDFS and is natively integrated with Hadoop. Learn how to combine data sources that use a wide variety of structures and schemas.
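What makes that schema flexibility possible is HBase's data model: a sparse map from row key to column-family:qualifier cells, each with timestamped versions. A plain-Python sketch of the model (not the HBase client API) looks like this:

```python
from collections import defaultdict

class MiniHBaseTable:
    """Toy model of the HBase data model: each row is a sparse map from
    'family:qualifier' to a list of (timestamp, value) versions."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, family, qualifier, value, ts):
        # Rows are schemaless: any row may carry any set of columns.
        self._rows[row_key][f"{family}:{qualifier}"].append((ts, value))

    def get(self, row_key, family, qualifier):
        # Reads return the newest version of a cell, as HBase does by default.
        versions = self._rows[row_key].get(f"{family}:{qualifier}", [])
        return max(versions)[1] if versions else None

t = MiniHBaseTable()
t.put("user#42", "info", "name", "Ada", ts=1)
t.put("user#42", "info", "name", "Ada L.", ts=2)      # newer version wins
t.put("user#99", "stats", "logins", 7, ts=1)          # different columns per row
print(t.get("user#42", "info", "name"))  # Ada L.
```

Because absent columns simply are not stored, rows with completely different shapes can live side by side in one table, which is why HBase suits heterogeneous sources.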
Sqoop and Flume
Import and export data between traditional relational (SQL) databases, such as Oracle, and Hadoop using Apache Sqoop. Master how to ingest streaming data into Hadoop using Apache Flume.
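Typical Sqoop invocations look like the following; the connection string, credentials, table names, and HDFS paths are placeholders, and the commands require a running Hadoop cluster:

```shell
# Import an RDBMS table into HDFS with four parallel map tasks.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /data/orders \
  --num-mappers 4

# Export processed results from HDFS back into a database table.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user -P \
  --table ORDER_SUMMARY \
  --export-dir /data/order_summary
```

Under the hood, Sqoop turns each transfer into a parallel MapReduce job, which is what makes bulk loads fast.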
Oozie
Learn about Oozie and use it to define workflows that schedule Hadoop jobs.
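An Oozie workflow is defined in XML. A minimal sketch that runs a single MapReduce action might look like this (names and paths are placeholders):

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Actions are wired together by the ok/error transitions, and an Oozie coordinator can then run the workflow on a time or data-availability trigger.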
Basics of Linux, Java, and Scala
The course begins with the fundamentals of Linux, Java, and Scala that are necessary to learn Hadoop and Spark.