Apache Spark Developer Training Certification Course

  4.5 Ratings
  1000 Learners

The average salary of an Apache Spark developer is over $110K, and Apache Spark remains the most active open-source project in Big Data, with over 1,000 contributors. Spark offers over 80 high-level operators that make it easy to build parallel apps. AcadGild’s course on Apache Spark offers in-depth exposure to its capabilities.

Featured In
AcadGild is ranked as one of the Top 10 Worldwide Technology Boot Camps.
Course Overview
Introduction to Big Data and Apache Spark
Gain insight into the present data landscape, the limitations of Hadoop, and how the introduction of Apache Spark mitigates these challenges.
Core functions of Spark and Spark Shell
Get acquainted with RDDs and their importance in Spark, the usage of the Spark shell in various languages, and the execution of Spark’s core functions in the shell.
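The split between lazy transformations and eager actions at the heart of the RDD API can be sketched in plain Python. This is an illustration of the API's shape only, not Spark itself; the `LocalRDD` class below is hypothetical:

```python
class LocalRDD:
    """A tiny, single-machine stand-in for Spark's RDD API (illustrative only)."""

    def __init__(self, data):
        self._data = data  # in real Spark, data is partitioned across a cluster

    # Transformations are lazy: they return a new RDD without computing anything.
    def map(self, f):
        return LocalRDD(f(x) for x in self._data)

    def filter(self, pred):
        return LocalRDD(x for x in self._data if pred(x))

    # Actions are eager: calling one materializes a result.
    def collect(self):
        return list(self._data)


rdd = LocalRDD(range(5))
evens_squared = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens_squared.collect())  # -> [0, 4, 16]
```

In real Spark, the same chain (`sc.parallelize(range(5)).map(...).filter(...).collect()`) builds a lineage graph and only the action triggers execution.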
In-Memory Management
Understand how Spark runs its applications in memory, how RDDs support in-memory management, and how to cache effectively.
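One way to see why caching matters: without it, every action replays the dataset's lineage from the source. The sketch below is plain Python with made-up names, not Spark's API; it simply counts how often the lineage is recomputed:

```python
class Dataset:
    """Illustrative only: models lineage recomputation vs. caching."""

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform
        self.recomputes = 0
        self._cached = None

    def cache(self):
        # Materialize once and keep the result in memory.
        self._cached = [self.transform(x) for x in self.source]
        return self

    def collect(self):  # an "action"
        if self._cached is not None:
            return self._cached          # served from memory
        self.recomputes += 1             # lineage replayed from the source
        return [self.transform(x) for x in self.source]


ds = Dataset(range(3), lambda x: x + 1)
ds.collect(); ds.collect()
print(ds.recomputes)   # -> 2: each action recomputed the lineage
ds.cache()
ds.collect(); ds.collect()
print(ds.recomputes)   # still 2: cached results are reused
```

Spark's `cache()`/`persist()` work on the same principle, with a choice of storage levels (memory only, memory and disk, serialized, etc.).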
Working with various API’s in Spark
Develop Spark applications in languages such as Java, Scala, Python, and R.
Spark Streaming
Learn how to use the streaming framework embedded in Spark and its various functions to process live data.
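The micro-batch model behind Spark Streaming can be sketched in plain Python: the stream is cut into small batches, and a sliding window aggregates the last few batches. The function and variable names below are hypothetical, not the DStream API:

```python
from collections import Counter, deque

def sliding_window_counts(batches, window_size):
    """Yield word counts over a sliding window of the last `window_size` batches."""
    window = deque(maxlen=window_size)   # old batches fall off automatically
    for batch in batches:                # each batch ~ one micro-batch interval
        window.append(batch)
        counts = Counter()
        for b in window:
            counts.update(b)
        yield dict(counts)


stream = [["spark", "rdd"], ["spark"], ["streaming"]]
for counts in sliding_window_counts(stream, window_size=2):
    print(counts)
# -> {'spark': 1, 'rdd': 1}
# -> {'spark': 2, 'rdd': 1}
# -> {'spark': 1, 'streaming': 1}
```

In Spark Streaming the equivalent is a windowed operation such as `reduceByKeyAndWindow` over a DStream, with the window and slide durations given in seconds.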
Spark SQL & MLlib
Learn how to run SQL queries using the Spark SQL engine and how to develop machine learning applications using Spark’s MLlib.
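Spark SQL lets you run ordinary SQL over distributed data. The aggregation below is the kind of query Spark SQL executes; here Python's built-in sqlite3 stands in as a local engine purely for illustration, and the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, "Inception", 5.0), (2, "Inception", 4.0), (1, "Up", 3.0)],
)

# In Spark this would be spark.sql(...) over a registered DataFrame;
# the SQL text itself would be the same.
rows = conn.execute(
    "SELECT movie, AVG(rating) FROM ratings GROUP BY movie ORDER BY movie"
).fetchall()
print(rows)  # -> [('Inception', 4.5), ('Up', 3.0)]
```

The difference is where the work happens: Spark SQL plans the same query into distributed stages running across the cluster.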
Highly Experienced Mentors
Lifetime access to Dashboard
Develop 2 real-time projects in Spark
24X7 Coding
Free Job Preparation Week
Course Syllabus
  • Overview of Big Data
  • Characteristics of Big Data
  • Types of data
  • Sources of Big Data
  • Big Data examples
  • Scaling
  • Hadoop batch processing
  • Hadoop ecosystem
  • What is streaming data?
  • Batch vs streaming data processing
  • Real time analytics options
  • Map reduce limitations and motivation towards Spark
  • What is Spark?
  • Features
  • Spark unified platform
  • Spark in Hadoop ecosystem
  • Why in-memory processing?
  • TeraSort benchmark win
  • Most active project in Apache
  • Spark survey
  • Industries using Spark
  • Popular use cases across industries
  • Spark components - Driver
  • Executor
  • Worker
  • Spark master
  • Significance of Spark context
  • Spark APIs overview
  • Resilient distributed datasets
  • Properties of RDD
  • Creating RDDs
  • Transformations in RDD
  • Actions in RDD
  • Saving data through RDD
  • Key-value pair RDD
  • Installing Spark locally (Live)
  • Invoking Spark shell
  • Loading a file in shell
  • Hands-on word count program
  • Performing some basic operations on files in Spark shell
  • Spark application overview
  • Job scheduling process
  • DAG scheduler
  • RDD graph and lineage
  • Narrow and wide dependencies
  • Life cycle of a Spark application
  • RDD lineage
  • Caching overview
  • Caching and persistence
  • Data locality
  • How to choose between the different persistence levels for caching RDDs
  • Spark memory allocation
  • Broadcast variables
  • Accumulators
  • Word count example in explanation and development in 3 APIs
  • Code walk-through on translating Spark transformations to equivalent Java transformations
  • Spark packages
  • IDE integration
  • Building project with SBT
  • Building project with Maven
  • Running the application in cluster
  • Submit in cluster mode
  • Web UI - application monitoring
  • Log files
  • Important Spark configuration properties
  • Spark application execution on a cluster
  • Scheduling process
  • How a Spark application breaks down into jobs -> stages -> tasks
  • Cluster managers: Local mode
  • Standalone scheduler
  • YARN
  • Mesos
  • Serialization in Spark
  • How to implement custom input format
  • Partition transformations
  • Storing data in database
  • Mentees can select a project from a predefined set of AcadGild projects or come up with their own project ideas
  • Best practices/ common mistakes
  • Optimization techniques
  • General troubleshooting
  • Memory (RAM) management
  • Spark streaming overview and architecture
  • Example: Streaming word count demo
  • DStreams
  • Breakdown of DStreams to RDD batches
  • Spark streaming example program demo and code walk through
  • Walkthrough of various Spark streaming sources
  • Custom receivers
  • Sliding window operations on DStreams
  • Streaming UI overview
  • Checkpointing
  • Multiple receivers and the Union transformation
  • Spark SQL overview
  • Spark SQL demo
  • Comparison of Apache Hive vs Spark SQL
  • SchemaRDD and data frames
  • Integration with Spark streaming
  • Spark SQL example program demo and code walk through
  • Demo on tools learnt in the session
  • Overview of Spark MLlib basics
  • Walkthrough of various algorithms and examples
  • Overview of Spark GraphX
  • Mentees can select a project from a predefined set of AcadGild projects or come up with their own project ideas
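The syllabus item on how a Spark application breaks down into jobs, stages, and tasks can be sketched locally: narrow transformations run per partition inside one stage, while a wide dependency (grouping by key) forces a shuffle into a second stage. Everything below is a plain-Python simulation with illustrative names, not Spark's API:

```python
from collections import defaultdict

# Two "partitions" of input lines, as Spark would split a file.
partitions = [["spark rdd", "spark"], ["rdd rdd"]]

# Stage 1: narrow ops (flatMap into words, map to (word, 1)) —
# each task reads only its own partition, so no data moves.
stage1 = [
    [(word, 1) for line in part for word in line.split()]
    for part in partitions
]

# Shuffle: a wide dependency redistributes records so every value
# for a given key lands in the same place.
shuffled = defaultdict(list)
for task_output in stage1:
    for key, value in task_output:
        shuffled[key].append(value)

# Stage 2: the post-shuffle half of reduceByKey — sum per key.
counts = {key: sum(values) for key, values in shuffled.items()}
print(counts)  # -> {'spark': 2, 'rdd': 3}
```

In Spark, the DAG scheduler performs exactly this split: it chains narrow dependencies into a single stage and inserts a stage boundary at each shuffle, then runs one task per partition within each stage.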
Projects Which Students Will Develop
Airbnb is an online marketplace and hospitality service that enables people to lease or rent short-term lodging, including vacation rentals, apartment rentals, homestays, hostel beds, and hotel rooms. In this project, we will analyze Airbnb data and derive useful insights for the company's development.
Movie Recommendation Engine
Build a movie recommendation engine using matrix factorization with Alternating Least Squares (ALS). Users will also be clustered with the K-Means algorithm based on the ratings they give to movies.
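A minimal sketch of the K-Means idea behind the clustering step, in plain Python on 1-D ratings data. In the project itself this would be MLlib's distributed implementation; the data values and initial centers below are made up:

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain K-Means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their old center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Average ratings of six users, falling into a "low" and a "high" group.
ratings = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
print(kmeans_1d(ratings, centers=[1.0, 9.0]))  # -> [1.5, 8.5]
```

MLlib's `KMeans` applies the same assign-then-recompute loop, but to vectors distributed across an RDD or DataFrame.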
Twitter Sentiment Analysis
Real-time spam user filtering and sentiment analysis. Using Spark Streaming, real-time tweets will be collected and spam users filtered out.
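The filter-then-score steps can be sketched without the streaming machinery. This is a hedged, lexicon-based illustration: the word lists, user names, and scoring rule are all made up, and the project would apply logic like this per micro-batch with Spark Streaming:

```python
POSITIVE = {"great", "love", "awesome"}
NEGATIVE = {"bad", "hate", "awful"}
SPAM_USERS = {"bot123"}   # in the project, built from observed spam behavior

def sentiment(text):
    """Score = positive-word hits minus negative-word hits."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    ("alice", "I love Spark it is awesome"),
    ("bot123", "buy now great great great"),
    ("bob", "traffic today was awful"),
]

# Filter out spam users first, then score the remaining tweets.
scored = [(user, sentiment(text)) for user, text in tweets if user not in SPAM_USERS]
print(scored)  # -> [('alice', 2), ('bob', -1)]
```

With Spark Streaming, the same `filter` and `map` would run over each micro-batch of the live tweet stream instead of a static list.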
Yelp Data Analysis
Draw insights for a company's business development from the Yelp dataset.
Customers Feedback
This Spark course is designed to help you gain expertise in the Big Data ecosystem and essential Spark skills such as RDDs, Spark Streaming, Spark SQL, machine learning with MLlib, and GraphX. It will also help candidates understand in-memory data processing.
Anybody aiming to build a successful career around Big Data can take this course. It will be beneficial for:
  • Software Developers and Architects
  • Professionals with analytics and data management profile
  • Business Intelligence Professionals
  • Project Managers
  • Data Scientists
  • Professionals with Business Intelligence, ETL and data warehousing background
  • Professionals from testing and mainframes background
This course will equip learners with the skills needed to handle Big Data projects in various companies. We provide real-time case studies, projects, and assignments to give our trainees the skills required to excel in Spark-related projects. Extra assistance such as mock interview sessions, resume building, and career guidance on openings at various companies will help you land your dream job in the Big Data and Spark industry.
  • Microsoft® Windows® 7/8/10 (32- or 64-bit)
    • 4 GB RAM minimum, 8 GB RAM recommended
    • Intel Core i3 or higher processor
    • Intel® VT-x (Virtualization Technology) should be enabled
Hadoop combines a file system with a processing engine, while Spark is an execution engine that can work on top of different file systems, most commonly HDFS (the Hadoop Distributed File System). Learning Hadoop is therefore an additional advantage for Spark, but it is not mandatory.
Mentors are qualified developers with at least 5 years of experience in the field. A love for coding and a passion for teaching are essential prerequisites for all our mentors.
All you need is a Windows or Mac machine and an Internet connection with a minimum speed of 500 Kbps.
Besides the classes, spending around 3 hours of practice each day will be enough.
If you decide to leave within the first week after classes start, we provide a full refund. If you decide to leave before the class starts, 50% of the total fee paid will be deducted and the remaining amount refunded. The refund policy applies only if the total amount paid is more than 50% of the course fee. If a user opts for a complimentary course, the refund policy applies only to the first course.
Classes are held on weekends as well as on weekdays, so you can enroll in a batch that suits your personal schedule.