Spark & PySpark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

  • 32 Hours
  • 250
  • Basic to Advanced
Instructor Led
20,000
Online
20,000

Quick Stats

More than half (51%) of survey respondents consider Spark Streaming an essential component for building real-time streaming use cases, and 82% say the same for advanced analytics.

This year, production use of Spark Streaming jumped from 14% (in 2015) to 22% (in 2016), and production use of Machine Learning rose from 13% (in 2015) to 18% (in 2016).

Spark is increasingly deployed in the public cloud to reap its many benefits. Spark deployments in the cloud are at 61% this year, up from 51% last year.

Benefits

Practical Hands-On

Practical hands-on training on a Spark cluster and the complete ecosystem.

Real Life Case Studies

Live projects with real-time scenarios and examples that involve big data analytics platforms and frameworks.

Practical Assignments

Practical assignments after every class.

Who Should Attend

  • Software Developers/Professionals
  • Analytics Professionals
  • Managers
  • Decision Makers
  • Technical Infrastructure Teams
  • Architects
  • BI/ETL/DW Developers/Professionals
  • Senior IT Professionals
  • Testing Professionals
  • Freshers

Course Outcome

  • Understanding of Big Data and Spark architecture.
  • Understanding of Spark cluster and various important configurations.
  • Complete setup of the Spark ecosystem, including tools such as PySpark, Spark SQL and DataFrames.
  • Understanding of Spark Python API (PySpark).
  • Understanding of R on Spark (SparkR).
  • Understanding of Spark RDDs (Resilient Distributed Datasets).
  • Understanding of Spark SQL and DataFrames.

Curriculum

Instructors

  • Gaurav Bansal

    Big Data and Machine Learning Consultant

    Gaurav is a Big Data and Machine Learning consultant. Over the past 10 years he has worked with multinational companies such as HCL Technologies, Samsung Research n' Development and Fidelity Investment Solutions, and he has been in the analytics industry since the beginning of his career.

    He has worked for international markets as an expert in Business Intelligence, Data Analytics, Machine and Deep Learning.

    He has worked extensively on Big Data technologies such as Apache Hadoop, Amazon Web Services, Apache Spark, Apache Flink, Apache Beam and Google Cloud Platform. He has good knowledge of programming languages such as Python, R and Scala.

FAQs

Who are the Instructors?

All instructors are subject matter experts with a minimum of 10 years of experience in the IT industry.

Can I attend a demo session before enrollment?

Yes, you can attend a demo session in any ongoing batch.

What if I miss classes?

You may choose either of the options below:

  • View the recorded session available on our management system.
  • Attend the missed session in another ongoing batch.