Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, Spark ML for machine learning, GraphX for graph data processing, and Spark Streaming for live data stream processing. With Spark running on Apache Hadoop YARN, developers can create applications to derive actionable insights within a single, shared dataset in Hadoop.
This training course will teach you how to solve Big Data problems using Apache Spark framework. The training will cover a wide range of Big Data use cases such as ETL, DWH, data virtualization, streaming, graph data structure, machine learning. It will also demonstrate how Spark integrates with other well established Hadoop ecosystem products. You will learn the course curriculum through theory lectures, live demonstrations and lab exercises. This course will be taught in Python programming language.
Lab Exercise
Lab Exercise
Lab Exercises
Lab Exercise
Lab Exercise
Lab Exercise
Lab Exercise
Lab Exercise
Lab Exercises
If you need further information about this course, please contact: