An in-depth course on the most powerful big data tools.
Who this course is for:
The course is designed for data engineers who want to deepen their knowledge of Spark, as well as Hadoop and Hive.
The course covers the following core topics:
- Hadoop (basic components, vendor distributions)
- HDFS architecture
- YARN architecture
- Data formats
- Spark
- Spark Streaming and Flink
- Hive
- Orchestration, Monitoring and CI/CD
- and more
You will put all of this into practice and consolidate your knowledge through challenging homework assignments and a final project.
After taking this course, you will be able to:
- Use Hadoop to process data
- Interact with its components via console clients and APIs
- Work with semi-structured data in Hive
- Write and optimize Spark applications
- Write tests for Spark applications
- Use Spark to process tabular, streaming, geo-data, and even graphs
- Configure CI and monitoring of Spark applications
Required knowledge:
- Experience writing code in at least one of the following languages: Python, Java, or Scala
- Basic knowledge of SQL and experience with any relational database
- A computer or a Linux-based virtual machine with at least 8 GB of RAM
Upon completing the course, you will:
- Receive a full set of training materials: video recordings of all webinars, class presentations, and solutions to the assignments and projects as code on GitHub, along with other supplementary materials
- Receive a certificate of completion
- Receive an invitation to interview with partner companies (offered to the most successful students)