Spark & Kafka Streams for real-time processing

Pairing Kafka with Apache Spark Streaming was, until recently, the most common way to build real-time data streaming pipelines that go beyond Hadoop's traditional batch mode.

Nowadays, other technologies such as Apache Spark Structured Streaming, Kafka Streams, or Flink allow teams to go even further. For instance, real-time data processing pipelines can use Machine Learning for fraud detection, personalized recommendations, or predictive maintenance, creating real business value.


Data Scientist
Head of Research

Training plan

1 – Introduction to Kafka

2 – Hands-on exercise with Kafka

3 – Overview of real-time processing challenges

4 – Presentation of Apache Spark Streaming / Structured Streaming

5 – Exercise: continuous computation of indicators with Structured Streaming

6 – Presentation of Kafka Streams

7 – Exercise: continuous computation of indicators with Kafka Streams

8 – General principles of Machine Learning

9 – Presentation of a Machine Learning algorithm

10 – Implementation of a complete data processing pipeline

Training goals
  • Implementing real-time data pipelines
  • Mastering the concepts of Apache Kafka
  • Working with new frameworks (Spark Structured Streaming, Kafka Streams)
  • Building a real-time fraud detection application
Duration: 3 days
Prerequisites
  • Basic mathematics
  • Basic programming (Java or Scala)