Spark & Kafka Streams for real-time processing

The combination of Kafka and Apache Spark Streaming was until recently the most common way to build data pipelines with real-time streaming capabilities, going beyond Hadoop's traditional batch processing.

Nowadays, technologies such as Apache Spark Structured Streaming, Kafka Streams, and Flink allow teams to go even further. For instance, real-time data processing pipelines can apply Machine Learning to perform fraud detection, personalized recommendations, or predictive maintenance, creating real business value.
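To make this concrete, here is a minimal sketch of a Spark Structured Streaming job that consumes a Kafka topic and continuously counts events per one-minute window. The broker address and the `transactions` topic name are assumptions for illustration; the training exercises use their own setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingIndicatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-indicator-sketch")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a hypothetical "transactions" topic (broker address assumed).
    val transactions = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "transactions")
      .load()

    // Continuously count messages per 1-minute event-time window,
    // tolerating events that arrive up to 2 minutes late.
    val counts = transactions
      .withWatermark("timestamp", "2 minutes")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Print updated counts to the console as the stream progresses.
    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```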

Training plan

1 - Introduction to Kafka

2 - Exercise on Kafka

3 - Presentation of real-time processing challenges

4 - Presentation of Apache Spark Streaming / Structured Streaming

5 - Exercise: continuous computation of indicators with Structured Streaming (in the spirit of the sketch above)

6 - Presentation of Kafka Streams

7 - Exercise: continuous computation of indicators with Kafka Streams (a sketch follows this plan)

8 - General principles of Machine Learning

9 - Presentation of a Machine Learning algorithm

10 - Implementation of a complete data processing pipeline
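As an orientation for steps 6 and 7, here is a minimal sketch of the same kind of continuous indicator written with the Kafka Streams Scala DSL (Kafka 3.x API). The application id, broker address, and `transactions` topic are again assumptions for illustration.

```scala
import java.time.Duration
import java.util.Properties

import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object KafkaStreamsIndicatorSketch extends App {
  val props = new Properties()
  // Application id and broker address are assumptions for this sketch.
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "indicator-sketch")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder()

  // Count records per key over 1-minute windows: a simple continuous indicator.
  builder
    .stream[String, String]("transactions") // hypothetical topic
    .groupByKey
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
    .count()
    .toStream
    .foreach((windowedKey, count) => println(s"$windowedKey -> $count"))

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```

Note the design difference the training highlights: Kafka Streams is a library, so this runs as a plain JVM application with no separate processing cluster, whereas the Spark job above is submitted to a Spark runtime.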

Trainers

Sébastien
Data Scientist
Deep Pythonist

Expert in scikit-learn and deep learning frameworks. He knows how to push the platform to its limits.

Romain
Head of Research
Bac+12

Surfer and accomplished data scientist. Spent many years in school. Specialist in recommendation systems.

Training goals
  • Implement real-time data pipelines
  • Master the concepts of Apache Kafka
  • Work with new streaming frameworks
  • Build a real-time fraud detection application
Duration
3 days
Required skills
  • Basics in mathematics
  • Basics in programming (Java or Scala)