Open source streaming analytics – how to get fast predictions from real-time data with Flink, Kafka, and Cassandra

Streaming analytics (or fast data) is becoming an increasingly popular subject in enterprise organizations because customers want real-time experiences, such as notifications and advice based on their online behavior and other users’ actions. In this talk, I’ll present a streaming analytics engine powered by Apache Flink. Kafka serves as the message bus and Cassandra handles state management. The machine learning models are built with KNIME and Spark, exported to PMML format, and evaluated using the Openscoring.io library.

A typical streaming analytics solution consists of three steps: reading event data, evaluating the events with the aid of business rules and machine learning algorithms, and producing meaningful output. I’ll cover all three steps with an example use case. The streaming analytics engine is powered by Flink, with Kafka as the message bus and Cassandra for state management. The machine learning models are built with KNIME and Spark, exported to PMML format, and evaluated using the Openscoring.io library.
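To make the three steps concrete, here is a minimal sketch in plain Python. It is not the engine presented in the talk and uses none of the Flink, Kafka, Cassandra, or Openscoring.io APIs: a list of dicts stands in for the message bus, an in-memory dict stands in for the state store, and a toy function stands in for a PMML model. The event fields, the threshold of 100, and all function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Event:
    user_id: str
    amount: float


def read_events(raw: Iterable[dict]) -> Iterator[Event]:
    """Step 1: read event data (stand-in for consuming from a message bus)."""
    for record in raw:
        yield Event(user_id=record["user_id"], amount=float(record["amount"]))


def evaluate(
    events: Iterable[Event],
    state: Dict[str, float],
    score: Callable[[Event], float],
) -> Iterator[dict]:
    """Step 2: apply a business rule and a model score to each event."""
    for event in events:
        # Keep a running total per user (stand-in for external state management).
        total = state.get(event.user_id, 0.0) + event.amount
        state[event.user_id] = total
        # Hypothetical business rule: only react once a user's total reaches 100.
        if total >= 100.0:
            yield {"user_id": event.user_id, "total": total, "score": score(event)}


def produce(results: Iterable[dict], sink: List[dict]) -> None:
    """Step 3: produce output (stand-in for publishing a notification)."""
    for result in results:
        sink.append(result)


def toy_score(event: Event) -> float:
    """Toy scoring function; a real setup would evaluate an exported model."""
    return min(event.amount / 100.0, 1.0)


def run_pipeline(raw: Iterable[dict]) -> List[dict]:
    """Wire the three steps together lazily, one event at a time."""
    state: Dict[str, float] = {}
    sink: List[dict] = []
    produce(evaluate(read_events(raw), state, toy_score), sink)
    return sink
```

Because each step is a generator, events flow through one at a time rather than in batches, which loosely mirrors how a streaming engine processes records as they arrive.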