What exactly is a data product, and how do you build one in a data-driven manner? In this session I will dive into those questions, drawing on recent experience from a project in which an existing search technology was replaced by a data-driven search pipeline.
First, I will sketch some context by laying out the starting point of this project and how we moved from a once-a-day update of the index to real-time updates, in an architecture based on Elasticsearch, Kafka, microservices, command query responsibility segregation (CQRS), and real-time monitoring using Kafka Streams.
Next, I will highlight various parts of the new pipeline, discussing what kind of data was measured and how it steered the engineering efforts. I will also cover lessons learned, such as the order in which to do things when building a data product and how to manage the relationship between engineering and data science.
Finally, we will look at how machine learning techniques such as learning-to-rank and entity recognition were added to the mix. After this session you will have a better understanding of the pitfalls of building a data product, and some concrete anchors to drive the engineering efforts of your own team.