How does a POC work in a Data Science project?
Do you still need advice to make your Big Data/AI project a success? You’ve come to the right place. We have already talked about the critical steps in building a Data Lab; now it is time to explain how the POC (Proof Of Concept) is going to go.
A POC, as you may know, is a Proof Of Concept: an experiment that aims to assess the value, relevance and/or feasibility of a solution or an idea before it is implemented. In Big Data and AI, it is typically used to test a use case. The hardest part of these projects is that Data Scientists need to work with data coming from a Data Lake, which requires data preparation and processing. This is why the Data Engineer is also involved, and the two of them need to work closely together to make it work.
The longest part: data preparation
Data preparation can represent almost 80% of the time that you will spend on the whole project. Spending so much time on it may seem wasteful, but it is a critical step: this is how you understand your data, explore it and prepare it for processing.
Understanding: Your first priority is to know your data, how it can be used, and which datasets are the most valuable. Much of it may turn out to be of no use for the use case you chose, and you might even need more. Although this can be frustrating, you can turn to other sources such as Open Data or data feeds sold by specialized firms.
Exploration: Once you have your data, explore it. You will need to check its reliability using descriptive indicators (average, variance, quartiles, class distribution, seasonality…). Then it is time to identify data gaps (blank fields, anomalies), and let me tell you: it takes time. Most of the code your team writes (up to 75%!) will be used for exploration purposes.
Preparation: Getting data “prepared” is about pre-processing and feature engineering. The former consists in removing unreliable and aberrant values. The latter creates new features that could be useful when it comes to finding patterns.
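To make the exploration and preparation steps more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `raw_sales` figures are invented, the function names are hypothetical, the 1.5 × interquartile-range rule is just one common way to flag aberrant values, and the "mean of the previous three values" feature is only one example of feature engineering. A real project would more likely rely on a dedicated data library and on rules agreed with the business teams.

```python
import statistics

# Hypothetical daily sales column; None stands for a blank field.
raw_sales = [120, 135, None, 128, 131, 950, 126, None, 133, 129]


def explore(values):
    """Exploration: descriptive indicators plus a count of blank fields."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "variance": statistics.variance(present),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def drop_gaps_and_outliers(values, k=1.5):
    """Pre-processing: drop blanks and values outside the IQR fences."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if low <= v <= high]


def add_lag_feature(values, window=3):
    """Feature engineering: pair each value with the mean of the
    `window` values that precede it."""
    return [
        {"value": values[i], "prev_mean": statistics.mean(values[i - window:i])}
        for i in range(window, len(values))
    ]


summary = explore(raw_sales)
cleaned = drop_gaps_and_outliers(raw_sales)
features = add_lag_feature(cleaned)

print(summary)
print(cleaned)
print(features[0])
```

Note that the exploration function only reports on the data, while the cleaning function changes it. Keeping the two separate makes it easier to share what you found with the business teams before deciding what to remove.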
Our piece of advice: Go agile!
“Fail Fast, Try Again”
Try adopting an “agile” approach (Scrum, Kanban) rather than charging off headlong in any direction. It means iterating and sharing results with business teams along the way. Each subpart of the use case is then assessed against its KPIs. We will not go into more detail here, as this is not the place, but we think it is the right way to do it, so we selected a great article about it if you want to know more.
Making data valuable with algorithms
Your data is now ready; all that remains is to make it valuable to your business. The next and final step is creating algorithms, mostly using Machine Learning. These can perform regression, clustering or classification, but we explained all of that in our Machine Learning article, so here we will focus on what you can do with them:
- Prediction: you can forecast stock or sales numbers based on history and trends.
- Anomaly detection: it lets you identify outliers and inconsistent data in your datasets.
- Segmentation: it is about grouping scattered data points that share similar features, such as clients who come from different geographies but have the same budget.
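The three uses above can each be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a recommended implementation: the function names and sample figures are invented, prediction is reduced to a straight-line trend fitted by least squares, anomaly detection to a simple z-score threshold, and segmentation to a two-group, one-dimensional k-means on budgets. A real POC would normally use a Machine Learning library and far richer data.

```python
import statistics


def forecast_next(history):
    """Prediction: fit a straight line to past values (least squares)
    and extrapolate it one step ahead."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard
    deviations away from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [v for v in values if spread and abs(v - mean) / spread > threshold]


def segment_by_budget(budgets, iterations=10):
    """Segmentation: two-group k-means on a single feature (budget).
    Returns one group label (0 or 1) per input value."""
    centroids = [min(budgets), max(budgets)]
    labels = [0] * len(budgets)
    for _ in range(iterations):
        # Assign each budget to its nearest centroid.
        labels = [
            min((0, 1), key=lambda i: abs(b - centroids[i])) for b in budgets
        ]
        # Move each centroid to the mean of its group.
        for i in (0, 1):
            group = [b for b, label in zip(budgets, labels) if label == i]
            if group:
                centroids[i] = sum(group) / len(group)
    return labels


# Hypothetical data, for illustration only.
monthly_sales = [100, 110, 120, 130]
sensor_readings = [10, 11, 9, 10, 12, 50]
clients = [("Paris", 1000), ("Lyon", 1100), ("Berlin", 5000),
           ("Madrid", 5200), ("Rome", 950)]

next_month = forecast_next(monthly_sales)
anomalies = find_anomalies(sensor_readings)
segments = segment_by_budget([budget for _, budget in clients])

print(next_month)
print(anomalies)
print(list(zip([city for city, _ in clients], segments)))
```

In the segmentation example, the clients are grouped by budget alone, so cities from different geographies can end up in the same segment, which is exactly the situation described in the last bullet.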
Now you know about Data Labs and POCs, but you still have a lot to learn to make your Data project succeed. Don’t miss our next article about how critical it can be to embed the business vision in a Data Science project, so that you can be even better prepared.