The Data Lab: the key to your Big Data/AI project's success
Now is the time to make your data valuable! Don’t worry, we are here to help you do it. How? Once a week, for six weeks, we will publish an article about a tool or a method, or simply give you a piece of advice for succeeding with a Big Data/AI project. This week is about the Data Lab, as it is the first piece of the Big Data/AI puzzle.
What is a Data Lab?
Let’s get to it! A Data Lab is, as its name suggests, a lab involving data and people. It is called a lab because it is dedicated to experimenting with and qualifying the company’s data. It allows you to explore datasets, to process them and to use them to test Machine Learning algorithms.
To picture it, a Data Lab is like an agile and evolving startup running within the company’s walls. It helps you become more data-centric without disrupting the company’s whole organization.
Why is it a big deal?
To industrialize Data Science and turn Data & Analytics into a reality, what you really need is a clear strategic vision that everyone in the company can relate to. It seems obvious but, according to Gartner, the lack of such a vision is the most common issue that ultimately leads to failure. So what does it take? Building a Data Lab.
How to build a Data Lab?
What you want to achieve with your Data Lab must be clearly defined before you do anything else. As Gartner pointed out, a project without full alignment is most likely going to fail.
Once you know what it takes in terms of resources, you need to put a team together. The hard part is finding the right balance between technical and business requirements. As you can see in the image below, the Data Lab has three main teams. Most of the profiles are “technical” ones (Data Engineer, Data Scientist, Data Analyst…), which is why the “product owner” is there to make sure it all fits with the business vision.
First things first: the Data Lake
Before diving into data, you need to know where to store them. That is what the Data Lake is for: a place where data can be easily stored and accessed, whether they are structured or not, and where they can easily be enriched during your project. For processing, you will need technologies like Spark, Talend, Atlas, Python, R, Hadoop and many more. Other tools will be required to extract and import the data. In fact, every step of the data life cycle involves different technologies, for data preparation, Data Governance and Machine Learning algorithms.
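To make the idea more concrete, here is a minimal sketch in Python of what "store first, enrich later" can look like. It is only an illustration under strong assumptions: a local temporary folder stands in for the Data Lake, and the `raw` and `enriched` zone names, the helper functions and the sample records are all hypothetical. A real Data Lake would typically sit on distributed storage and rely on tools such as Spark or Hadoop.

```python
import json
import tempfile
from pathlib import Path


def store_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Write a payload untouched into the lake's raw zone, grouped by source."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def enrich_events(raw_file: Path, lake_root: Path) -> Path:
    """Read raw JSON-lines events, add a derived field, write to the enriched zone."""
    enriched = []
    for line in raw_file.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        # Hypothetical enrichment: flag events that come from a mobile browser.
        event["is_mobile"] = "Mobile" in event.get("user_agent", "")
        enriched.append(event)
    target = lake_root / "enriched" / raw_file.parent.name / raw_file.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(json.dumps(e) for e in enriched), encoding="utf-8")
    return target


# A temporary local folder plays the role of the Data Lake (assumption).
lake = Path(tempfile.mkdtemp(prefix="data_lake_"))

# Structured data (JSON lines) and unstructured data (free text) are stored side by side.
events = (
    b'{"user": "a", "user_agent": "Mobile Safari"}\n'
    b'{"user": "b", "user_agent": "Firefox"}'
)
raw_events = store_raw(lake, "website_logs", "events.jsonl", events)
raw_note = store_raw(lake, "support", "ticket_001.txt", b"Customer cannot log in.")

# Enrichment writes a new file and leaves the raw copy untouched.
enriched_events = enrich_events(raw_events, lake)
```

Keeping the raw files as they arrived and writing enriched copies elsewhere is one common design choice: it lets the team rerun or change the enrichment later without losing the original data.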
Which data to use?
Every piece of data can be useful to your business; it all depends on the use case you choose. But first, you need to ask yourself: Where does it come from? How can it be centralized? Is it okay to use it? To answer these questions, a big part of getting started is listing the data available to you.
Your internal data are the most obvious ones: logs left by users on your website, application or software are a primary source. There are also external data that can be combined with your internal ones: some are “open data”, which are free, while others come from organizations dedicated to collecting and reselling data.
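The listing exercise can be sketched as a small inventory in Python. This is only one possible shape, not a prescribed format: the `DataSource` fields mirror the three questions above, and the source names, locations and clearance flags are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: str            # "internal" or "external": where does it come from?
    location: str          # where it lives today, i.e. what must be centralized
    cleared_for_use: bool  # is it okay to use it (legal, contractual, privacy)?


def usable_sources(inventory: list[DataSource]) -> list[DataSource]:
    """Keep only the sources that have been cleared for use."""
    return [source for source in inventory if source.cleared_for_use]


def names_by_origin(inventory: list[DataSource]) -> dict[str, list[str]]:
    """Group source names by origin to see the internal/external split."""
    grouped: dict[str, list[str]] = {}
    for source in inventory:
        grouped.setdefault(source.origin, []).append(source.name)
    return grouped


# Hypothetical inventory, for illustration only.
inventory = [
    DataSource("website_logs", "internal", "web servers", True),
    DataSource("app_usage", "internal", "mobile backend", True),
    DataSource("open_weather", "external", "open data portal", True),
    DataSource("purchased_panel", "external", "data reseller", False),
]

for source in usable_sources(inventory):
    print(f"{source.name}: {source.origin}, currently stored on {source.location}")
```

Even a list this simple makes it visible which sources still need clearance before they are brought into the Data Lake.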
In short, putting together a Big Data/AI project involves having a clear business vision, a multidisciplinary team and the technologies to process data. That is the reason why the Data Lab is a game changer and could allow your project to be part of the 20% that succeed. However, you will need to go through the critical POC step, and that is what we will walk you through next time.