Everything you need to know about the Data Lab

It’s time to get value from your data, and we’re here to help you do it!

What is a Data Lab?

Let’s get to the heart of the matter: as its name suggests, a Data Lab is a true data laboratory. It is a space dedicated exclusively to experimentation and to the “functional” qualification of the company’s various data. It lets you explore data sets, process them, and test Machine Learning algorithms.

Think of the Data Lab as a startup, agile and constantly evolving, but operating inside the company itself. It lets the company become more data-centric or data-driven without disrupting its entire organization.

To put a Data Science or Artificial Intelligence project into production, the key ingredient is a clear strategic vision, shared and supported by the entire company. This may seem obvious, yet according to Gartner it is what is lacking in the majority of cases. So much for theory. In practice, what is needed? A Data Lab.

Creating one is essential to the success of such a project. The whole point of such a structure is to take advantage of your data and turn it into added value. To do this, the Data Lab helps you define the most appropriate use cases for your company, whether that means trend forecasting or fraud detection.

How to set up a Data Lab?

To set up a Data Lab, you must first define its objectives. As Gartner points out, without alignment across the entire company, the project has less chance of reaching production. And then, of course, you need data.

Pre-requisite: a Data Lake

Before you start looking at the data, you need to know where to put it. This is the role of the “data lake”. All your data, structured or not, is grouped and accessible there, and can be enriched throughout the course of your project. Working with that data relies on a range of technologies (Spark, Talend, Avro, Atlas, Hadoop, Cassandra and many others). Tools are needed for data extraction and import, data processing, data governance and data protection, and further technologies for the Data Science projects themselves. Once you have gathered the tools your teams will work with, all that remains is to integrate the data.

Once you have identified the necessary resources, the next step is to set up a team, and this is no small task. You will need to give priority to technical experts (Data Architect, Data Scientist, Data Engineer and Data Analyst), alongside the Product Owner, who provides the product vision, and the business profiles, who bring their knowledge of the business issues. All of these profiles need to interact at one time or another, and some will have to collaborate closely.

The Chief Data Officer

The Chief Data Officer (CDO) ensures that the organization adopts a data-driven strategy and has an overall understanding of the different profiles. The CDO takes a customer requirement as input and conveys it to the whole team by translating the customer’s language into a list of features to be developed. It is therefore not uncommon to hear the CDO talk about budget, costing and business needs!

Our advice: avoid going into too much technical detail with the CDO, who needs to be familiar with data technologies but may not be up to date with the latest ones in use.

The Data Scientist

The Data Scientist is at the heart of the data team, delivering models to the Data Engineers on the one hand and making the results readable for the customer on the other, often together with other profiles such as the Data Analyst. He or she knows the customer’s needs through the CDO, as well as the development guidelines implemented by the Data Engineer.

Our advice: with the Data Scientist you can talk about neural networks, Python or R code, or how to present the results to the customer. On the other hand, he or she will be less precise in commercial discussions and may not be able to address every subtlety of the data architecture.

The Data Engineer

The Data Engineer has mastered the technical tools used to manipulate data. He works closely with the Big Data Architect, with whom he shares the management of the infrastructure. The Data Engineer gives Data Scientists simple access to data and puts their models into production.

Our advice: database vocabulary holds no secrets for him. You can also talk about the APIs he has built from the models developed by the Data Scientists. Avoid topics such as budget and customer presentations.

The (Business) Data Analyst

The Data Analyst stands between the Data Scientist and the end customer and is responsible for accurately translating the customer’s needs into technical instructions. As a specialist in business communication, the Data Analyst also reports the technical results back to the customer.

Our advice: the Data Analyst can ask the Data Engineer to run simple analyses on a sample of data, so he will be at ease with basic notions of databases and data processing. On the other hand, avoid overly scientific vocabulary.

The Data Lab challenges

Once your Data Lab is in place, you will need to ensure that it is working properly. For this purpose, we have prepared a few tips and points to watch to make your task easier.

The political aspect

Even if the members of the executive committee (Comex) understand the strategic value of a digital transformation, in practice data labs often operate in isolation and have to fight for their budgets. When data labs report directly to a member of the executive committee, they are much more likely to receive enough oversight to evolve with the company’s needs.

Encouraging companies to change

Big Data technologies seem very complex at first glance, but they will undoubtedly become more and more widespread. The first priority is to focus on use cases and encourage employees to work together (even if this is easier said than done). The difficulty of building a relationship between IT and business goes well beyond the technical aspect: it also involves differences in working methods, culture and skill levels, on top of the human and political aspects.

Making data accessible is a company-wide challenge

It is tempting to turn directly to complicated data science projects. However, our experience has taught us that companies often start by automating BI chains as a way to get to grips with Big Data technologies. The next critical step is to make the datasets accessible to business users and to connect the in-house visualization tools.

This requires a layer of shared metadata (or data dictionary). It is also necessary to set up a process that validates requests for datasets from business users, while respecting constraints such as data volumes, data quality, batch or real-time processing, the necessary infrastructure, and data security. Only a minority of companies have been able to implement this, which also confirms that the Data Lab alone is not the solution.
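
To make the idea of a data dictionary and a request validation process more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DatasetEntry` and `AccessRequest` fields, the `validate_request` function and the `max_rows` threshold are hypothetical, and a real process would also cover data quality, infrastructure and more detailed security rules.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """One entry of a hypothetical shared data dictionary."""
    name: str
    owner: str
    description: str
    row_count: int
    contains_personal_data: bool
    refresh_mode: str  # "batch" or "real-time"


@dataclass
class AccessRequest:
    """A business user's request for a dataset (illustrative fields only)."""
    requester: str
    dataset: str
    needs_real_time: bool = False
    cleared_for_personal_data: bool = False


def validate_request(request, dictionary, max_rows=10_000_000):
    """Return the reasons to reject a request; an empty list means approved."""
    entry = dictionary.get(request.dataset)
    if entry is None:
        return [f"unknown dataset: {request.dataset}"]
    problems = []
    if entry.row_count > max_rows:
        problems.append("dataset too large for self-service access")
    if request.needs_real_time and entry.refresh_mode != "real-time":
        problems.append("dataset is only refreshed in batch")
    if entry.contains_personal_data and not request.cleared_for_personal_data:
        problems.append("requester is not cleared for personal data")
    return problems


# Example dictionary with two made-up datasets.
dictionary = {
    "sales": DatasetEntry("sales", "finance", "Daily sales totals",
                          50_000, False, "batch"),
    "customers": DatasetEntry("customers", "crm", "Customer records",
                              2_000_000, True, "batch"),
}

print(validate_request(AccessRequest("alice", "sales"), dictionary))
print(validate_request(
    AccessRequest("bob", "customers", needs_real_time=True), dictionary))
```

Returning a list of reasons rather than a simple yes or no gives the business user something to act on when a request is refused.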

Data governance

This is the element that allows everything to work in a coordinated way. Even if the GDPR (General Data Protection Regulation, RGPD in French) is often presented as a constraint, a shared understanding of all these processes is a prerequisite. Business teams need access to datasets and visualization tools, while analysts and developers need secure access to data via APIs.

In addition, data governance helps determine whether data is recent and of good quality, and it provides auditing and management capabilities. Knowing who has had access to the data, along with the history of the algorithms used, ensures peace of mind within teams and mutual trust.
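
As a rough illustration of two of these governance checks, the Python sketch below tests whether a dataset is recent and keeps track of who accessed it. The `is_fresh` function and the `AccessLog` class are hypothetical names invented for this example. A real audit trail would be persisted and protected against tampering rather than kept in memory.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated, max_age, now=None):
    """Return True if the dataset was updated within max_age of now."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= max_age


class AccessLog:
    """In-memory audit trail of dataset accesses (illustration only)."""

    def __init__(self):
        self._events = []

    def record(self, user, dataset, when=None):
        """Store one access event with its timestamp."""
        self._events.append((when or datetime.now(timezone.utc), user, dataset))

    def who_accessed(self, dataset):
        """Return the sorted, de-duplicated users who accessed the dataset."""
        return sorted({user for _, user, ds in self._events if ds == dataset})


# Fixed reference date so the example behaves the same on every run.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(is_fresh(datetime(2024, 1, 9, tzinfo=timezone.utc),
               timedelta(days=2), now=now))

log = AccessLog()
log.record("bob", "sales")
log.record("alice", "sales")
log.record("bob", "customers")
print(log.who_accessed("sales"))
```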

Focusing on production from day one

A very common mistake is to work on POCs in shadow IT mode. Working with laborious methods and constraints imposed by the CISO is tiring and time-consuming. It is like climbing a staircase where a new step appears every time you are about to reach the top. Nevertheless, it is essential to involve the IT department from the start if you really want things to change.

Understanding the technical debt

There have always been more or less successful data initiatives. However, the real issues are often sidestepped, because nobody likes to communicate about their failures. It is nonetheless important to understand what led to these situations and to draw the right lessons from them.

As you can see, carrying out a Big Data / AI project is complex because it involves a clear vision, a multidisciplinary team and a large number of technologies. This is why setting up a Data Lab is a decisive factor, one that will help your project be part of the 20% that reach production. To find out more, don’t miss part 2 of this column, which will be devoted to the development of a POC.