DataOps: DevOps 2.0
Is DataOps just DevOps applied to data, or is it a new paradigm that accelerates the deployment of Data & Analytics projects?
DataOps is an emerging concept born from the difficulty of deploying data projects. The figures speak for themselves: 8 out of 10 such projects never make it into production.
What is DevOps?
DevOps is a portmanteau of “Development” and “Operations”.
It is a technical, cultural and organizational approach that aims to accelerate the delivery of features and applications.
The main goal is to shorten the “time-to-market” through shorter development cycles, more frequent deployments and continuous delivery.
DevOps is based on two main concepts, Continuous Integration (CI) and Continuous Delivery (CD):
- Continuous Integration: new code is built, integrated and tested in a repeated, automated way, so that potential issues are identified, and therefore fixed, quickly.
- Continuous Delivery: software delivery is automated. Once an application has passed every stage of qualification testing, it can be released to production.
To put it simply, the DevOps approach aligns the development team with the operations team and automates every step of the software life cycle, from development and deployment through to management.
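The CI and CD steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular CI/CD tool works: the function `add`, the two `check_*` functions, `run_checks` and `can_release` are all hypothetical names invented for this example.

```python
from typing import Callable, List, Tuple


def add(a: int, b: int) -> int:
    """Hypothetical piece of application code under test."""
    return a + b


def check_add_small_numbers() -> None:
    assert add(2, 3) == 5


def check_add_negative_numbers() -> None:
    assert add(-1, -1) == -2


def run_checks(checks: List[Callable[[], None]]) -> List[Tuple[str, bool]]:
    """Continuous Integration step: run every check and record pass/fail."""
    results = []
    for check in checks:
        try:
            check()
            results.append((check.__name__, True))
        except AssertionError:
            results.append((check.__name__, False))
    return results


def can_release(results: List[Tuple[str, bool]]) -> bool:
    """Continuous Delivery gate: release only if checks ran and all passed."""
    return bool(results) and all(passed for _, passed in results)


results = run_checks([check_add_small_numbers, check_add_negative_numbers])
release_ok = can_release(results)
print(results, release_ok)
```

In a real project the checks would be run by a test runner on every commit, and the gate would be enforced by the CI/CD platform rather than by hand-written code.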
What is DataOps?
Gartner defines it as “a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers (Data Engineers, Data Architects, Data Stewards) and data consumers (Data Scientists, Business Analysts, Business teams) across an organization.”
DataOps aims to improve the speed and quality of the Data & Analytics life cycle.
DataOps uses technology to automate the design, deployment and management of data delivery. It serves as a technological orchestrator for your project.
Like DevOps, it puts collaboration at the heart of the project.
The Agile Manifesto values “individuals and interactions over processes and tools”. Many people are involved in Data & Analytics projects, and getting them all to work together is at least as important as your choice of technologies.
The main common principles: Lean & Agile
A few practices are common to DevOps and DataOps:
- Automation (CI/CD)
- Unit tests
- Environment management
- Version management
These practices foster communication and collaboration between teams, speed up project deployment and reduce costs.
The main distinctions
The two approaches have much in common, but they differ on a few points.
DevOps offers automation and agility, but it shows its limits when it comes to building applications that process data in real time. Data & Analytics projects, for their part, involve building and maintaining data pipelines (or data flows).
A data pipeline represents a data flow, from its source to its consumption.
Data enters at one end of the pipeline, goes through numerous preparation and processing steps, and exits as models, reports and dashboards. This pipeline is the “Ops” side of data analytics.
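A minimal sketch of such a pipeline, assuming a toy in-memory data source, may make the idea more concrete. The stage names (`extract`, `prepare`, `process`, `report`) and the sample rows are assumptions made up for this illustration; a real pipeline would read from actual systems and be run by an orchestrator.

```python
from collections import defaultdict
from typing import Dict, List

RawRow = Dict[str, str]


def extract() -> List[RawRow]:
    """Source end of the pipeline; hard-coded rows stand in for a real source."""
    return [
        {"region": "north", "amount": "120.5"},
        {"region": "south", "amount": "80"},
        {"region": "north", "amount": ""},  # missing value, dropped in prepare()
        {"region": "south", "amount": "19.5"},
    ]


def prepare(rows: List[RawRow]) -> List[Dict[str, object]]:
    """Preparation step: drop rows without an amount and cast amounts to float."""
    return [
        {"region": row["region"], "amount": float(row["amount"])}
        for row in rows
        if row["amount"]
    ]


def process(rows: List[Dict[str, object]]) -> Dict[str, float]:
    """Processing step: total amount per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def report(totals: Dict[str, float]) -> str:
    """Consumption end of the pipeline: a plain-text report."""
    return "\n".join(
        f"{region}: {total:.2f}" for region, total in sorted(totals.items())
    )


def run_pipeline() -> str:
    return report(process(prepare(extract())))


output = run_pipeline()
print(output)
```

Keeping each stage as a separate function means each one can be tested, versioned and monitored on its own, which is where the practices shared with DevOps come in.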
Other differences stem from the specific requirements of Data Science projects:
- Reproducibility of results
- Monitoring of model performance, as a model's behavior can change quickly depending on the data it is given
- Exposing models to users
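The first two points can be illustrated with a small sketch. Everything here is an assumption made for the example: the toy threshold `model`, the synthetic data produced by `make_data`, the `shift` parameter used to simulate a change in incoming data, and the 0.05 tolerance in `needs_attention`.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]


def make_data(seed: int, n: int = 200, shift: float = 0.0) -> List[Sample]:
    """Synthetic labeled data; a fixed seed makes every run identical."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        label = 1 if x > 0 else 0
        # A non-zero shift simulates incoming data that has drifted.
        data.append((x + shift, label))
    return data


def model(x: float) -> int:
    """Toy model: a fixed decision threshold at 0."""
    return 1 if x > 0 else 0


def accuracy(data: List[Sample]) -> float:
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)


def needs_attention(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Monitoring rule: flag the model when accuracy drops beyond the tolerance."""
    return baseline - current > tolerance


baseline = accuracy(make_data(seed=42))
drifted = accuracy(make_data(seed=42, shift=1.0))
print(baseline, drifted, needs_attention(baseline, drifted))
```

Fixing the seed is what makes the result reproducible, while comparing current accuracy against a recorded baseline is one simple way to notice that the data has changed under the model.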
If you truly want to benefit from both DevOps and DataOps, you will need a technology orchestrator. It will help you:
- Manage data from extraction to consumption, including storage, preparation, processing and visualization.
- Ease and accelerate the deployment of Data & Analytics projects, because all the technologies you need (Elasticsearch, PostgreSQL, Talend, Java, Scala, Jupyter, Docker, MongoDB and MySQL) are gathered in one place, kept up to date and readily available.
- Improve collaboration and communication within the company, as every member of the team is involved and works in a single centralized tool.
DataOps is still an emerging concept, without clearly defined standards or boundaries yet. But its results are already noticeable: data projects are deployed faster when DataOps is involved.
To learn more, make sure to read our latest DataOps digest.