Supported technologies

Saagie provides best-in-class open source technologies.

Apache Hadoop Distributed File System

Hadoop Distributed File System is a distributed, scalable and portable file system developed by the Apache Software Foundation.

You can scale storage to terabytes of data and beyond simply by adding commodity servers. It also tolerates server failure, since every block of data is replicated three times by default.
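The cost of replication can be estimated with simple arithmetic. The sketch below assumes a replication factor of 3 (the usual HDFS default) and ignores space reserved for the operating system and temporary files; the function names and the cluster size are hypothetical and chosen only for illustration.

```python
def usable_capacity_tb(servers: int, disk_tb_per_server: float, replication: int = 3) -> float:
    """Approximate usable storage in TB once every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    raw_tb = servers * disk_tb_per_server
    return raw_tb / replication


def tolerated_replica_losses(replication: int = 3) -> int:
    """Number of copies of a block that can be lost while one copy still survives."""
    return replication - 1


# Hypothetical cluster: 12 servers with 8 TB of disk each.
print(usable_capacity_tb(12, 8.0))    # 32.0
print(tolerated_replica_losses(3))    # 2
```

In other words, under these assumptions roughly one third of the raw disk space is available for data, in exchange for surviving the loss of servers.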

Apache Impala

Impala is an open source analytic data query engine that runs on Apache Hadoop.

You can process terabytes of data with only basic SQL skills.

Apache Hive

Hive is data warehouse software that facilitates querying and managing large datasets residing in distributed storage.

Hive is one of the most stable SQL query engines.

Apache Drill

Ready for data exploration? Drill allows you to query heterogeneous data sources using a single SQL query. It works with HDFS, MongoDB, Hive and Elasticsearch.
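As an illustration of what a single query over heterogeneous sources can look like, the sketch below builds a Drill SQL statement that joins a file with a MongoDB collection. It assumes storage plugins registered under the names `dfs` and `mongo`; the file path, database, collection and column names are all hypothetical. The helper only builds the query text and does not connect to Drill.

```python
def cross_source_query(orders_path: str, mongo_db: str, collection: str) -> str:
    """Build a Drill SQL query joining a file on the file system with a MongoDB collection.

    The inputs are interpolated directly into the SQL text, so they should come
    from trusted configuration, not from end users.
    """
    return (
        "SELECT c.name, SUM(o.amount) AS total_amount\n"
        f"FROM dfs.`{orders_path}` AS o\n"
        f"JOIN mongo.`{mongo_db}`.`{collection}` AS c\n"
        "  ON o.customer_id = c.customer_id\n"
        "GROUP BY c.name"
    )


# Hypothetical sources: a Parquet file and a MongoDB collection.
query = cross_source_query("/data/orders.parquet", "crm", "customers")
print(query)
```

The resulting text can then be submitted through whichever Drill client you use, such as the shell, the web console or a JDBC/ODBC connection.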

Apache Spark

Spark is an open source cluster computing framework for processing huge data volumes. It was originally developed at UC Berkeley's AMPLab and is now maintained by the Apache Software Foundation, with Databricks as a major contributor.

Spark can distribute machine learning over a cluster of servers. The other benefit is that you can cover the whole data pipeline with a single technology. We have been supporting every version of Spark since 1.5.

Apache Kafka

Kafka is one of the most famous distributed message brokers.

It helps you design event stream pipelines to handle your real-time use cases.

Apache Sqoop

Sqoop is a command-line application, developed by the Apache Software Foundation, for transferring data between relational databases and Hadoop.

If you need to import a SQL database from Oracle, SQL Server, MySQL or PostgreSQL, simply use Sqoop, and your data will be imported into your data lake.
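A typical import can be sketched as a single `sqoop import` command. The helper below only assembles the command line and does not run it; the JDBC URL, user, table and paths are hypothetical placeholders, and the flags shown are the common ones for a single-table import.

```python
import shlex


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, target_dir: str, num_mappers: int = 4) -> list:
    """Assemble the argument list for importing one table into HDFS with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel import tasks
    ]


# Hypothetical PostgreSQL source and data lake destination.
cmd = build_sqoop_import(
    jdbc_url="jdbc:postgresql://db.example.com:5432/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="orders",
    target_dir="/datalake/raw/orders",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list makes it easy to pass to `subprocess.run` on a machine where Sqoop is installed, without worrying about shell quoting.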


Talend

Talend is a complete set of open source software to extract and integrate data.

If you are a data consultant, Talend will be your best friend for ingesting and aggregating data.


Java and Scala

Java and Scala jobs offer the ability to process content in the JVM.

Don’t forget that big data is just data. If you are a developer, you can write data ingestion & data aggregation in Java or Scala. Java 7 and 8 are supported.


R

R is a programming language and a statistical data analysis environment.

Use R to set up your tailor-made algorithms and statistical calculations. R has been growing in popularity over the last three years.


Python

Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Python has been used for years in data science laboratories in universities. Python provides you with the most complete and stable data science libraries.


Jupyter Notebook

We provide several versions of the Jupyter notebook bundled with the best-of-breed libraries of each language ecosystem (Python, R, Scala, Spark, Ruby, Haskell & Julia).

Notebooks allow you to test processing and machine learning algorithms on the real data lake. You can share your notebook files (including charts and maps) with your teammates to get feedback.


MongoDB

MongoDB is a cross-platform document-oriented database.

MongoDB can be used as a data mart because its schema is flexible and it is easy to use for dataviz app developers.


MySQL

MySQL Community is the world’s most popular open-source database.

Sometimes you just need a plain simple SQL DB to store your results.


Another famous relational database.

Perfect to handle workloads for your business apps.


Elasticsearch

Elasticsearch is the most popular enterprise search engine with distributed, multitenant and full-text capabilities.

It can be used to search all kinds of documents.


Docker

Docker allows you to deploy dedicated dataviz applications or APIs. You can also deploy specific processing (Fortran, C++, Golang, Rust) or any Docker file as notebooks or special data applications.

Two benefits: first, we keep your Docker containers running so you can focus on your code; second, you can test anything that fits into Docker on the Saagie Data Fabric.


Kubernetes

Kubernetes (a.k.a. K8s) has become the reference system for managing Docker containers.

It is increasingly adopted by mid-sized and large companies to manage resources and application deployments across multiple infrastructures.
