Hadoop 4 BI

The open-source Hadoop project is a Java framework for storing (HDFS) and processing (MapReduce) very large volumes of data. In Big Data, including BI projects, Hadoop has become a de facto standard: it makes it possible to work with petabytes of data by splitting files into blocks and distributing them across clusters of up to thousands of nodes.
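
As a rough illustration of the split-and-distribute idea, the sketch below counts words in plain Python in the MapReduce style. It is a toy model and does not use the Hadoop API: real HDFS blocks are split by byte size rather than by line count, and the names `split_into_blocks`, `map_block`, `reduce_counts` and `word_count` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks (here: a number of lines per block)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: count the words of one block, independently of all other blocks."""
    return Counter(word for line in block for word in line.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    return reduce(lambda left, right: left + right, partials, Counter())


def word_count(lines, block_size=2):
    """Split, map each block, then reduce. On a cluster, each block would be mapped on a different node."""
    blocks = split_into_blocks(lines, block_size)
    return reduce_counts(map_block(block) for block in blocks)
```

Because each block is mapped without looking at the others, the map step can in principle run on as many nodes as there are blocks; only the reduce step needs to bring the partial results together.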


Data Architect

Training plan

1 – Hadoop environment

Saagie Data Fabric, Cluster, Capsules, other Big Data players

2 – Data storage

Datalake, datamart

3 – Modeling

BI modeling, Big Data modeling, information flows

4 – Integration

Hue, Sqoop, HDFS tools, Talend, introduction to Kafka

5 – Processing

Impala, Hive, R, Talend, introduction to Spark & Python

6 – Visualization

Third-party data visualization solutions

7 – Operationalization

Scheduling, pipelines, Saagie API, promotion of jobs between environments, environment variables

8 – Optimizations

Impala, Sqoop split, JobTracker, differential integration

9 – Security


Training goals
  • Adapting BI concepts to a Big Data environment based on Hadoop and its ecosystem
Duration
  • 2 days
Required skills
  • Prior knowledge of SQL