Hadoop 4 BI

The open-source Hadoop project is a Java framework designed to facilitate the storage (HDFS) and processing (MapReduce) of very large data volumes. In Big Data, including BI projects, Hadoop has become a de facto standard: it makes it possible to work with petabytes of data by splitting files and distributing them across thousands of cluster nodes.
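To give a feel for the MapReduce model mentioned above, here is a minimal, purely illustrative sketch of the classic word-count pattern. It runs in-memory in plain Python; on a real Hadoop cluster, the map and reduce phases would be distributed across nodes, with a shuffle step grouping keys in between. All function names here are invented for the example.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per word (the grouping a real cluster
    would perform during the shuffle step)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "data lake"]
result = reduce_phase(map_phase(lines))
# result == {"big": 2, "data": 2, "cluster": 1, "lake": 1}
```

Because each map call only sees one record and each reduce call only sees one key's values, the same logic scales from a laptop to thousands of nodes.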

Training plan

1 - Hadoop environment

Saagie Data Fabric, Cluster, Capsules, other Big Data actors

2 - Data storage

Datalake, datamart

3 - Modeling

BI modeling, Big Data modeling, information flows

4 - Integration

Hue, Sqoop, HDFS tools, Talend, initiation to Kafka

5 - Processing

Impala, Hive, R, Talend, initiation to Spark & Python

6 - Visualization

Data visualization third-party solutions

7 - Operationalization

Scheduling, pipelines, Saagie API, promotion of jobs between environments, environment variables

8 - Optimizations

Impala, Sqoop split, Jobtracker, differential integration

9 - Security

Sentry

Trainers

Nicolas
Data Architect
Data Lover

With his huge dataset, Nicolas is the Romeo of BI architecture and writes poetry with data.

Christophe
Data Architect
Data Lumberjack

No data is big enough to resist Christophe's strength. He will chop it down log by log so data scientists can work their magic with it.

Training goals
  • Adapting BI concepts to a Big Data environment based on Hadoop and its ecosystem
Duration
2 days
Needed skills
  • Prior knowledge of SQL