Docker, and containerization more generally, is one of the most talked-about topics in both software development and data science right now. Understanding the concepts behind this technology has become a core skill for anyone who builds and ships applications. After a short introduction to what Docker is and why you should use it, we’ll explore its terminology: Dockerfile, Docker image and Docker container.
What is Docker?
In simple terms, Docker is a platform to develop, deploy and run applications inside containers. It allows users (software developers or data scientists) to package an application with all of its dependencies into standardized units called containers.
But Is It a Sort of Virtual Machine (VM)?
This is one of the most frequently asked questions about Docker. The answer is: actually, no.
Docker containers share the operating system kernel and resources of the host machine instead of booting an operating system of their own.
A virtual machine, by contrast, includes a complete guest operating system. It works independently and behaves like a separate computer.
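One way to see this difference for yourself is to compare the kernel version reported on the host with the one reported inside a container. The small Python sketch below only prints the kernel release. It assumes a Linux host running Docker natively; on macOS or Windows, Docker Desktop itself runs containers inside a lightweight Linux VM, so the value will differ from the host's. The file name kernel_check.py used afterwards is just a placeholder.

```python
import platform


def kernel_release() -> str:
    """Return the kernel release string reported by the operating system."""
    return platform.release()


if __name__ == "__main__":
    # Run this once on the host and once inside a container.
    # On a native Linux host both runs should report the same kernel,
    # because the container uses the host's kernel instead of booting its own.
    print(f"{platform.system()} kernel release: {kernel_release()}")
```

If you save it as kernel_check.py, you could run it on the host with python kernel_check.py and inside a container with something like docker run --rm -v "$PWD":/work -w /work python:3.11-slim python kernel_check.py. A virtual machine would report the kernel of its own guest operating system instead.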
The Docker Terminology
Before creating your first Docker application, it’s helpful to be familiar with its terminology.
Dockerfile: A recipe for creating an image, written in Docker’s own syntax.
From the official documentation:
“A Dockerfile is a text document that contains all the commands you would normally execute manually in order to build a Docker image.”
A Dockerfile contains the instructions needed to set up the right environment for deploying your application. For example, a Dockerfile for a machine learning application could tell Docker to add NumPy, Pandas and scikit-learn in an intermediate layer. A good first step when you create a Dockerfile is to browse the Docker Hub website, which hosts many pre-built images that will save you time.
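To make this more concrete, here is a minimal Python sketch that generates the text of such a Dockerfile. It is only an illustration: the base image tag python:3.11-slim, the script name train.py and the package list are assumptions chosen for the example, and a real project would usually pin package versions.

```python
from __future__ import annotations


def render_dockerfile(base_image: str, packages: list[str], script: str) -> str:
    """Build the text of a small Dockerfile for a Python application."""
    lines = [
        # Start from an existing image pulled from a registry.
        f"FROM {base_image}",
        # Directory inside the image where the application will live.
        "WORKDIR /app",
        # Install the libraries; this instruction produces its own layer.
        "RUN pip install --no-cache-dir " + " ".join(packages),
        # Copy the application code into the image.
        f"COPY {script} .",
        # Default command executed when a container starts.
        f'CMD ["python", "{script}"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "train.py" and the package list are placeholders for illustration.
    print(
        render_dockerfile(
            base_image="python:3.11-slim",
            packages=["numpy", "pandas", "scikit-learn"],
            script="train.py",
        )
    )
```

Saving the printed text in a file named Dockerfile next to train.py would give you something Docker can build. Installing the libraries before copying the code is a common choice, since that layer can then be reused when only the code changes.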
Docker image: Once your application code is ready and the Dockerfile is written, all you have to do is build the image that will contain your app.
“An image contains the Dockerfile, libraries, and code your application needs to run, all bundled together.”
To get a new Docker image, you can either pull it from a registry (such as Docker Hub) or create your own. There are tens of thousands of images available on Docker Hub. You can also search for images directly from the command line using docker search. An important distinction to be aware of when it comes to images is the difference between base and child images:
- Base images are images that have no parent image, usually images with an OS like ubuntu or debian.
- Child images are images that build on base images and add functionality.
Then there are official and user images, which can be both base and child images:
- Official images are images that are officially maintained and supported by Docker. Their names are typically one word long. For example, the python, ubuntu and hello-world images are official images.
- User images are images created and shared by users. They build on base images and add functionality. Their names typically take the form user/image-name.
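The naming difference between official and user images can be sketched in a few lines of Python. This is a simplified parser written for this article, not Docker's own logic: it assumes Docker Hub style names only and ignores registry host names and digests. The user name jdoe is made up. On Docker Hub, official images live in a namespace called library, which is why they can be referenced by a single word.

```python
from __future__ import annotations

from typing import NamedTuple


class ImageRef(NamedTuple):
    namespace: str   # "library" is the Docker Hub namespace of official images
    repository: str
    tag: str


def parse_hub_reference(reference: str) -> ImageRef:
    """Split a Docker Hub style reference such as "user/image:tag".

    Simplification: registry host names and digests are not handled.
    """
    name, _, tag = reference.partition(":")
    namespace, sep, repository = name.rpartition("/")
    if not sep:
        # No slash in the name: Docker Hub looks it up inside "library".
        namespace = "library"
    # Docker falls back to the "latest" tag when none is given.
    return ImageRef(namespace, repository, tag or "latest")


def looks_official(reference: str) -> bool:
    """Heuristic: a name without a user prefix points at an official image."""
    return parse_hub_reference(reference).namespace == "library"


if __name__ == "__main__":
    for ref in ["python:3.11", "ubuntu", "hello-world", "jdoe/demo-app:1.0"]:
        print(ref, "->", parse_hub_reference(ref), "official:", looks_official(ref))
```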
Once the image is created, your code is ready to be launched.
Docker container: “A Docker container is a Docker image brought to life.”
A container contains:
- A Docker image;
- An execution environment;
- A standard set of instructions.
Running a Docker image with the docker run command creates and starts a container from that image.
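In the Docker CLI, docker run combines two steps that also exist as separate commands: docker create and docker start. The sketch below builds both variants as argument lists so you can compare them. The container name demo is a placeholder, and the script only prints the commands; passing one of the lists to subprocess.run would execute it, assuming Docker is installed and its daemon is running.

```python
from __future__ import annotations

from typing import Optional


def run_command(image: str, name: Optional[str] = None) -> list[str]:
    """Create and start a container in one step."""
    cmd = ["docker", "run", "--rm"]  # --rm removes the container when it exits
    if name:
        cmd += ["--name", name]
    return cmd + [image]


def create_and_start_commands(image: str, name: str) -> list[list[str]]:
    """The same lifecycle split into its two underlying steps."""
    return [
        # Step 1: create the container from the image without starting it.
        ["docker", "create", "--name", name, image],
        # Step 2: start the container and attach to its output.
        ["docker", "start", "--attach", name],
    ]


if __name__ == "__main__":
    print("one step: ", " ".join(run_command("hello-world", name="demo")))
    for step in create_and_start_commands("hello-world", name="demo"):
        print("two steps:", " ".join(step))
```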
Docker Hub: A registry of Docker images. You can think of the registry as a directory of all available images. If needed, you can also host your own registry and pull images from it.
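As a rough illustration, the following Python sketch builds the docker search and docker pull commands for Docker Hub and for a self-hosted registry. The host registry.example.com:5000 and the repository team/app are invented for the example; only the command shapes are meant to carry over.

```python
from __future__ import annotations

from typing import Optional


def search_command(term: str, limit: int = 5) -> list[str]:
    """Search Docker Hub from the command line."""
    return ["docker", "search", "--limit", str(limit), term]


def pull_command(
    repository: str, tag: str = "latest", registry: Optional[str] = None
) -> list[str]:
    """Pull an image; registry=None means the default registry, Docker Hub."""
    reference = f"{repository}:{tag}"
    if registry:
        # A self-hosted registry is addressed by prefixing its host name.
        reference = f"{registry}/{reference}"
    return ["docker", "pull", reference]


if __name__ == "__main__":
    # The registry host and repository below are hypothetical placeholders.
    examples = [
        search_command("python"),
        pull_command("python", "3.11-slim"),
        pull_command("team/app", "1.0", registry="registry.example.com:5000"),
    ]
    for cmd in examples:
        print(" ".join(cmd))
```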
How Does it Work Step-by-Step?
1 – First, you can download and install Docker Community Edition for free.
2 – Then, you create your first Docker image by writing a Dockerfile and then building it.
Tip: you can copy an existing Dockerfile, or take inspiration from the many examples available on Docker Hub and write your own. A Dockerfile defines the image.
3 – After that, you create and run a container from your Docker image.
That’s it, your container should be up and running!
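Steps 2 and 3 can be sketched as a tiny Python script that drives the Docker CLI. Treat it as an outline under a few assumptions: Docker is installed, a Dockerfile sits in the current directory, and my-first-app:0.1 is a tag made up for the example. By default the script only prints the commands; it executes them only when called with --execute.

```python
from __future__ import annotations

import subprocess
import sys


def workflow_commands(tag: str, context: str = ".") -> list[list[str]]:
    """Return the commands for building an image and running a container."""
    return [
        # Step 2: build an image from the Dockerfile found in `context`.
        ["docker", "build", "--tag", tag, context],
        # Step 3: create and start a container from that image.
        ["docker", "run", "--rm", tag],
    ]


def execute(commands: list[list[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "my-first-app:0.1" is a placeholder tag for illustration.
    commands = workflow_commands("my-first-app:0.1")
    if "--execute" in sys.argv:
        execute(commands)
    else:
        for cmd in commands:
            print("would run:", " ".join(cmd))
```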
We have only scratched the surface of Docker, and there is much more to learn. We recommend reading the official tutorial on Docker’s website to get started. It explains how to create and run your first Docker application in a short time. It sounds unbelievable, so you really have to try it yourself to believe it!
Docker is one of the tools that software engineers, and now data scientists, should have in their portfolio of skills, in the same way as git for versioning your application.