It all started with the pressure to split up the monolithic implementation of Docker, with the Moby Project as a result. Now Docker consists of several components on every machine. Confusion arises when people talk about these different components of Docker. So let’s improve the situation…
Docker CLI (docker)
“Docker” is used to refer to the whole set of Docker tools, and in the beginning it was a monolith. But now docker-cli is only responsible for user-friendly communication with Docker.
So commands like
docker build ...
docker run ...
are handled by the Docker CLI and result in calls to the dockerd API.
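To make the split between the CLI and the daemon a bit more concrete, here is a minimal Python sketch of how a command such as docker run could be translated into Docker Engine API requests. The helper name plan_docker_run and the <id> placeholder are made up for illustration. The two endpoints belong to the Engine API, but the real CLI does more (for example pulling a missing image and attaching streams), so treat this as a simplification, not as what docker-cli actually does.

```python
import json


def plan_docker_run(image, command):
    """Return the (method, path, body) requests a client might send for `docker run`.

    Simplified on purpose: no image pull, no attach, no API version prefix.
    `<id>` stands for the container ID returned by the create call.
    """
    create_body = json.dumps({"Image": image, "Cmd": list(command)})
    return [
        ("POST", "/containers/create", create_body),
        ("POST", "/containers/<id>/start", None),
    ]


# Show the planned requests for a detached `alpine sleep 60` container.
for method, path, body in plan_docker_run("alpine", ["sleep", "60"]):
    print(method, path, body or "")
```

The point is only that the CLI itself starts nothing: it describes what it wants and dockerd does the work.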
The Docker daemon (dockerd) listens for Docker API requests and manages the host’s container life cycles by utilizing containerd.
dockerd can listen for Docker Engine API requests via three different types of socket: unix, tcp, and fd. By default, a unix domain socket is created at /var/run/docker.sock, requiring either root permission or docker group membership. On systemd-based systems, you can communicate with the daemon via systemd socket activation by using dockerd -H fd://.
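As a rough illustration of the socket types, the following Python sketch splits a -H style value into its socket type and address, and shows how a client could reach the daemon over the default unix socket using only the standard library. The names describe_host, UnixHTTPConnection and ping are my own, the tcp address used below is just an example value, and ping assumes a daemon is actually listening on the given socket path and that you are allowed to access it.

```python
import http.client
import socket
from urllib.parse import urlsplit


def describe_host(host):
    """Split a `-H` style value into (socket type, address)."""
    parts = urlsplit(host)
    if parts.scheme == "unix":
        return ("unix", parts.path)
    if parts.scheme in ("tcp", "fd"):
        return (parts.scheme, parts.netloc)
    raise ValueError(f"unsupported socket type in {host!r}")


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that uses a unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with a unix socket connect.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping(socket_path="/var/run/docker.sock"):
    """Send GET /_ping to the daemon and return the response body as text."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().read().decode()
    finally:
        conn.close()
```

For example, describe_host("tcp://127.0.0.1:2375") yields the type tcp and the address 127.0.0.1:2375, while ping() only works on a machine where dockerd is running and the caller is root or in the docker group.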
There are many configuration options for the daemon, which are worth checking if you work with docker (dockerd).
My impression is that dockerd is here to serve all the features of the Docker (or Docker EE) platform, while the actual container life-cycle management is “outsourced” to containerd.
containerd was introduced in Docker 1.11 and has since taken over the main responsibility for managing the container life cycle.
containerd is the executor for containers, but has a wider scope than just executing containers. So it also takes care of:
- Image push and pull
- Managing storage
- Executing containers, of course, by calling runc with the right parameters
- Managing network primitives for interfaces
- Managing network namespaces so that containers can join existing namespaces
containerd fully leverages the OCI runtime specification [1], image format specifications, and the OCI reference implementation (runc). Because of its massive adoption, containerd is the industry standard for implementing OCI. It is currently available for Linux and Windows.
As shown in the picture above, containerd includes a daemon exposing a gRPC API over a local UNIX socket. The API is a low-level one designed for higher layers to wrap and extend. containerd uses runc to run containers according to the OCI specification.
containerd is based on the Docker Engine’s core container runtime to benefit from its maturity and existing contributors; however, containerd is designed to be embedded into a larger system, rather than being used directly by developers or end users.
Well, now other vendors can use containers without having to deal with docker-related parts.
Let’s go through some subsystems of containerd.
runc (the OCI runtime) can be seen as a component of containerd.
runc is a command-line client for running applications packaged according to the OCI format and is a compliant implementation of the OCI spec.
Containers are configured using bundles. A bundle for a container is a directory that includes a specification file named “config.json” and a root filesystem. The root filesystem contains the contents of the container.
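To show what such a bundle looks like on disk, here is a small Python sketch that creates the directory layout and a skeleton config.json. The function name write_bundle and the chosen values are assumptions for illustration. The rootfs it creates is empty, so the result is not runnable until you unpack an image into it, and a real runtime may expect more fields than this skeleton has. In practice, runc spec generates a complete default config.json for you.

```python
import json
import os


def write_bundle(bundle_dir, args):
    """Create <bundle_dir>/rootfs and a skeleton <bundle_dir>/config.json."""
    # The root filesystem lives in a subdirectory of the bundle.
    os.makedirs(os.path.join(bundle_dir, "rootfs"), exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        "process": {
            "user": {"uid": 0, "gid": 0},
            "args": list(args),  # command to run inside the container
            "cwd": "/",
        },
        "root": {"path": "rootfs"},  # relative to the bundle directory
    }
    config_path = os.path.join(bundle_dir, "config.json")
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config_path
```

Calling write_bundle("/mycontainer", ["sleep", "60"]) would leave a config.json and an empty rootfs directory under /mycontainer.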
Assuming you have an OCI bundle, you can execute the container:
# run as root
cd /mycontainer
runc run mycontainerid
(docker-)containerd-ctr is a bare-bones CLI (ctr) designed specifically for development and debugging purposes, for direct communication with containerd. It’s included in the releases of containerd, which makes it less interesting for Docker users.
The shim allows for daemon-less containers. According to Michael Crosby, it basically sits as the parent of the container’s process to facilitate a few things.
- First, it allows the runtime, i.e. runc, to exit after it starts the container. This way we don’t have to keep long-running runtime processes around for containers.
- Second, it keeps STDIO and other fds open for the container in case containerd and/or docker die. If the shim were not running, the parent side of the pipes or the TTY master would be closed and the container would exit.
- Finally, it allows the container’s exit status to be reported back to a higher-level tool like docker without that tool having to be the actual parent of the container’s process and do a wait4.
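The first point can be tried out with plain processes. In the Python sketch below (POSIX only), a short-lived launcher plays the role of the runtime: it starts a workload and exits immediately, and the workload keeps running. All names here (LAUNCHER, start_detached_workload, is_alive) are made up, and this is only an analogy. A real shim goes further by staying around as the parent, holding the STDIO fds open and collecting the exit status.

```python
import os
import subprocess
import sys

# Code run by the short-lived launcher (standing in for the runtime).
# It starts the workload, prints the workload's PID, and exits right away.
LAUNCHER = """
import subprocess, sys
workload = subprocess.Popen(
    [sys.executable, "-c", "import sys, time; time.sleep(float(sys.argv[1]))", sys.argv[1]],
    stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
print(workload.pid, flush=True)
"""


def start_detached_workload(seconds):
    """Run the launcher to completion and return the PID of the workload it left behind."""
    launcher = subprocess.run(
        [sys.executable, "-c", LAUNCHER, str(seconds)],
        capture_output=True, text=True, check=True,
    )
    return int(launcher.stdout.strip())


def is_alive(pid):
    """Report whether a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True
```

By the time start_detached_workload(60) returns, the launcher is already gone, yet is_alive on the returned PID should still report the workload as running.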
How it all works together
We can do an experiment. First we check what Docker processes are running right after Docker installation.
ps fxa | grep docker -A 3
# prints:
 2239 ?  Ssl  0:27 /usr/bin/dockerd -H fd://
 2397 ?  Ssl  0:18  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock ...
...
Well, at this point we see that dockerd is started and that containerd is running as its child process, as described above.
Now let’s run one simple container, that executes for a minute and then exits.
docker run -d alpine sleep 60
Now we should see it in the process list for the next 60 seconds. Let’s check again:
ps fxa | grep dockerd -A 3
# prints:
 2239 ?  Ssl  0:28 /usr/bin/dockerd -H fd://
 2397 ?  Ssl  0:19  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock ...
15476 ?  Sl   0:00      \_ docker-containerd-shim 3da7... /var/run/docker/libcontainerd/3da7.. docker-runc
15494 ?  Ss   0:00          \_ sleep 60
Now we see the whole process chain:
dockerd --> containerd --> containerd-shim --> “sleep 60” (desired process in the container).
We do not see runc in the chain because, as we know, containerd-shim takes over after runc has started the container. We also know that, theoretically, containerd-shim can survive a crash of containerd. But in the current Docker version, this is not activated by default.
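If you prefer to follow such a chain programmatically instead of reading ps output, a small Python sketch like the one below walks from a process up through its parents. It assumes a Linux system with /proc mounted, and the function names read_status and parent_chain are my own. Note that it lists the chain child first, i.e. in the opposite direction to the arrows above.

```python
import os


def read_status(pid):
    """Return (name, parent_pid) from /proc/<pid>/status, or None if unreadable."""
    fields = {}
    try:
        with open(f"/proc/{pid}/status") as handle:
            for line in handle:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
    except OSError:
        return None  # process is gone or /proc is not available
    return fields.get("Name", "?"), int(fields.get("PPid", "0"))


def parent_chain(pid):
    """Return [(pid, name), ...] starting at pid and walking up through its parents."""
    chain = []
    while pid > 0:
        status = read_status(pid)
        if status is None:
            break
        name, parent = status
        chain.append((pid, name))
        pid = parent
    return chain


# Print the chain for the current process, child first.
print(" <- ".join(f"{name}({pid})" for pid, name in parent_chain(os.getpid())))
```

Run on the host with the PID of the sleep 60 process (15494 in the listing above), I would expect it to walk back up through the shim, containerd and dockerd.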
However, it’s a pretty long chain, with the possible disadvantages that such chains might have.
How it all works in Kubernetes
You might imagine that Kubernetes does not need the Docker-specific parts. As of now, that’s exactly the case…
Kubernetes “speaks” with containerd directly, as depicted in the picture. If interested, check how it was in between.
I hope this might help all Docker users. Give me a hint if something is not precise.
[1] The OCI Runtime Specification outlines how to run a container’s “filesystem bundle” that is unpacked on disk. At a high level, an OCI implementation would download an OCI Image (OCI Image Specification), then unpack that image into an OCI Runtime filesystem bundle. At this point, the OCI Runtime Bundle would be run by an OCI Runtime.