Baseimage-docker, fat containers and "treating containers as VMs"

Baseimage-docker is a minimal Ubuntu base image that is modified for Docker-friendliness. People can pull Baseimage-docker from the Docker Registry and use it as a base image for their own images.

We were early adopters of Docker, using Docker for continuous integration and for building development environments way before Docker hit 1.0. We developed Baseimage-docker in order to solve some problems with the way Docker works, most notably the PID 1 zombie reaping problem.

We figured that:

The problems that we solved are applicable to a lot of people.
Most people are not even aware of these problems, so things can break in unexpected ways (Murphy's law).
It's inefficient if everybody has to solve these problems over and over.

So in our spare time we extracted our solution into a reusable base image that everyone can use: Baseimage-docker. We didn't want to see the community reinventing the wheel over and over. Our solution seems to be well received: we are the most popular third-party image on the Docker Registry, ranking only below the official Ubuntu and CentOS images.

Fat containers, "treating containers as VMs"

Over time, many people got the impression that Baseimage-docker advocates "fat containers", or "treating containers as VMs". The Docker developers strongly advocate small, lightweight containers where each container has a single responsibility. The fact that Baseimage-docker advocates the use of multiple processes seems to go against this philosophy.

However, what the Docker developers advocate is running a single logical service per container. Or more generally, a single responsibility per container. Baseimage-docker does not dispute this. Consider that a single logical service can consist of multiple OS processes. Baseimage-docker does not advocate fat containers or treating containers as VMs at all.

Does Baseimage-docker advocate running multiple logical services in a single container? Not necessarily, but we do not prohibit it either. Although the Docker philosophy advocates slim containers, we believe that sometimes it makes sense to run multiple services in a single container, and sometimes it doesn't.

Why multiple processes?

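The most important reason Baseimage-docker advocates multiple OS processes is that they are needed to solve the PID 1 zombie reaping problem. The first process in a container runs as PID 1, and orphaned processes get re-parented to it. Unless that process reaps them, they linger as zombies. Most applications are not written to do this, so an extra init process is needed. If you're not familiar with this problem, it's worth reading up on it.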

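To make the zombie reaping problem concrete, here is a minimal Python sketch of what an init process has to do. It starts the real command as a child, then keeps calling waitpid() on *any* child so that exited processes are reaped. This is not my_init's actual code. The function name run_and_reap is hypothetical, and signal forwarding and shutdown handling are left out. It assumes Python 3.9 or later (for os.waitstatus_to_exitcode) on a Unix-like system.

```python
import os

def run_and_reap(argv):
    """Minimal init sketch: start argv, reap every child that exits,
    and return argv's exit code once that main child has terminated."""
    main_pid = os.fork()
    if main_pid == 0:
        # Child: replace ourselves with the requested command.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    while True:
        # waitpid(-1, ...) collects *any* child. When this process is PID 1,
        # orphaned grandchildren are re-parented to it, so they are reaped
        # here too instead of lingering as zombies.
        pid, status = os.waitpid(-1, 0)
        if pid == main_pid:
            return os.waitstatus_to_exitcode(status)
```

Outside a container this process is not PID 1, so orphaned grandchildren are re-parented elsewhere and only its direct children get reaped here. Used as a container's entrypoint, it becomes PID 1 and collects orphans as well. A real init would also forward signals such as SIGTERM to the main child and clean up remaining processes on exit.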

The second reason is that splitting your logical service into multiple OS processes also makes sense from a security standpoint. By running different parts of your service as different processes with different users, you can limit the impact of security vulnerabilities. Baseimage-docker provides tools to encourage running processes as different users, e.g. the setuser tool.
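The core of running a process as a different user is small. You look up the user, set the supplementary groups and the group ID, drop the user ID last, then exec the command. The following Python sketch approximates what a tool like setuser does. It is not setuser's source, and the function names are hypothetical. It must run as root, because only root may change its user and group IDs.

```python
import os
import pwd

def user_credentials(username):
    """Look up the uid, primary gid and home directory for username."""
    entry = pwd.getpwnam(username)
    return entry.pw_uid, entry.pw_gid, entry.pw_dir

def exec_as_user(username, argv):
    """Replace the current process with argv, running as username.

    Must be called as root. Groups must be changed while we still have
    root privileges, i.e. before setuid() drops them for good.
    """
    uid, gid, home = user_credentials(username)
    os.setgroups(os.getgrouplist(username, gid))  # supplementary groups
    os.setgid(gid)
    os.setuid(uid)  # after this, root privileges are gone
    os.environ["USER"] = username
    os.environ["HOME"] = home
    os.execvp(argv[0], argv)  # does not return on success
```

A service's startup script would typically do this as its final step, so that the long-running service process is unprivileged while the supervisor itself stays root. The order of the calls matters: once setuid() has dropped root, setgid() and setgroups() would fail.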

The third reason is to automatically restart processes that have crashed. We saw that a lot of people use Supervisord for this purpose, but Baseimage-docker advocates Runit instead because we think it's easier to use, more efficient and less resource-hungry. Before Docker 1.2, if your main process crashed, the container went down. Docker 1.2 introduced automatic restarts of containers, so this reason has become less relevant. However, Runit is still useful for running different parts of your service as different users, for security reasons. And sometimes it makes sense to restart only a part of the container instead of the container as a whole.
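The restart-on-crash behaviour that Runit and Supervisord provide boils down to a loop. Below is a deliberately simplified Python sketch of that idea, not Runit's implementation. The name supervise is hypothetical. The max_restarts and delay parameters exist only to make the example finite and to avoid a tight crash loop.

```python
import subprocess
import time

def supervise(argv, max_restarts=None, delay=1.0):
    """Run argv and start it again every time it exits.

    Returns the list of exit codes seen, once max_restarts restarts have
    happened. With max_restarts=None it loops forever, like a real
    supervisor would.
    """
    exit_codes = []
    restarts = 0
    while True:
        exit_codes.append(subprocess.call(argv))  # blocks until it exits
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        time.sleep(delay)  # back off briefly before restarting
```

Because only the crashed process is restarted, the rest of the container keeps running. This is the case the last sentence above describes.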

Baseimage-docker is about freedom

Although following the Docker philosophy is a good thing, we believe that ultimately you should decide what makes sense. We see Docker more as a general-purpose tool, comparable to FreeBSD jails and Solaris zones. Our primary use cases for Docker include:

Continuous integration.
Building portable development environments (e.g. replacing Vagrant for this purpose).
Building controlled environments for compiling software (e.g. Traveling Ruby and passenger_rpm_automation).

For these reasons, Baseimage-docker was developed to accept the Docker philosophy where possible, but not to enforce it.

How does Baseimage-docker play well with the Docker philosophy?

So when we say that Baseimage-docker modifies Ubuntu for "Docker friendliness" and that it "accepts the Docker philosophy", what do we mean? Here are a few examples.

Environment variables

Using environment variables to pass parameters to Docker containers is very much in line with "the Docker way". However, if you use multiple processes inside a container, the original environment variables can quickly get lost. For example, sudo nukes nearly all environment variables for security reasons. Other software, like Nginx, nukes environment variables for security reasons too.

Baseimage-docker provides a mechanism for accessing the original environment variables, but it is sufficiently secured so that only processes that you explicitly allow can access them.
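One way to implement such a mechanism is to have the init process dump its original environment to disk at startup, with file permissions deciding who may read it. The sketch below stores one file per variable. The directory layout, the function names and the 0640 permissions are illustrative assumptions. Consult Baseimage-docker's README for the actual paths and group it uses.

```python
import os

def dump_environment(env, directory, mode=0o640):
    """Write each variable to its own file: filename = name, contents = value."""
    os.makedirs(directory, mode=0o750, exist_ok=True)
    os.chmod(directory, 0o750)  # owner and group only; umask may have interfered
    for name, value in env.items():
        if not name or "/" in name or name in (".", ".."):
            continue  # not representable as a plain filename
        path = os.path.join(directory, name)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        with os.fdopen(fd, "w") as f:
            f.write(value)
        os.chmod(path, mode)  # enforce mode even if the file already existed

def load_environment(directory):
    """Read the variables back; fails with PermissionError for outsiders."""
    env = {}
    for name in os.listdir(directory):
        with open(os.path.join(directory, name)) as f:
            env[name] = f.read()
    return env
```

Granting access is then a matter of group membership. You chown the directory to a dedicated group and add only the users whose processes should see the original variables.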

"docker logs" integration to be improved

Baseimage-docker tries its best to integrate with docker logs where possible. Daemons tend to log to log files or to syslog, but logging to stdout/stderr (which docker logs exposes) is much more in line with the Docker way.

In the next version of Baseimage-docker, we will adhere better to the Docker philosophy by redirecting all syslog output to docker logs.
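Redirecting syslog output to docker logs means something has to listen where syslog clients write and copy each message to stdout. Traditionally, clients write to the /dev/log Unix datagram socket. The Python sketch below shows the idea. It is not what Baseimage-docker ships, and the function names are hypothetical. It only strips the leading <PRI> field and does not parse the rest of the syslog format.

```python
import os
import re
import socket
import sys

PRI = re.compile(rb"^<\d{1,3}>")  # syslog priority prefix, e.g. b"<13>"

def open_syslog_socket(path):
    """Bind a Unix datagram socket where syslog clients expect one."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket from a previous run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock

def forward(sock, out=None, max_messages=None):
    """Copy each received message to out (stdout by default), one per line."""
    out = out or sys.stdout
    count = 0
    while max_messages is None or count < max_messages:
        data = sock.recv(65536)  # one datagram = one syslog message
        line = PRI.sub(b"", data).rstrip(b"\n").decode("utf-8", "replace")
        out.write(line + "\n")
        out.flush()  # make it visible to docker logs immediately
        count += 1
```

In a container you would bind it to /dev/log and leave it running, so that anything logged through syslog(3) shows up in docker logs.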

SSH to be replaced by "docker exec"

Baseimage-docker provides a mechanism to easily log in to the container using SSH. This is another reason people believe that Baseimage-docker advocates fat containers.

However, fat containers have never been the reason why we include SSH. The rationale was that there should be some way to log in to the container for the purpose of debugging, inspection or maintenance. Before Docker 1.3, which introduced docker exec, there was no mechanism built into Docker for logging into a container or running a command inside a container, so we had to introduce our own.

There are people who advocate that containers should be treated as black boxes. They say that if you have to log in to the container, then you're designing your containers wrong. Baseimage-docker does not dispute this either. SSH is not included because we encourage people to log in; it is included mainly to handle contingencies. No matter how well you design your containers, if they are used seriously in production, one day you will have to look inside one in order to debug a problem. Baseimage-docker prepares for that day.

Despite this, the SSH mechanism has been widely criticized. Before Docker 1.3, most critics advocated the use of lxc-attach and nsenter. But lxc-attach soon became obsolete because Docker 0.9 moved away from LXC as its default backend. Nsenter was a better alternative but had problems of its own. It was not included in most of the distributions that were widely used at the time, and using it requires root access on the Docker host (which, depending on your requirements, may or may not be acceptable). Of course, SSH had its own problems too. We knew that there is no one-size-fits-all solution. So instead of replacing SSH with lxc-attach/nsenter, we chose to support both SSH and nsenter, and we clearly documented the pros and cons of each approach.

Docker 1.3 finally introduced the docker exec command. This command is like nsenter; indeed, it appears to be a wrapper around a slightly modified nsenter binary that is included by default with Docker. This is great: it means that for a large number of use cases, neither SSH nor nsenter is necessary. However, some of the issues inherent to nsenter still apply. For example, running docker exec requires access to the Docker daemon, and users who have access to the Docker daemon effectively have root access.
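Here is a small Python sketch that builds the command lines for each approach, to show how they differ. docker exec needs only the container name, but it talks to the Docker daemon. nsenter needs the container's PID on the host, which you can look up with docker inspect, and it needs root access there. The helper names are hypothetical. The docker and nsenter flags are the standard ones.

```python
import subprocess

def docker_exec_command(container, shell="bash"):
    """Interactive shell via the Docker daemon (needs daemon access)."""
    return ["docker", "exec", "-i", "-t", container, shell]

def pid_lookup_command(container):
    """Prints the host PID of the container's main process."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]

def container_pid(container):
    """Run docker inspect and return the host PID as an int."""
    result = subprocess.run(pid_lookup_command(container),
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())

def nsenter_command(pid, shell="bash"):
    """Enter the namespaces of a host PID (needs root on the Docker host)."""
    return ["nsenter", "--target", str(pid),
            "--mount", "--uts", "--ipc", "--net", "--pid", shell]
```

For example, subprocess.run(docker_exec_command("web")) would open an interactive shell in a container named web. For nsenter_command(container_pid("web")), you would typically prefix the command with sudo.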

That said, we definitely acknowledge "docker exec" as more in line with "the Docker way". So in the next version of Baseimage-docker, we will adopt "docker exec" as the default mechanism for logging into a container. But because of the issues with "docker exec", we will continue to support SSH as an alternative, although it will be disabled by default. And we will continue to clearly document the pros and cons of each approach, so that users can make informed decisions instead of blindly jumping on bandwagons.

Conclusion

Baseimage-docker is not about fat containers or about treating containers as VMs, and the fact that it encourages multiple processes does not go against the Docker philosophy. Furthermore, the Docker philosophy is not binary but a continuum, and we are actively developing Baseimage-docker to become increasingly in line with it.

Is Baseimage-docker the only possible right solution?

Of course not. What Baseimage-docker aims to do is:

To make people aware of several important caveats and pitfalls of Docker containers.
To provide pre-created solutions that others can use, so that people do not have to reinvent solutions for these issues.

This means that multiple solutions are possible, as long as they solve the issues that we describe. You are free to reimplement solutions in C, Go, Ruby or whatever. But why should you when we already have a perfectly fine solution?

Maybe you do not want to use Ubuntu as your base image. Maybe you use CentOS. But that does not stop Baseimage-docker from being useful to you. For example, our passenger_rpm_automation project uses CentOS containers. We simply extracted Baseimage-docker's my_init and imported it there.

So even if you do not use, or do not want to use, Baseimage-docker, take a good look at the issues we describe, and think about what you can do to solve them.

Happy Dockering.