Probably Done Before: 2015

This post is meant as a Docker 102-level post. If you are unaware of what Docker is, or don't know how it compares to virtual machines or to configuration management tools, then this post might be a bit too advanced at this time.

This post hopes to aid those struggling to internalize the docker command-line, specifically with knowing the exact difference between a container and an image. More specifically, this post shall differentiate a simple container from a running container.

I do this by taking a look at some of the underlying details, namely the layers of the union file system. This was a process I undertook for myself in the past few weeks, as I am relatively new to the docker technology and have found the docker command-lines difficult to internalize.

Tangent: In my opinion, understanding how a technology works under the hood is the best way to achieve learning speed and to build confidence that you are using the tool in the correct way. Often a technology is released with a certain breathless and hype that make it difficult to really understand appropriate usage patterns. More specifically, technology releases often develop an abstraction model that can invent new terminologies and metaphors that might be useful at first, but make it harder to develop mastery in latter stages.

A good example of this is Git. I could not gain traction with Git until I understood its underlying model, including trees, blobs, commits,tags, tree-ish, etc. I had written about this before in a previous post, and still remain convinced that people who don't understand the internals of Git cannot have true mastery of the tool.

Image Definition

The first visual I present is that of an image, shown below with two different visuals. It is defined as the "union view" of a stack of read-only layers.

On the left we see a stack of read-layers. These layers are internal implementation details only, and are accessible outside of running containers in the host's file system. Importantly, they are read-only (or immutable) but capture the changes (deltas) made to the layers below. Each layer may have one parent, which itself may have a parent, etc. The top-level layer may be read by a union-ing file system (AUFS on my docker implementation) to present a single cohesive view of all the changes as one read-only file system. We see this "union view" on the right.

If you want to see these layers in their glory, you might find them in different locations on your host's files system. These layers will not be viewable from within a running container directly. In my docker's host system I can see them at /var/lib/docker in a subdirectory called aufs.

# sudo tree -L 1 /var/lib/docker/

/var/lib/docker/

├── aufs

├── containers

├── graph

├── init

├── linkgraph.db

├── repositories-aufs

├── tmp

├── trust

└── volumes

7 directories, 2 files

Container Definition

A container is defined as a "union view" of a stack of layers the top of which is a read-write layer.

I show this visual above, and you will note it is nearly the same thing as an image, except that the top layer is read-write. At this point, some of you might notice that this definition says nothing about whether this container is running, and this is on purpose. It was this discovery in particular that cleared up a lot of confusion I had up to this point.

Takeaway: A container is defined only as a read-write layer atop an image (of read-only layers itself). It does not have to be running.

So if we want to discuss containers running, we need to define a running container.

Running Container Definition

A running container is defined as a read-write "union view" and the the isolated process-space and processes within. The below visual shows the read-write container surrounded by this process-space.

It is this act of isolation atop the file system effected by kernel-level technologies like cgroups, namespaces, etc that have made docker such a promising technology. The processes within this process-space may change, delete or create files within the "union view" file that will be captured in the read-write layer. I show this in the visual below:

To see this at work run the following command: docker run ubuntu touch happiness.txt. You will then be able to see the new file in the read-write layer of the host system, even though there is no longer a running container (note, run this in your host system, not a container):

# find / -name happiness.txt

/var/lib/docker/aufs/diff/860a7b...889/happiness.txt

Image Layer Definition

Finally, to tie up some loose ends, we should define an image layer. The below image shows an image layer and makes us realize that a layer is not just the changes to the file system.

The metadata is additional information about the layer that allows docker to capture runtime and build-time information, but also hierarchical information on a layer's parent. Both read and read-write layers contain this metadata.

Additionally, as we have mentioned before, each layer contains a pointer to a parent layer using the Id (here, the parent layers are below). If a layer does not point to a parent layer, then it is at the bottom of the stack.

Metadata Location:
At this time (and I'm fully aware that the docker developers could change the implementation), the metadata for an image (read-only) layer can be found in a file called "json" within /var/lib/docker/graph at the id of the particular layer:

/var/lib/docker/graph/e809f156dc985.../json

where "e809f156dc985..." is the elided id of the layer.

The metadata for a container seems to be broken into many files, but more or less is found in /var/lib/docker/containers/<id> where <id>is the id of the read-write layer. The files in this directory contain more of the run-time metadata needed to expose a container to the outside world: networking, naming, logs, etc.

Visualizing Docker Containers and Images

Image Definition

Container Definition

Running Container Definition

Image Layer Definition