“You should really be looking at Vagrant” — he said while I struggled with the keyboard, as if pressing the keys harder was going to magically make it work.
I had recently completed a utility (you know, one of those things that slurp in data from some black hole in a distant corner of the universe, marry it with the structure of a different time-space continuum, and magically spawn a pretty visual representation for mere mortals to easily consume) and I was having the hardest time getting all the pieces together for a demo on a laptop running that OS (yes, that one).
My buddy was trying to point me to a system that provisions virtual machines from a simple configuration file, a solution that was becoming common in application distribution and deployment, as well as a way to standardize development environments.
I looked into Vagrant and found it very interesting, but at the time I was testing applications that used the virtual machine model and, while very aware of the benefits, I perceived it as fairly cumbersome, mostly because of the size of a VM and some setup problems (not specific to Vagrant itself) that varied depending on the host OS.
So I filed my new knowledge away as something to fiddle with in the future and kept at the problem I was trying to solve. Indeed, we managed to get my seemingly complex solution working and moved on to other problems in our galaxy.
The Whale in the Machine
Over time the buzz started accumulating and I found this other thing called Docker. If you don’t know what that is, then you need to stop reading this article and head over to www.docker.com; I promise it’s a better use of your time. Feel free to come back after your paradigm has shifted.
The idea of container-izing (yes, it’s now a word) system processes is not new. Jail environments have existed for years, but Docker has found a good way of simplifying things such that getting everything set up and distributed is extremely simple. To top it all off, borrowing git-style concepts (commits, tags, pushes and pulls) to version your images has added a completely new dimension to the idea of redundancy and backup, especially when married to a centralized distribution system in the form of Docker Registry and Docker Hub.
After a few months of messing around, I picked up the technology and started using it in my next set of projects to provide partitioning, security, load balancing, resiliency and backups. We even tied the system to our own git repositories and started monitoring for branch commits that automatically triggered rebuilds of our images and replaced running containers (at the time, continuous delivery/integration was still in its infancy).
It was while writing this build code that we started to look into directly connecting to Docker APIs instead of “subprocess-ing” out our work as shell commands.
As it turns out, Docker makes a RESTful API available through its socket, providing most of the functionality you’re looking for to manipulate images and manage containers. There are several existing web interfaces that take advantage of this, giving you a decent GUI for managing container deployments.
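To make the "REST over a socket" idea concrete, here is a minimal standard-library sketch of a GET request sent over a Unix socket. The class and function names are mine, the default path /var/run/docker.sock is the usual location but is an assumption about your setup, and this is not how docker-py itself is implemented.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" only fills the Host header; the socket path does the routing.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path, socket_path="/var/run/docker.sock"):
    """Send a GET to the API behind socket_path; return (status, body bytes)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With a daemon listening on the default socket, docker_get("/_ping") is a cheap way to check that the API is reachable, and docker_get("/containers/json") returns the JSON behind a container listing.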
The folks at Docker have already written a Python module named docker-py for using the interface. Below are my observations as I wrapped it in order to simplify interaction and track the things I care about a little better.
To go over it quickly, docker-py uses the requests library to send commands directly to the docker host socket. All that’s needed is to establish a connection and call the methods that tie into whatever command you’re trying to execute.
However, if the application is intended as a generic solution that should function across different environments, deciding how to connect to the socket at the application level can get tricky. This is mainly because exposing the underlying unix socket is a fairly big security concern (there are plenty of articles out there explaining it in detail), so I chose to go with a utility function that passes the connection arguments to the client object based on your environment variables.

    from docker import Client
    from docker.utils import kwargs_from_env

    client = Client(**kwargs_from_env())
This mechanism leaves connection configuration to the user of your application, making it operate with whatever setup is in place for the regular command line.
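For intuition, the environment variables involved are the same ones the docker command line honors: DOCKER_HOST, DOCKER_TLS_VERIFY and DOCKER_CERT_PATH. The sketch below is a simplified illustration of that idea, not docker-py's actual implementation; connection_kwargs and the tls_files key are hypothetical names of mine.

```python
import os


def connection_kwargs(environ=None):
    """Derive connection settings from Docker's usual environment variables.

    Simplified illustration only; the real kwargs_from_env does more
    (for example, it builds a proper TLS configuration object).
    """
    environ = os.environ if environ is None else environ
    kwargs = {}

    host = environ.get("DOCKER_HOST")
    if host:
        kwargs["base_url"] = host

    if environ.get("DOCKER_TLS_VERIFY"):
        # Fall back to ~/.docker when no certificate directory is given.
        cert_path = environ.get("DOCKER_CERT_PATH") or os.path.join(
            os.path.expanduser("~"), ".docker"
        )
        kwargs["tls_files"] = {  # hypothetical key, for illustration
            "client_cert": (
                os.path.join(cert_path, "cert.pem"),
                os.path.join(cert_path, "key.pem"),
            ),
            "ca_cert": os.path.join(cert_path, "ca.pem"),
        }
    return kwargs
```

With none of the variables set the result is an empty dict, which is why the client falls back to its defaults on a plain local install.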
The docker-py client object exposes a method equivalent to almost every function available through the shell: anything from downloading images, to building them, to starting containers and managing their state.
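As a rough map of that correspondence, here are a few shell subcommands next to the client method I would reach for. The method names are ones the docker-py client provides; the dispatch helper itself is only a hypothetical illustration and works with any object exposing those methods.

```python
# Shell subcommand -> docker-py client method name (a partial list).
SHELL_TO_METHOD = {
    "images": "images",
    "ps": "containers",
    "pull": "pull",
    "build": "build",
    "start": "start",
    "stop": "stop",
    "rm": "remove_container",
    "rmi": "remove_image",
}


def call_like_shell(client, command, *args, **kwargs):
    """Call the client method that corresponds to a shell subcommand."""
    try:
        method_name = SHELL_TO_METHOD[command]
    except KeyError:
        raise ValueError("no mapping for 'docker %s'" % command)
    return getattr(client, method_name)(*args, **kwargs)
```

For example, call_like_shell(client, "ps", all=True) ends up calling client.containers(all=True), the counterpart of docker ps -a.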
So what did the Python say to the Whale?
While a stateless API has its advantages, we really needed something that could represent images and containers as objects. In principle, this seemed as simple as tracking a string with image:tag-like references, but there were several cases where that label would encompass several running containers, and some containers weren’t named, so identifiers needed to be used directly. To help with this, I created DockerImage and DockerContainer classes that wrap the commonly used methods.
DockerImage(repository, tag) is all you need to download or build an image, as well as check whether it exists in your docker host. It also provides a method to add or reassign tags.
The base image APIs provided by docker-py were fairly simple to interact with. They use an iterator interface when performing a build or download, something that makes perfect sense as these actions are essentially streams of data or build steps which you may want for debug purposes. However, I decided to dump them to logging.debug until they complete, therefore masking the wall of text that you usually don’t care about if using a proven image.
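A stripped-down sketch of what such an image wrapper could look like follows. It is not the actual class from my module: the extra client argument is an assumption added so the example stays self-contained, and only the images, pull and tag client methods are used.

```python
import logging

log = logging.getLogger(__name__)


class DockerImage(object):
    """Minimal image wrapper; `client` is any docker-py style client object."""

    def __init__(self, repository, tag, client):
        self.repository = repository
        self.tag = tag
        self.client = client

    @property
    def reference(self):
        return "%s:%s" % (self.repository, self.tag)

    def exists(self):
        """True if the docker host already has repository:tag."""
        for image in self.client.images(name=self.repository):
            # Untagged images may report RepoTags as None.
            if self.reference in (image.get("RepoTags") or []):
                return True
        return False

    def pull(self):
        """Download the image, sending the progress stream to debug logging."""
        for chunk in self.client.pull(self.repository, tag=self.tag, stream=True):
            log.debug(chunk)

    def add_tag(self, tag, repository=None):
        """Add another tag (optionally under another repository) to this image."""
        return self.client.tag(self.reference, repository or self.repository, tag=tag)
```

Draining the iterator inside pull is what hides the wall of text: the caller just waits for the method to return, and the stream is still there if debug logging is turned on.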
DockerContainer(image, name) creates an object that will allow you to inspect the container it represents, and also provides a way to run it (which creates the container if it doesn’t exist) and functions that start, stop, remove and restart a running container.
Given that containers may or may not be named, interfacing with them — while not complicated — can be hard to follow in code because most of the docker-py methods require the container id, not the name. This was abstracted out, along with container status checks, in places where it made sense.
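The name-to-id lookup can be sketched roughly as below. Again this is an illustration rather than my actual class: the client argument is an assumption for self-containment, and the code relies on the container listing reporting names with a leading slash.

```python
class DockerContainer(object):
    """Minimal container wrapper; `client` is any docker-py style client object."""

    def __init__(self, image, name, client):
        self.image = image
        self.name = name
        self.client = client

    @property
    def id(self):
        """Resolve the container name to its id, or None if it does not exist."""
        wanted = "/" + self.name  # the API lists names with a leading slash
        for summary in self.client.containers(all=True):
            if wanted in (summary.get("Names") or []):
                return summary["Id"]
        return None

    def _require_id(self):
        container_id = self.id
        if container_id is None:
            raise LookupError("no container named %r" % self.name)
        return container_id

    def is_running(self):
        container_id = self.id
        if container_id is None:
            return False
        return bool(self.client.inspect_container(container_id)["State"]["Running"])

    def run(self):
        """Start the container, creating it first if it does not exist."""
        container_id = self.id
        if container_id is None:
            container_id = self.client.create_container(self.image, name=self.name)["Id"]
        self.client.start(container_id)
        return container_id

    def stop(self):
        self.client.stop(self._require_id())

    def remove(self):
        self.client.remove_container(self._require_id())
```

Callers only ever deal with the name; every method resolves the id at call time, so the object stays valid even if the container is removed and recreated behind its back.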
Once I reached the point of adding an attach method — which subprocesses out the command so that getting into your container is like starting a new shell — I realized it could do something fairly neat: replace python virtual environments with containers. And so capsule was born.
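One way to get that "new shell" feel is to hand the terminal over to the docker command line. The sketch below assumes docker exec -it with /bin/bash as the shell; both are my assumptions for illustration and not necessarily the exact command my attach method runs.

```python
import subprocess


def attach_command(name, shell="/bin/bash"):
    """Build the command line that opens an interactive shell in a container."""
    return ["docker", "exec", "-it", name, shell]


def attach(name, shell="/bin/bash"):
    """Run the shell in the foreground and return its exit code."""
    return subprocess.call(attach_command(name, shell))
```

Because subprocess.call inherits the current terminal, leaving the shell inside the container drops you back where you started, much like deactivating a virtual environment.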
The capsule module is essentially a command line interface to docker with which to start containers for the purpose of fiddling with code. The containers are named like you would a virtual environment, and exist until you explicitly remove them. The default image is based on a custom Python 2.7 container I pushed to Docker Hub, but you can provide your own image with the --baseimage parameter. It’s also possible to take the Python history generated from your tinkering and export it to a Jupyter Notebook for later use.

    Usage:
        capsule make <name> [options]
        capsule workon <name> [options]
        capsule remove <name> [options]
        capsule list [options]
        capsule pyhistory <name>

    Options:
        --baseimage
        --debug       Print debug messages.
        -h --help     Show this screen.
        --version     Show version.
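If you wanted to reproduce a command line of this shape with only the standard library, an argparse version might look like the following. This is a hypothetical equivalent for illustration, not capsule's actual parsing code, and the version string is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser mirroring the capsule usage text (illustrative only)."""
    parser = argparse.ArgumentParser(prog="capsule")
    parser.add_argument("--version", action="version", version="capsule (sketch)")

    # Options shared by the subcommands that accept [options].
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--baseimage", default=None)
    common.add_argument("--debug", action="store_true", help="Print debug messages.")

    subcommands = parser.add_subparsers(dest="command")
    for command in ("make", "workon", "remove"):
        sub = subcommands.add_parser(command, parents=[common])
        sub.add_argument("name")
    subcommands.add_parser("list", parents=[common])
    history = subcommands.add_parser("pyhistory")
    history.add_argument("name")
    return parser
```

Parsing ["make", "demo", "--baseimage", "python:3"] with this parser yields command="make", name="demo" and baseimage="python:3", which is all a dispatcher needs to pick the right container action.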
While using containers instead of virtual environments does present its limitations, my thought is that sometimes we actually want that separation, especially when testing code that performs potentially OS-damaging operations or when you just need to try things out in a different environment.
Regardless, this process provided a fun little learning experiment that helped me discover and poke at docker-py’s capabilities.