Dockerizing RightScale

Lessons learned while adopting Docker at RightScale

Continuous deployment is all the rage and for good reasons. The idea that a piece of functionality should be deployed as soon as it is “ready” makes a lot of sense. I remember a time when QA would find bugs during a release in code that had been written weeks prior. Troubleshooting these issues required a complete context switch very similar to working a production escalation and with the same overhead. There’s obviously a lot more to continuous deployment, but even narrowing it down to just engineering, the benefits are significant.

It’s no secret that the key to making continuous deployment work is automation. This doesn’t solely mean automated deployment, but automation across the full development to production lifecycle including continuous integration. So when the RightScale team started working on some of our new initiatives, a lot of emphasis was put on continuous integration (CI) and the continuous delivery (CD) story as a whole.

Introducing Self-Service

One such initiative is Self-Service. At a high level, Self-Service provides a way to describe cloud application deployments using a rich language (think CloudFormation on steroids) and to associate policies and governance rules (scheduling, ACLs, quotas, etc.) with them.

The product is exposed through REST APIs with a powerful UI built on top. A key challenge we faced when building CI for Self-Service was integration testing. The language we chose to describe cloud applications is a full-blown imperative language that allows orchestrating arbitrary external services, so testing all the various permutations is no trivial task.

Docker to the Rescue

Around the same time that we were looking into the CI challenge for Self-Service, Docker was gaining a lot of traction, and it made sense to try it out and assess whether it could help us. A few months later the answer is a resounding yes, but as is often the case the path to success wasn’t a straight line.

There were two key challenges we had to solve: one was how to emulate the external services being orchestrated, and the other was how to deploy the system, which is composed of a dozen completely different apps all interconnected in various ways, during a Jenkins run. We developed our own home-grown solution for the former (similar to VCR but with some added functionality we needed) and leveraged Docker for the latter, which is what this blog post covers.

The end result is quite nice: as a developer working on Self-Service, you run a Ruby script and after a few minutes have the full system running on your laptop, ready for development. The only dependencies are Docker and fig. The same script runs on a Jenkins slave to set up the full system as well as the external service mocks so that all integration tests run with a clean setup each time.
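
To give a feel for the workflow (the script, repo, and service names below are purely illustrative, not our actual tooling), the developer experience boils down to a handful of commands:

# hypothetical developer workflow; script and repo names are made up for illustration
git clone git@github.com:rightscale/self_service_dev_env
cd self_service_dev_env
./bin/dev_setup.rb   # pulls/builds the images and generates a fig.yml for all the apps
fig up -d            # starts every container defined in fig.yml in the background
fig ps               # check that all the services came up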

Lessons Learned

We learned a lot about Docker in the process. And, as with any new technology, there weren’t a lot of best practices that we could just Google. Here are some of the most interesting challenges we ran across and the corresponding lessons learned.

1. Building Docker images containing private code

Docker images are built using Dockerfiles or by “committing” (snapshotting) an existing container. A Dockerfile consists of a sequence of commands that are executed to produce the final image. One of the key ingredients of Docker is how it leverages layered file systems (aufs or devicemapper by default): each Dockerfile command that changes the file system results in a “layer” in the final image. A running container adds a top-level read/write layer. Layers make it possible to “commit” a container, meaning freeze the top read/write layer into a read-only layer and create an image out of the result that contains all the base image layers plus the newly created one. Layers get cached on the host, so downloading a newly committed image only requires downloading the top layer if the base image was already downloaded.
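
A quick way to see layering in action (a minimal sketch, not part of our build tooling) is to commit a trivial container and inspect its history:

# create a container that adds a single file in its read/write layer
docker run --name layer-demo ubuntu:14.04 touch /tmp/hello
# freeze that read/write layer into a new read-only layer on top of the ubuntu layers
docker commit layer-demo layer-demo-image
# list the layers: the new one sits on top of the unchanged ubuntu 14.04 layers
docker history layer-demo-image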

One “interesting” consequence of using an overlay filesystem this way is that any file ever created while building an image is stored in it forever. Consider the following Dockerfile:

# base image (ubuntu 14.04 + ruby 2.1.2)
FROM rightscale/ruby-212
# copy the ssh key (placed in the build context beforehand) into the container so we can clone the private repo
ADD id_rsa /root/.ssh/id_rsa
# prime known_hosts with the github.com public key
RUN ssh-keyscan -t rsa,dsa github.com >> /root/.ssh/known_hosts
# clone repo
RUN git clone git@github.com:rightscale/docker_demo
# remove ssh key from image
RUN rm /root/.ssh/id_rsa
# … other steps needed to build/setup application

Even though the SSH key gets deleted by the last command, the image contains the layer created by the ADD command and therefore the SSH key. If that image is pushed to Docker Hub, then anyone pulling it will have access to the SSH key (check /var/lib/docker/aufs/diff on the host if you’re curious).

The net result is that you can’t build images that contain private code (or whose build process requires any type of credential) “in one shot”. We ended up writing scripts that set up the Docker image build context and remove any credentials from it prior to invoking docker build.
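
A minimal sketch of that approach (paths, repo, and file names are illustrative, not our actual scripts): the private repo is cloned on the host, where the credentials live, and only the resulting source tree ends up in the build context:

#!/bin/bash
set -e
# build the context on the host so no credential ever enters the image
BUILD_DIR=$(mktemp -d)
git clone git@github.com:rightscale/docker_demo "$BUILD_DIR/docker_demo"
# hypothetical Dockerfile that only ADDs the already-cloned source, no keys needed
cp Dockerfile.release "$BUILD_DIR/Dockerfile"
# belt and braces: make sure nothing sensitive sneaks into the context
rm -rf "$BUILD_DIR/docker_demo/.git"
docker build -t rightscale/docker_demo "$BUILD_DIR"
rm -rf "$BUILD_DIR"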

Another trick we have been using is to mount the SSH agent socket from the host into a container. The container deploys the application and is then committed to create the image. The Docker run command used to launch this builder container looks like this:

docker run --env SSH_AUTH_SOCK=/mnt/ssh-agent/$SSH_SOCK_FILE --volume $SSH_SOCK_DIR:/mnt/ssh-agent \
  rightscale/ruby-212 sh -c "$BUILD_SCRIPT"

where $SSH_SOCK_FILE and $SSH_SOCK_DIR contain the host SSH agent socket name and path respectively (obtained through the host SSH_AUTH_SOCK environment variable). $BUILD_SCRIPT contains the script used to deploy the application, which at that point can leverage the SSH agent to clone private repos.
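
The last step, committing the builder container into the final image, might look like this (a hedged sketch; container selection and image names are illustrative):

# grab the id of the builder container that just finished running $BUILD_SCRIPT
CONTAINER_ID=$(docker ps -lq)
# freeze its read/write layer into the application image
docker commit $CONTAINER_ID rightscale/docker_demo
# the SSH agent socket was only a bind mount, so no credential ends up in the pushed image
docker push rightscale/docker_demo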

Longer term, there are various proposals in the GitHub issues (such as #7115) that would allow for nested builds: the build context of the final image is provided by outer Dockerfile commands. This makes it possible to copy credentials into the outer container without having to bake them into the final image produced from the inner container. This also helps in making sure that the final image doesn’t contain any build artifacts.

2. Logging

The built-in Docker log support provided via docker logs is fairly limited: it only supports logging to stdout and stderr, has no log rotation support, and, more importantly, does not support tailing to ship the logs. GitHub issue #7195 aims at addressing these limitations. Another challenge with logging is that developers expect logs to be generated on their host file systems for troubleshooting, but staging or production environments typically use some log aggregation solution such as syslog or logstash.

The development use case is fairly simple to support with Docker: simply mount the log directory via a volume into the container using a command like the following (this assumes that the “demo” app logs to /home/docker_demo/log):

docker run --volume /Users/raphael/src/docker_demo/log/:/home/docker_demo/log rightscale/docker_demo demo

The production use case is a little bit trickier. We use rsyslog at RightScale, so we needed to figure out how to set it up so that apps running in Docker containers could log to it. The idea is to run rsyslogd in its own container and mount a host directory that contains the rsyslog Unix domain socket (/dev/log). Containers that run apps needing to send logs to rsyslog then mount the socket itself.
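
In practice the wiring looks roughly like this (a hedged sketch; the image names and host paths are illustrative):

# run rsyslogd in its own container and export its /dev directory to the host,
# so the /dev/log socket it creates lands in /tmp/rsyslog-dev on the host
docker run -d --name rsyslog --volume /tmp/rsyslog-dev:/dev rightscale/rsyslog
# app containers bind-mount that socket as their own /dev/log and log "locally" as usual
docker run -d --volume /tmp/rsyslog-dev/log:/dev/log rightscale/docker_demo demo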

Jérôme Petazzoni from Docker explains the whole setup in detail on his blog.

3. Configuration

Configuring apps is also done very differently depending on whether the app is running on a developer laptop or in CI, staging, or production. Our apps configure themselves using environment variables when run in the development environment; the script that launches them simply leverages the --env Docker run option. The more interesting case is staging and production, where apps rely on configuration files written in well-known locations.
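
For example, in development the launch script ends up running something along these lines (the variable names are illustrative):

# development: all configuration is injected through environment variables
docker run --env DB_HOST=192.168.59.103 --env DB_PORT=5432 --env LOG_LEVEL=debug \
  rightscale/docker_demo demo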

We use RightScale to manage RightScale, and configuration files are generated via ServerTemplate boot scripts using values fed through RightScript inputs. We end up with configuration files written on the host that we bind-mount inside the container in the location the app expects.
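
The bind mount itself is just a --volume option pointing the host-generated file at the path the app reads (the paths below are illustrative):

# staging/production: the boot script has written the config file on the host,
# and the container sees it exactly where the app expects its configuration
docker run -d --volume /etc/docker_demo/production.yml:/home/docker_demo/config/production.yml \
  rightscale/docker_demo demo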

4. Mac OS X

Docker requires a Linux kernel, which means that developers using Macs need to use Docker through a virtual machine. Thankfully Docker provides “boot2docker”, which leverages VirtualBox and makes the experience seamless and almost transparent. The biggest limitation used to be the inability to mount volumes from the host directly through the --volume option of Docker run, but this has been addressed with boot2docker 1.3, where mounting volumes that exist under /Users “just works”.

The biggest remaining difference when running Docker through boot2docker has to do with networking: the boot2docker VM has its own secondary interface and IP. This means in particular that any container running an app that listens on TCP ports must be accessed from the host using the VM IP rather than the traditional localhost. This is fairly innocuous when running simple apps but requires special attention when setting up multiple applications that need to send requests to each other.

Before explaining why that’s the case, I need to make a little detour and talk about links: Docker makes it possible to create links between containers running on the same host. A container that is linked to another can access it via a secure tunnel established by Docker without exposing any port to the host. This is the perfect way to connect all these distributed apps running on the same host in the CI and production environments. However, links are not that great during development, because devs need to be able to run any of the apps directly on the host to quickly test and iterate. Apps running in containers can’t link to apps running on the host, so we ended up using port mapping (the -p/--publish option of Docker run) for all apps during development. This way all traffic goes through ports published to the host, and it doesn’t matter to the client app whether the servicing app is running in a container.
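
The two wiring styles side by side (a sketch with made-up image names and ports):

# CI/production: containers are linked directly, nothing is published to the host
docker run -d --name api rightscale/docker_demo_api
docker run -d --link api:api rightscale/docker_demo_ui
# development: publish the port instead, so a client neither knows nor cares
# whether the API runs in a container or directly on the host
docker run -d -p 8080:8080 rightscale/docker_demo_api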

You may now be able to guess where the complication comes from with boot2docker: apps can’t simply rely on localhost when sending requests. If a client app needs to connect to an app running in a container, it needs to use the boot2docker VM IP, but when that app runs on the host, the client needs to use localhost.

There are a couple of workarounds that make it possible to always use localhost (and by doing so play nice with devs who use a Linux-based system and don’t need to run a VM). One is to use a proxy like socat to forward traffic from the host to the VM IP; this works and can be scripted fairly easily. The command looks like the following:

socat -d -d TCP-LISTEN:$PORT,fork TCP:`boot2docker ip 2>/dev/null`:$PORT,reuseaddr

where $PORT contains the port exposed by the Docker container.

socat is a great tool with a number of interesting applications; it’s worth bookmarking the socat man page.

Another, more “permanent” solution is to configure the VM to forward the ports to localhost; this can be done while the VM is running using VBoxManage modifyvm, as described in the boot2docker workaround document.
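
A hedged example of such a forwarding rule (the rule name, VM name, and port are illustrative; double-check against the boot2docker workaround document):

# add a NAT port-forwarding rule so localhost:8080 on the Mac reaches port 8080 in the VM
VBoxManage modifyvm "boot2docker-vm" --natpf1 "tcp-8080,tcp,127.0.0.1,8080,,8080"
# (VBoxManage controlvm "boot2docker-vm" natpf1 "..." applies the same kind of rule to a running VM)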

5. Big images, slow downloads

We were expecting that using containers would speed up the time it takes to configure VMs in the cloud. Instead of running a series of bash scripts to install a bunch of packages, download assets, and compile and configure apps, we’d just have to download ready-to-go images. That ought to be faster, right?

It turns out not to be the case. It’s not just that images can be big; it’s also, and mainly, that Docker is not very good at downloading them. This is a known issue that’s being worked on (GitHub issue #7291), but in the meantime extra care needs to be given to producing lean images that share a common base. This is especially critical in the case of CI, where multiple apps get downloaded onto the same host, so making sure they all share the same base image drastically reduces the time it takes to configure the Jenkins slave.
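
One quick sanity check we find useful (the image names are illustrative) is to compare the layer history of two app images: if the bottom layers are identical, a Jenkins slave that already has one app only needs to pull the app-specific layers of the other:

# the lowest layer IDs of both images should match if they share the same base image
docker history rightscale/app_one
docker history rightscale/app_two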

Where we are and what’s next?

Even with the challenges listed above, we found the benefits of using Docker well worth it. Developers can set up a complete development environment in minutes and, more importantly, with the exact same code, language runtime versions, dynamic libraries, etc. as production. The images are built and tested by the CI system and then deployed to production, which removes an entire class of potential issues that come with having to reproduce the same setup in different environments. This also makes the multi-cloud story easier since there are fewer dependencies on the host environment (e.g., OS distribution).

We are still just getting started with Docker, though: the CI setup is now well understood, but we still need to finalize the full continuous delivery story. There are also many more apps that could benefit from it that we haven’t looked at yet.

My colleague and RightScale CTO Thorsten von Eicken and I presented a list of “10 things not to forget before deploying Docker in production” at a local Docker meetup, covering some of the challenges presented above and a few others. If you’ve already gone through your own “Dockerization”, we would love to hear how you solved these same issues. Please leave a comment below!