Running RightLink on CoreOS

Running the RightLink Agent in a container on CoreOS

If you like to stir up the way you deploy applications by using containers then you may also like to mix up the way you run an operating system on the server itself. We see a healthy split in our user base between those who just trim down their favorite OS to run containers and those who jump onto one of the new container-only operating systems, specifically CoreOS and RancherOS. My previous blog post describes how the RightScale RightLink agent can be used on traditional Linux distros to inventory containers and perform container-level monitoring as well as application-level monitoring of apps in containers. But what if you’re using CoreOS or RancherOS?

In CoreOS there are actually two options: running RightLink at the host level and running it in a container. It is not obvious which is better and, in fact, it really depends on what your goals are.

RightLink in a Container

The RightScale RightLink (version 10) agent is a lightweight agent that connects servers to the RightScale platform and enables remote management of the servers as well as monitoring and other metric collection. Generally, monitoring uses the standard Linux collectd agent, but recent versions of RightLink also incorporate some basic monitoring directly and this now extends to containers.

Running RightLink at the host level is fairly straightforward thanks to the fact that it is a (mostly) statically linked executable. The next release of RightLink will include some changes and options that remove some roadblocks, such as locating config files in read-only filesystems. Running RightLink at the host level keeps a number of tasks simple. For example, adding a user account can be accomplished by running a RightScript with the typical useradd commands. Applications can be launched using either Docker commands to start containers or by installing systemd unit files that then in turn run the apps in containers and can restart them automatically on failure.
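
As a rough sketch of the systemd approach, a host-level RightScript could write a unit like the one below. The write_app_unit helper, the app name myapp, and the image example/myapp are hypothetical placeholders, not part of RightLink; the optional third argument only exists so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# Sketch: generate a systemd unit that runs a hypothetical app container
# and restarts it on failure. Names (myapp, example/myapp) are placeholders.

write_app_unit() {
  local name="$1" image="$2" unit_dir="${3:-/etc/systemd/system}"
  cat >"$unit_dir/$name.service" <<EOF
[Unit]
Description=$name application container
After=docker.service
Requires=docker.service

[Service]
# The leading "-" tells systemd to ignore a failure of this command,
# e.g. when no old container exists yet.
ExecStartPre=-/usr/bin/docker rm -f $name
ExecStartPre=/usr/bin/docker pull $image
ExecStart=/usr/bin/docker run --name $name $image
ExecStop=/usr/bin/docker stop $name
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# On a real host this would be followed by:
#   write_app_unit myapp example/myapp
#   systemctl daemon-reload
#   systemctl enable myapp.service
#   systemctl start myapp.service
```

Removing any stale container before each start keeps the fixed --name from colliding with a leftover container after a crash.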

The downside of running RightLink at the host level is that RightScripts have a limited toolset available to them. For example, there is no Ruby or Python available, so scripts are pretty much restricted to bash and the small set of executables installed with the OS.

This leads to the question of whether RightLink can be productively run in a container on CoreOS, which would allow the container to include any desired tools such as a scripting language or other familiar tools.

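Running RightLink in a container is simple per se; the tricky part is figuring out how the container needs to be set up so that all the desired functionality of RightLink continues to work. This comes down to the following aspects: persisting state across restarts and reboots, letting RightLink create systemd units and launch Docker containers, giving it access to host and container data for monitoring, and choosing a suitable base image.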

None of these aspects are difficult to deal with if you follow these steps.

Running RightLink in a container is most easily accomplished by creating a systemd unit that uses Docker commands to download an image with RightLink, run it, and restart it in the event of a failure. This takes three steps.

1. Download the image and run the install script

The following cloud-config snippet, placed in the AWS instance’s userdata, will download the RightLink container image and run an install script contained in the image:

coreos:
  units:
    - name: "rightlink-install.service"
      command: "start"
      content: |
        [Unit]
        Description=Transient RightLink install service
        After=docker.service
        [Service]
        ExecStart=/usr/bin/bash -c 'source <(/usr/bin/curl -Ls https://rightlink.rightscale.com/rll/docker/rightlink.coreos.boot.sh)'

2. Create and launch the systemd unit

The install script, which runs in a temporary container, copies the systemd unit file from within the container to the host’s /etc/systemd/system directory and tells systemd to start the real RightLink container as a service:

#! /bin/bash
# determine where RightLink is going to listen (the host's interface on the Docker bridge)
export hostip=$(ip route show 0.0.0.0/0 | grep -Eo 'via \S+' | awk '{ print $2 }')

# fix up the systemd unit and place it in the host's systemd directory
# (dots are escaped so they only match literal dots in the default address)
sed -e "s/172\.17\.42\.1/$hostip/" </root/rightlink.service >/etc/systemd/system/rightlink.service

# tell systemd to start the real rightlink container as a service
systemctl enable /etc/systemd/system/rightlink.service
systemctl start rightlink.service
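
To see what the hostip pipeline in the script above extracts, here is a small sketch that parses the same kind of ip route output from stdin. The gateway_from_routes name and the sample address are assumptions for illustration; it uses awk alone instead of grep plus awk, which avoids relying on the GNU-specific \S in the pattern.

```shell
#!/bin/bash
# Sketch: extract the default gateway from `ip route` output read on stdin.
# Inside a container on the default Docker bridge, that gateway is the
# host's interface on the bridge.
gateway_from_routes() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "via") { print $(i + 1); exit }
       }'
}

# Sample line as printed inside a container (address is illustrative only).
printf 'default via 172.17.42.1 dev eth0\n' | gateway_from_routes

# On a live system:
#   ip route show 0.0.0.0/0 | gateway_from_routes
```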

3. Run the RightLink container

At this point systemd takes over and starts the unit, the key lines of which are the following:

ExecStartPre=/usr/bin/docker pull rightscale/rightlink:docker
# /etc/collectd holds the monitoring config files
ExecStartPre=/usr/bin/mkdir -p /etc/collectd

# Explanation for docker run options below:
# - /var/run/dbus and /run/systemd are required socket directories to talk to systemd
# - /etc/systemd/system is the unit file directory and required to install new units. 
# - /var/run/rightlink has state files that need to be preserved across container restarts
# - /proc and /sys/block are used by the rightlink monitoring
# - /var/run/docker.sock is used by the rightlink monitoring to get stats about containers
# - 172.17.42.1 is the default address of the host's interface on the docker bridge; this gets
#   replaced by the actual address (if different) in the rightlink.coreos.install.sh script
ExecStart=/usr/bin/docker run --env-file /var/lib/rightscale-identity --name rightlink \
  -v /var/run/rightlink:/var/run/rightlink \
  -v /etc/collectd:/etc/collectd \
  -v /var/run/dbus:/var/run/dbus \
  -v /run/systemd:/run/systemd \
  -v /etc/systemd/system:/etc/systemd/system \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /proc:/host/proc:ro \
  -v /sys/block:/host/sys/block:ro \
  -p 172.17.42.1:88:88 \
  rightscale/rightlink:docker /usr/local/bin/rightlink -listen 0.0.0.0:88 -rootfs /host

ExecStop=/usr/bin/docker stop rightlink
Restart=on-failure
RestartSec=13s

This unit file sets up a whole slew of bind-mounts (the -v options to docker) in order to allow RightLink to persist state, create systemd units, launch containers and more.

Persisting state across RightLink restarts and across OS reboots allows options configured into RightLink not to be inadvertently lost and prevents boot scripts from being erroneously re-run in the event that RightLink crashes. RightLink stores a small amount of state in /var/run/rightlink and a volume bind-mount can be used to persist it.

To enable RightLink to create systemd units and launch Docker containers it needs to be able to write units to /etc/systemd/system and communicate with the systemd daemon via its socket and dbus, and with the Docker daemon via its socket. All this is enabled using bind-mounts of /etc/systemd/system, /var/run/dbus, /run/systemd and /var/run/docker.sock.
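
When a container misbehaves, a quick check that these four paths are actually visible inside it can save some guessing. The following is only a sketch: the missing_rightlink_paths helper is hypothetical, and its optional prefix argument exists purely so it can be tried against a scratch directory.

```shell
#!/bin/bash
# Sketch: report which of the paths RightLink needs for systemd and Docker
# access are missing. An optional prefix allows dry runs against a scratch
# directory instead of the real filesystem.
missing_rightlink_paths() {
  local prefix="${1:-}" path
  for path in /etc/systemd/system /var/run/dbus /run/systemd /var/run/docker.sock; do
    [ -e "$prefix$path" ] || echo "$path"
  done
}

# Inside the RightLink container an empty result means all four
# bind-mounts are in place:
#   missing_rightlink_paths
```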

The bind-mount of the Docker socket also enables the monitoring of containers. For host-level monitoring RightLink needs read-only access to the /proc and /sys/block filesystems, which is enabled using yet another two bind-mounts.
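
The effect of the -rootfs /host option can be pictured with a tiny sketch that reads a host metric through a configurable prefix. This is not RightLink's code; host_load1 is a hypothetical helper that only shows why mounting /proc at /host/proc is enough for host-level readings.

```shell
#!/bin/bash
# Sketch: read the 1-minute load average from a proc filesystem mounted
# under an arbitrary root prefix ("" on a host, "/host" in the container).
host_load1() {
  local rootfs="${1:-}"
  # /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
  cut -d ' ' -f 1 "$rootfs/proc/loadavg"
}

# Inside the RightLink container started by the unit above:
#   host_load1 /host
```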

Finally, some attention needs to be paid to the base image used for the RightLink container. RightLink itself only requires a shared library for DNS name resolution and TLS certificate authority certs, making almost any image usable. However, in order to run RightScripts, most users need a minimal set of tools, possibly a scripting language such as Python or Ruby. So while a busybox-based base image augmented with the TLS CA certs is certainly possible, most users are better served with a minimal Debian, Ubuntu, or CentOS base image. These base images also leave the option open to use the built-in package manager to install any additional tools that may be needed.

All this enables RightLink to run in a container under CoreOS and to manage system services and application containers, plus perform simple host-level and container monitoring.

Collectd In a CoreOS Container

Running collectd on CoreOS turns out to be slightly more challenging than running RightLink itself, especially if one wants to use it for host-level monitoring as well as for application-level monitoring.

First of all, why not run collectd at the host level instead of trying to fit it into a container? The reason is that collectd’s internal architecture makes heavy use of shared libraries, many of which are not installed in CoreOS. The shared libraries are required for many highly desirable plugins, so making do without is not an attractive option. The upshot is that collectd pretty much has to run in a container that can provide all the required dependencies.

Unfortunately, a standard binary collectd package cannot simply be configured to run in a container, because certain paths, in particular /proc, are hard-coded in the source. Host-level monitoring requires access to the host’s /proc under any scenario, but it is not practical to bind-mount the host’s /proc onto the container’s own /proc. Instead, some other directory needs to be used, and a convention seems to be emerging to use a /host prefix for such purposes, i.e., to bind-mount /proc onto /host/proc as shown above in the systemd unit for RightLink itself.

To run in a container, collectd must thus be built from slightly modified sources in order to refer to /host/proc instead of /proc. The Dockerfile that does this is available at (https://github.com/rightscale/collectd-container/blob/master/Dockerfile). It starts from a standard Debian image, installs the collectd dependencies (such as libraries to connect to MySQL, Postgres, etc.), downloads the collectd sources, makes the /host/proc modification and compiles collectd.
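
To give a feel for what such a source modification can look like, here is a minimal sketch that rewrites hard-coded "/proc string literals in C files. It is an assumption about the general technique, not a copy of the linked Dockerfile, which may apply the change differently; patch_proc_paths is a hypothetical helper and relies on GNU sed's -i option.

```shell
#!/bin/bash
# Sketch: rewrite hard-coded "/proc..." string literals in C sources so
# they point at /host/proc instead. Hypothetical; the linked Dockerfile
# may make the change in a different way.
patch_proc_paths() {
  local src_dir="$1"
  # Only literals that start with "/proc match, so a second run leaves
  # already patched "/host/proc literals untouched.
  find "$src_dir" -name '*.c' -exec sed -i 's|"/proc|"/host/proc|g' {} +
}

# Typical use before compiling (the path is a placeholder):
#   patch_proc_paths ./collectd-src/src
```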

The next issue that needs solving is the configuration of collectd. The approach taken here is to add a standard configuration into /etc/collectd that is structured such that additional files can be added to enable new plugins. While this sounds simple, there is an additional wrinkle, which is that collectd must be restarted after any config changes: there is no HUP signal or similar to make it re-read the config. This means that either collectd runs under some monitoring process inside the container and can be restarted that way, or it runs as pid 1 (i.e., top process) in the container and then the container must be restarted. Since the container will already run as a systemd unit, it seemed simpler to avoid another level of monitoring inside the container and run just collectd as pid 1. But this now means that the configuration needs to be persisted across container restarts, which requires a bind-mount, for example, from the host’s /etc/collectd into the container. Because it’s also desirable to preserve the configuration across host reboots, this bind-mount is most likely required regardless of the approach.

All this means that in order to make a configuration change, the files in the host’s /etc/collectd directory need to be updated and then systemd needs to be signaled to restart the collectd service, which causes a container restart. A RightScript that can be used to create the systemd unit file and start collectd can be found at (https://github.com/rightscale/collectd-container/blob/master/rightscript.sh).
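
The update-then-restart cycle can be wrapped in a small helper. The sketch below is hypothetical (install_collectd_fragment and the SYSTEMCTL override are not part of the linked RightScript); it assumes plugin fragments live in /etc/collectd/plugins and that the systemd unit is named collectd.

```shell
#!/bin/bash
# Sketch: install a collectd config fragment read from stdin, then restart
# the collectd unit so the container picks it up. SYSTEMCTL can be
# overridden (e.g. SYSTEMCTL=echo) for a dry run.
install_collectd_fragment() {
  local name="$1" conf_dir="${2:-/etc/collectd/plugins}" tmp
  mkdir -p "$conf_dir" || return 1
  tmp=$(mktemp "$conf_dir/.$name.XXXXXX") || return 1
  # Write to a temp file first so a half-written config is never visible
  # under its final name.
  cat >"$tmp" && mv "$tmp" "$conf_dir/$name.conf" || return 1
  "${SYSTEMCTL:-systemctl}" restart collectd
}

# Example:
#   printf 'LoadPlugin uptime\n' | install_collectd_fragment uptime
```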

Monitoring Applications In Containers with Collectd

The final hurdle toward a really powerful monitoring solution is to perform application-level monitoring of apps running in their own containers. As an example, the MySQL plugin built into collectd expects to connect to MySQL’s port 3306 in order to issue a series of SHOW STATUS-type queries and then parse the results. The trick with containers is to figure out how to establish that connection. The standard method when launching multiple containers is to use the container linking feature of Docker (i.e., the --link flag to docker run) but that doesn’t work here because collectd would have to be relaunched with a different --link flag every time a MySQL container (or any other monitored container) is launched or relaunched.

A more static solution is to map the MySQL port to a persistent port, which insulates its clients (collectd among them) from changes due to container restarts. The default practice would be to launch MySQL to listen on localhost port 3306 but that doesn’t work with Docker because each container has its own localhost loopback interface and cannot get to the host’s loopback interface. In other words, there are many localhosts! Having MySQL listen on the host’s public interface may be acceptable in some cases but not all, so a different solution is needed.

The emerging localhost replacement is to use the host’s interface on the Docker bridge. Recall that Docker by default creates a local bridge network on which each container receives an interface with its own IP address. That’s what all the 172.17.x.x IP addresses you see for your containers refer to. The host itself also has an interface on that network and it is in fact the default gateway for all the containers. This means that all containers can reach the host’s interface and it turns out that the docker run -p option can be used to map a container’s port to that interface. Thus if the MySQL container’s port 3306 is mapped to the host’s 172.17.x.x interface’s port 3306 then all other containers, including collectd, can easily reach it and MySQL container restarts don’t require the clients to discover any new port.
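
The -p option accepts an ip:hostPort:containerPort triple, which is what pins the published port to the bridge interface rather than to all host interfaces. A minimal sketch, with a hypothetical bridge_publish_arg helper and an illustrative address:

```shell
#!/bin/bash
# Sketch: build the docker run -p argument that publishes a container port
# on one specific host address (here the host's Docker bridge interface).
bridge_publish_arg() {
  local host_ip="$1" port="$2"
  printf '%s:%s:%s\n' "$host_ip" "$port" "$port"
}

# Print (rather than run) the resulting command. The address is an
# assumption; look up the real one at run time.
echo docker run -d --name mysql -p "$(bridge_publish_arg 172.17.42.1 3306)" mysql
```

Because the host side of the mapping stays fixed, clients keep using the same address and port no matter how often the container behind it is replaced.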

This configuration is shown in the diagram below, with port 88 on the host’s interface forwarded to RightLink and port 3306 forwarded to MySQL.

Three containers running

The use of the host’s interface as a replacement for the inaccessible localhost solves a number of issues and enables fairly simple linking between containers through restarts. It is certainly not the only option, especially if container cluster management software is used. But for this post, it provides a simple to understand solution that you can customize or morph into a more dynamic solution if you prefer.

The final piece of the puzzle then is a RightScript that pulls down a MySQL container image, launches it, and configures collectd with the appropriate MySQL plugin to show all the MySQL metrics. A simple form of this script is:

#! /bin/bash -e

# get the host IP on the docker bridge (something like 172.17.42.1)
# this gets passed to the collectd container so it can talk to mysql’s port 3306
rll_ip=$(ip route show 0.0.0.0/0 | grep -Eo 'via \S+' | awk '{ print $2 }')

# launch mysql
docker run --name mysql -d -e MYSQL_ROOT_PASSWORD="$MYSQL_PASS" -p "$rll_ip:3306:3306" mysql

# add mysql plugin to collectd config
plugins=/etc/collectd/plugins
cat <<-EOF >$plugins/mysql.conf
LoadPlugin mysql
<Plugin mysql>
  <Database mysql>
    Alias "$RS_INSTANCE_UUID"
    Host "$rll_ip"
    User "root"
    Password "$MYSQL_PASS"
  </Database>
</Plugin>
EOF

# monitor the mysql process itself also
sed -i 's/<Plugin processes>/<Plugin processes> \n  process "mysqld"/' $plugins/processes.conf

# restart collectd container to pick up new config
/bin/systemctl restart collectd

This RightScript really should create a systemd unit to run and restart MySQL, but that, in itself, is the subject for another blog post.

Putting It All Together

All the RightLink, collectd and MySQL pieces can be seen working together in the RL10 MySQL CoreOS Container ServerTemplate. A slightly simpler RL10.2.docker1 CoreOS Container ServerTemplate without the MySQL pieces is also available. Using the first ServerTemplate, a CoreOS image can be launched in AWS, which at boot time runs RightLink in a container, then starts executing the RightScripts that run collectd in another container, run MySQL, and configure collectd to monitor MySQL. In the end, the RightScale Cloud Management dashboard shows the information about the containers running on the server (SHAs shortened here):

rs_docker:c-174a8f8=["/rightlink","c951953",0,0]      
rs_docker:c-77ae3bd=["/collectd","944ebc4",0,0]       
rs_docker:c-9f457e6=["/mysql","a07681a",0,0]

and the images they use:

rs_docker:i-944ebc4=["rightscale/collectd:latest"]     
rs_docker:i-a07681a=["mysql:latest"]   
rs_docker:i-c951953=["rightscale/rightlink:docker"]

and it shows the MySQL monitoring graphs:

Screen shot displaying mysql monitoring graphs

This ServerTemplate is a sample starting point from which you can further customize; you would want to customize the configuration of MySQL in particular. Also, tools such as fleet or Docker Compose may come into play in order to deploy application containers on servers. The setup described here tries to be as simple and generic as possible so you know what all the pieces of the puzzle are and can then adapt them to your own deployment methodology.

In the end you are left with the question of whether to run RightLink at the host level or in a container as shown in this blog post. This question is not unique to RightLink but presents itself for most system daemons. Which option you pick depends on what you expect RightScripts to do. If you mostly perform host-level operations, such as managing user accounts, mounting disk volumes, or tweaking network configuration, then it is easier to run RightLink at the host level. If you mostly perform application-level operations, such as installing, configuring, and launching applications, or if your RightScripts make use of scripting languages (Python, Ruby, Perl, …), then running RightLink in a container is the way to go. In addition, you may have policies or comfort levels that make you want to run as much as possible in a container. Fortunately we give you the choice!

Coming up next will be a post on running RightLink in RancherOS: Stay tuned!