Docker healthchecks – things to consider

While working on a recent docker project I encountered an issue with health checks. You may define a health check for a container from within your Dockerfile, for example:

HEALTHCHECK --interval=5s --timeout=5s --start-period=20s \
         --retries=5 CMD wget --output-document=- \
         --quiet --tries=1 http://127.0.0.1

The syntax is quite simple to understand: you define how and when the health check should be executed and which command(s) to use. Breaking things down from the example:

  • --interval=5s – run the check every 5 seconds
  • --timeout=5s – maximum time to wait for the health check to complete
  • --start-period=20s – allow the monitored process some time to come to a functioning state; this avoids false alarms during startup, in this case we allow 20 seconds
  • --retries=5 – only trigger an alarm after 5 unsuccessful tries (non-zero exit code) – this helps avoid false alarms in case the process is under heavy load
  • After the CMD follows the command to execute; it runs from inside the container, and in the given example we check that the webserver delivers something, indicating the server is working properly (see the sketch after this list for how to inspect the resulting health status)
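
Once the container is running, docker exposes the result of the check on the console. A minimal sketch (the container name „mycontainer“ is just a placeholder):

# Shows (healthy), (unhealthy) or (health: starting) in the STATUS column
docker ps

# Print only the current health state of a single container
docker inspect --format '{{.State.Health.Status}}' mycontainer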

However, everyone should consider how flexible the health check needs to be to match as many use cases as possible. This is especially true if you are dealing with web-based endpoints, as in the example.

IPs may change, you know?

So what is wrong with the example (which was deliberately taken from: https://github.com/PowerDNS-Admin/PowerDNS-Admin/blob/d255cb3d16380ff3e956829fb67d8f00bb0ff2bb/docker/Dockerfile)?

The first thing I stumbled over was the IP used: „127.0.0.1“ – everybody not totally new to the IP protocol (or the internet) will immediately identify this one: it is the so-called „localhost“ interface, an IP that points to the current machine or container, officially named the „local loopback address“. Don’t forget: there is a complete network of 127.0.0.0/8 possible local loopback addresses, but in the wild you will rarely encounter anything other than 127.0.0.1. Of course there is also an equivalent for the more modern IPv6 protocol: „::1“ – the single local loopback address, not to be confused with the so-called link-local address network (fe80::/10), which is used on a per-interface basis.

The local loopback interface is essential to almost any IP-based network stack; it facilitates a lot of local checks without having to find out one’s own external IP address, so it seems a very logical choice in a container, where the author has little control over the assigned IP(v4/v6) address. To make things even easier to remember, there is usually a „localhost“ name defined, pointing to 127.0.0.1 (IPv4) or ::1 (IPv6).

/ $ ping -6 localhost
PING localhost (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.049 ms
/ $ ping -4 localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=42 time=0.070 ms

I think you already know by now what is not ideal about hard coding „127.0.0.1“ in your health check. Depending on the software used and how it is configured, the process may or may not bind to all interfaces or both protocols. As said, as the author you do not have much control over how the image is actually configured and used.

The first thing to consider is making your software listen on all available IPs, for both protocols – usually called „dual stack“. For the actual example you may pass the underlying gunicorn server one --bind flag per address to bind to, such as:

gunicorn -t 30 --workers 5 --bind '[::]:80' --bind '0.0.0.0:80' --log-level WARN

This makes the application a dual-stack application, and you may now use the address 127.0.0.1 for the health check again. Remember that with the bind definition you also need to specify the port (in this case 80, unencrypted HTTP, for both IPv4 and IPv6).

However I do not recommend doing so, as this is again a configuration that the user of your docker image might not use (or even worse/better: there may one day be a docker version that ditches IPv4 support …). So currently the best way to define the health check is to rely on the localhost name resolution. This has a certain advantage: it comes with a graceful fallback. Every operating system should prefer IPv6 if available, but will automatically fall back to IPv4. So if you do not use IPv6 with the image, the check will still work with IPv4 without any further issues.
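
Applied to the initial example, the health check could then look like this (a sketch, keeping the original wget-based command):

HEALTHCHECK --interval=5s --timeout=5s --start-period=20s \
         --retries=5 CMD wget --output-document=- \
         --quiet --tries=1 http://localhost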

However, this is not true the other way round; but given the current state of IPv6 support in docker, if a user enables IPv6 for a container, chances are very high that the application inside should aim to be IPv6 compatible as well. See a bit further down in this article for how checking both cases based on environment variables may be achieved.

Protocols and Ports – they may change as well

So we have taken care of resolving the right IP version. But this is not the only thing that may change: as seen in the Dockerfile example as well as in the bind parameters for the gunicorn server process, up to now we do everything unencrypted (HTTP without an „S“). As you should know, unencrypted network traffic is no longer state of the art; nowadays most traffic should be (and hopefully is) encrypted using TLS, for most users indicated by HTTPS („S“ for secure).

So this is one thing you need to keep in mind with the health check: most of the time your application will not serve both versions (unless you are building and running a configurable full-featured webserver such as NGINX or Apache).

Actually it does not make much sense for an application server with specialized software to provide both versions; if you want security, you should not forget to lock the unencrypted port (door). For fully featured webservers today, most of the time the only thing the available unencrypted endpoint will serve is a permanent redirect to the secured version. For Apache you frequently find a VHost definition like:

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/
    Redirect permanent / https://www.example.com/
</VirtualHost>

Depending on how the container is configured and used, unencrypted traffic may seem like an option: just place some sort of load balancer / reverse proxy in front of your service that takes care of handling TLS. While this works, it is not an optimal solution, but something we got used to due to the limited (and partly already exhausted) supply of IPv4 addresses. What happens behind the scenes is known as (port) forwarding to an internal network. If not configured otherwise, docker will create its networks from so-called private IPv4 ranges (by default starting with 172.17.0.0/16). By convention those ranges are not routed on the internet; they are not globally unique, in contrast to public IPv4 addresses. I wrote a whole article about the common misunderstandings with IPv6 and the architectures you need to rethink when using IPv6.
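
You can see this on any docker host; a small sketch (the network name „my-app-net“ and the subnet are arbitrary examples):

# Print the subnet of the default bridge network (typically 172.17.0.0/16)
docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'

# Create a user-defined network with an explicit private subnet instead of the default pool
docker network create --subnet 172.30.0.0/24 my-app-net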

So what will happen is that your traffic is secured up to the point it enters the load balancer. Communication between the load balancer and the application itself is unencrypted. This may not seem like much of an issue, because normally the traffic towards docker will not leave the machine, but it is possible to interfere with the unencrypted traffic if someone attaches another container to your network that runs something like tcpdump or other nasty stuff. In a small-scale setup you may be in control, but once more teams are using the machine, encryption becomes a necessity. There might even be a security flaw in the network driver implementation, leaking traffic to where it should not go. If the traffic is encrypted, at least the fallout is limited.

If you assign global IPv6 addresses to your containers and set up routing right, there is another possibility: you could access the application server directly without using the reverse proxy (which might still be in place and working on IPv4 and IPv6). Basically this is what the internet used to be back in the beginning: if you want to use a service, contact it directly, no need for a proxy of any kind. In this scenario encryption is no longer optional; it is a must for any system exposed to the internet today.
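
Setting this up in detail is beyond the scope of this article, but as a rough sketch (the prefix below is the IPv6 documentation prefix; in practice you would use a prefix actually routed to your host):

# Create an IPv6-enabled network; attached containers get an address from the given prefix
docker network create --ipv6 --subnet 2001:db8:1::/64 public6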

This is why a health check with a hard-coded http address might not work, or might raise false alarms.

A possible solution

As there are multiple factors to consider, I ended up writing a small health check shell script that takes care of the checking. You may extend it further to suit your needs, like testing specific endpoints or even checking for expected answers from a certain endpoint.

#!/bin/sh
# Use the encrypted endpoint when TLS is enabled for the container, plain HTTP otherwise;
# --fail makes curl exit non-zero on HTTP error responses (e.g. 500),
# add --insecure if the container uses a self-signed certificate
if [ "$TLS_ENABLE" = "true" ]; then
   exec curl --fail --silent https://localhost/
else
   exec curl --fail --silent http://localhost/
fi
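
To wire this into the image, copy the script and reference it from the HEALTHCHECK instruction. A sketch, assuming the script is stored as docker/healthcheck.sh in the build context:

COPY docker/healthcheck.sh /usr/local/bin/healthcheck.sh
RUN chmod +x /usr/local/bin/healthcheck.sh
HEALTHCHECK --interval=5s --timeout=5s --start-period=20s \
         --retries=5 CMD /usr/local/bin/healthcheck.sh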

Is it a good practice?

In general I like the idea of including at least a basic health check in a container. It might not be perfect and will not monitor your application in full detail (there are still lots of things that can be broken inside the software while the checked endpoint still says everything is ok …). But at least you are able to capture the most basic things. If you need more sophisticated monitoring, you should look at external tools that help you to do so, such as running an ELK stack (Elasticsearch, Logstash, Kibana), which will also give you a nice graphical interface, statistics and much more. Still, for a quick check on the console, the built-in check gives at least a hint about broken containers.

On the net I also found a warning about this approach: https://blog.sixeyed.com/docker-healthchecks-why-not-to-use-curl-or-iwr/ So I looked into the details of this article as well (just because something works does not mean it is a good practice). However, I must strongly disagree with the advice given in the article.

The article is right that you add some additional size to your image – but nothing comes for free, and I think the roughly 2.5 MB in the resulting image is negligible. Also, when it comes to security issues, I expect a long-proven command line tool like curl to be much more reliable than a self-written node.js script. Such a script does not add much to the size of your container if you already run a node application within, but installing node just for the health check seems too much of a burden in comparison. And if you look at the example code provided: it is not even configurable, so you will have to adjust the script every time you change the ports of your application (which should not be hard coded but configurable).

The point about making the container less portable if you rely on a certain tool is just nonsense; in fact it is the very idea of containers to be portable, and this works because everything you need is baked into the image. So you simply ensure that a certain tool like curl is installed (you should not rely on the fact that some images come with curl pre-installed). Also, with the definition of your Dockerfile you specify what the operating system is going to be – so if you choose a Windows one, your subsequent tooling should be based on Windows. With the advent of the Windows Subsystem for Linux (WSL) it is even more common to actually build and run Linux applications on a Windows-based docker host.
