A 10-step checklist on how to dockerize any application.
There are already many tutorials on how to dockerize applications available on the internet, so why am I writing another one?
Most of the tutorials I see are focused on a specific technology (say, Java or Python), which may not cover what you need. They also do not address all the relevant aspects that are necessary to establish a well-defined contract between the Dev and Ops teams (which is what containerization is all about).
I compiled the steps below based on my recent experiences and lessons learned. It is a checklist of details that are often overlooked by the other guides you will see around.
Disclaimer: This is NOT a beginner's guide. I recommend you learn the basics of how to set up and use Docker first, and come back here after you have created and launched a few containers.
Let’s get started.
There are many technology-specific base images, such as the official images for Java, Python or Node.js.
If none of them works for you, you will need to start from a base OS image and install everything yourself.
Most of the tutorials out there will start with Ubuntu (e.g. ubuntu:16.04), which is not necessarily wrong.
My advice is to consider using Alpine images:
https://hub.docker.com/_/alpine/
They provide a much smaller base image (as small as 5 MB).
Note: “apt-get” commands will not work on those images. Alpine uses its own package repository and tool (apk). For details see:
https://wiki.alpinelinux.org/wiki/Alpine_Linux_package_management
https://pkgs.alpinelinux.org/packages
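For illustration, here is a minimal sketch of an Alpine-based Dockerfile using apk (the version tag and package names are only examples, install what your application actually needs):

    FROM alpine:3.18
    # apk is the Alpine package manager; --no-cache avoids keeping the
    # package index inside the image layer
    RUN apk add --no-cache curl ca-certificates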
This is usually trivial, but there are a few details you may be missing:
a-) You need to run apt-get update and apt-get install in the same RUN instruction (the same applies if you are using apk on Alpine). This is not just a common practice, you need to do it, otherwise the layer created by “apt-get update” can be cached and you may not get the updated package information you need right afterwards (see this discussion https://forums.docker.com/t/dockerfile-run-apt-get-install-all-packages-at-once-or-one-by-one/17191). A sketch is shown after this list.
b-) Double check that you are installing ONLY what you really need (assuming you will run the container in production). I have seen people installing vim and other development tools inside their images.
If necessary, create a different Dockerfile for build/debugging/development time. This is not only about image size; think about security, maintainability and so on.
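A minimal sketch of what this looks like in practice (the package names are placeholders):

    # Run update and install in the same RUN instruction so the package index
    # is never served from a stale cached layer, and clean up afterwards
    RUN apt-get update && \
        apt-get install -y --no-install-recommends curl ca-certificates && \
        rm -rf /var/lib/apt/lists/*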
A few hints to improve your Dockerfiles:
a-) Understand the difference between COPY and ADD:
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
b-) (Try to) Follow File System conventions on where to place your files:
E.g. for interpreted applications (PHP, Python), use the /usr/src folder.
c-) Check the attributes of the files you are adding. If you need execution permission, there is no need to add a new layer to your image (RUN chmod +x …). Just fix the original attributes in your code repository.
There is no excuse for skipping this, even if you are using Windows; see:
How to create file execute mode permissions in Git on Windows? (stackoverflow.com)
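A quick sketch of fixing the execute bit in the repository itself instead of adding a chmod layer (the file name is just an example):

    # Track the file and mark it executable directly in Git (works on Windows too)
    git add docker-entrypoint.sh
    git update-index --chmod=+x docker-entrypoint.sh
    git commit -m "Make entrypoint executable"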
First, take a break and read the following great article:
Understanding how uid and gid work in Docker containers (medium.com)
After reading this you will understand that:
a-) You only need to run your container with a specific (fixed ID) user if your application needs access to the user or group tables (/etc/passwd or /etc/group).
b-) Avoid running your container as root as much as possible.
Unfortunately, it is not hard to find popular applications requiring you to run them with specific IDs (e.g. Elasticsearch with uid:gid = 1000:1000).
Try not to be another one…
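For illustration, a minimal sketch of creating and switching to a dedicated non-root user inside the image (the user name is an example, and you only need to pin a numeric uid/gid if your application really requires it):

    # Debian/Ubuntu syntax; on Alpine use addgroup -S / adduser -S instead
    RUN groupadd --system app && \
        useradd --system --gid app --no-create-home app
    # Everything from here on (and the container itself) runs as this user
    USER app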
This is usually a very trivial process. Please, just don't create the need for your container to run as root because you want it to expose a privileged low port (80). Just expose a non-privileged port (e.g. 8080) and map it during the container execution.
This differentiation comes from a long time ago:
https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html
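A sketch of the idea (image name and ports are examples): expose a non-privileged port in the image and do the mapping at run time.

    # In the Dockerfile
    EXPOSE 8080

    # At run time, map the host's privileged port 80 to the container's 8080
    docker run -d -p 80:8080 my-app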
The vanilla way: just run your executable file right away.
A better way: create a “docker-entrypoint.sh” script where you can hook things like configuration using environment variables (more about this below):
This is a very common practice, a few examples:
elastic/elasticsearch-docker: Official Elasticsearch Docker image (github.com)
docker-library/postgres: Docker Official Image packaging for Postgres (github.com)
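A minimal sketch of the pattern (the script path and the final command are examples; remember the execute permission discussed earlier):

    #!/bin/sh
    # docker-entrypoint.sh: a hook point for run-time configuration
    set -e

    # ... generate configuration from environment variables here ...

    # Hand control over to the main process so it runs as PID 1 and receives signals
    exec "$@"

And in the Dockerfile:

    COPY docker-entrypoint.sh /usr/local/bin/
    ENTRYPOINT ["docker-entrypoint.sh"]
    CMD ["my-app"]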
Every application requires some kind of parametrization. There are basically two paths you can follow:
1-) Use an application-specific configuration file: then you will need to document the format, fields, location and so on (not good if you have a complex environment, with applications spanning different technologies).
2-) Use (operating system) environment variables: simple and efficient.
If you think this is not a modern or recommended approach, remember that it is part of The Twelve Factors:
The Twelve-Factor App: A methodology for building modern, scalable, maintainable software-as-a-service apps (12factor.net)
This does not mean that you need to throw away your configuration files and refactor the config mechanism of your application.
Just use a simple envsubst command to replace a configuration template (inside the docker-entrypoint.sh, because it needs to be performed at run time).
Example:
nginx: Official build of Nginx (docs.docker.com)
This will encapsulate the application-specific configuration file, its layout and details inside the container.
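A sketch of how this could look inside the entrypoint, loosely following the official nginx documentation (the template path and variable names are assumptions):

    #!/bin/sh
    set -e
    # Render the configuration template from environment variables at run time
    envsubst '${NGINX_HOST} ${NGINX_PORT}' \
        < /etc/nginx/conf.d/default.conf.template \
        > /etc/nginx/conf.d/default.conf
    exec "$@"

At run time the values are passed in as plain environment variables, e.g. docker run -e NGINX_HOST=example.com -e NGINX_PORT=8080 … (names here are hypothetical).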
The golden rule is: do not save any persistent data inside the container.
The container file system is supposed and intended to be temporary and ephemeral. So any user-generated content, data files or process output should be saved either on a mounted volume or on a bind mount (that is, a folder on the base OS linked inside the container).
I honestly do not have a lot of experience with mounted volumes; I have always preferred to save data on a bind mount, using a previously created folder carefully defined using a configuration management tool (such as Salt Stack).
By carefully created, I mean defining the folder's location, ownership and permissions explicitly, so they match the user the container runs as (a sketch follows below).
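A hedged example of that preparation (paths, IDs and the image name are hypothetical):

    # On the host, normally done by your configuration management tool:
    # create the folder and hand it to the uid/gid the container runs as
    mkdir -p /srv/myapp/data
    chown 1000:1000 /srv/myapp/data
    chmod 750 /srv/myapp/data

    # Bind-mount it into the container at run time
    docker run -d -v /srv/myapp/data:/var/lib/myapp my-app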
I am aware that my previous “persistent data” is far from being a precise definition, and logs sometimes fall into the grey area. How should you handle them?
If you are creating a new app and want it to stick to Docker conventions, no log files should be written at all. The application should use stdout and stderr as an event stream. Just like the environment variables recommendation, this is also one of The Twelve Factors. See:
The Twelve-Factor App (12factor.net)
Docker will automatically capture everything you send to stdout and make it available through the “docker logs” command:
https://docs.docker.com/engine/reference/commandline/logs/
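For example (the container name is hypothetical):

    # Follow the event stream captured from stdout/stderr
    docker logs -f --tail 100 my-app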
There are some practical cases where this is particularly difficult, though. If you are running a simple nginx container, you will have at least two different types of log files: the access log and the error log.
With different structures, configurations and pre-existing implementations, it may not be trivial to pipe them to standard output.
In this case, just handle the log files as described in the previous section, and make sure you rotate them.
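For reference, the official nginx image takes the piping approach by symlinking its log files to the container's standard streams, roughly like this:

    # Redirect the nginx log files to the container's standard streams
    RUN ln -sf /dev/stdout /var/log/nginx/access.log && \
        ln -sf /dev/stderr /var/log/nginx/error.log

If you prefer to keep real log files on a bind mount instead, rotation is the way to go, as covered next.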
If your application is writing log files or appending to any files that can grow indefinitely, you need to worry about file rotation.
This is critical to prevent the server from running out of space and to apply data retention policies (which matters a lot when it comes to GDPR and other data regulations).
If you are using bind mounts, you can count on some help from the base OS and use the same tool you would use for a local rotation configuration, that is, logrotate (see its manual).
A simple yet complete example I found recently is this one:
Configure - Log Rotate: Manage log rotations using the Linux tool logrotate, with the Aerospike in-memory NoSQL database (www.aerospike.com)
Another good one:
How To Manage Logfiles with Logrotate on Ubuntu 16.04 (www.digitalocean.com)
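To make it concrete, here is a minimal logrotate sketch for a bind-mounted log folder (the path and retention values are examples, adjust them to your own policy):

    # /etc/logrotate.d/myapp on the host
    /srv/myapp/logs/*.log {
        daily
        rotate 14
        compress
        delaycompress
        missingok
        notifempty
        # copytruncate keeps the file handle intact, so the process inside the
        # container does not need to be signalled after rotation
        copytruncate
    }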
—
Let me know if you have any feedback. Check out my other technical articles at https://hackernoon.com/@htssouza