Professional software developer for over 25 years
Like many tools in the software developer's toolbox, docker is relatively easy to jump into and takes some time to master. Using it for a variety of projects over the years I've learned a few lessons along the way.
Each command within a dockerfile produces a new disk image layer. These layers are cached to optimize rebuilding. As explained in their documentation, "the layers are stacked and each one is a delta of the changes from the previous layer."
There are interesting consequences to this. Running many small commands in your build process can produce more changes and therefore larger layers. Altering any command since the last build will require docker to recreate that layer and all subsequent layers. It needs to execute all commands further down the dockerfile even if they haven't changed.
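To make the layer-size point concrete, here is a small sketch. The base image and the curl package are assumptions chosen only for illustration. In the first version, the cleanup runs in its own layer, so the package lists written by the first command are still stored in that earlier layer:

```dockerfile
# Illustrative only: base image and package are assumptions.
FROM debian:bookworm-slim

# Three commands, three layers. The files deleted by the third command
# still exist in the layer created by the first one.
RUN apt-get update
RUN apt-get install -y --no-install-recommends curl
RUN rm -rf /var/lib/apt/lists/*
```

Combining the same steps into one command produces a single layer that never contains the temporary files:

```dockerfile
# Illustrative only: base image and package are assumptions.
FROM debian:bookworm-slim

# One command, one layer. The cleanup happens before the layer is saved.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```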
After some research and much trial and error I've learned a pattern that helps my builds run faster and smaller:
1. A RUN command to update the OS package manager and install OS dependencies. The last part should be a cleanup command to remove temporary and cached files.
2. A RUN command to configure and enable the appropriate servers, e.g. a web server.
3. COPY any specialized configuration files and dependencies. During development I change these more often than standard OS dependencies.
4. A RUN command for application runtime environments and package dependencies, such as Python's pip. These are also combined into one command string.
5. COPY my application's code. That's what changes most often so save it for the end.
This solved a major performance bottleneck I found with many of my builds. It's common practice to commit application package dependency information with a codebase, such as Python's requirements.txt file or Node's package.json. Therefore my first instinct was to run the runtime package manager after copying my codebase into the image. Even if the required packages didn't change, the update to my codebase would force docker to trigger a re-install of all packages.
Adding COPY commands for dependency manager files, such as requirements.txt, into the image during step 3 and running the package manager in step 4, before the application code is copied, means the install now only reruns when package requirements change.
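Put together, the five steps might look something like the sketch below. This is not my exact dockerfile. The base image, the nginx package, the file names (nginx.conf, requirements.txt), and the /app path are all assumptions for a hypothetical Python web application, and the start command is left out.

```dockerfile
# Hypothetical Python web app. Image, packages, and paths are assumptions.
FROM python:3.12-slim

# Step 1: update the OS package manager, install OS dependencies,
# and clean up in the same command so the cache never lands in a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends nginx \
 && rm -rf /var/lib/apt/lists/*

# Step 2: configure and enable servers (here: drop the default nginx site).
RUN rm -f /etc/nginx/sites-enabled/default

# Step 3: copy configuration files and the dependency manifest only.
COPY nginx.conf /etc/nginx/conf.d/app.conf
COPY requirements.txt /app/requirements.txt

# Step 4: install runtime packages. This layer is rebuilt only when
# requirements.txt (or anything above it) changes.
RUN pip install --no-cache-dir -r /app/requirements.txt

# Step 5: copy the application code last, since it changes most often.
COPY . /app
WORKDIR /app
```

With this ordering, editing application code invalidates only the final COPY layer, so a rebuild skips the OS and pip installs entirely.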
Docker's disk images are (thankfully) cached on local disk to save time during subsequent builds. During development I find myself rebuilding quite often. I'm upgrading or changing dependencies. And I'm tweaking the dockerfile itself. In addition I'll experiment with 3rd party containers for just a day or two and forget about them. Unused images will therefore pile up quickly.
While my project is up and running in my development environment I'll run the following command about once a week.
> docker system prune
This will remove stopped containers, unused networks, dangling images, and unused build cache. Tagged images that no container is using are only removed if you add the -a flag. If you have a build server such as Jenkins you'll want to perform similar cleanup periodically or after each build.
docker-compose is a great addition in that it saves us from writing long and complicated docker commands. It's perfect for local development and for sharing with others to spin up containers for tests and demonstrations.
But my focus is on "enterprise" grade software. If you're hosting your own large SaaS application you're most likely running a container orchestration platform through a vendor. If you distribute your docker images to big companies / clients, they are likely doing the same. Don't expect docker-compose to satisfy these requirements. Plan to write and test deployment scripts and leverage the orchestration system's features to optimize scaling and uptime unique to each situation.
It's sometimes tempting to include connections from within docker containers out to their runtime environment. An obvious example is application logging. If you run a consolidated logging and analysis system it's usually trivial to hook this directly into your application. Beware this makes your container very dependent on its environment and much harder to test and share.
It's a best practice to keep your container as agnostic as possible to its environment. Use generic solutions when possible. Logs, for example, can be output to the console and redirected to a central logging system by the runtime environment outside the container. Even something as complex as authentication over the web can be handled by external systems and the necessary information passed into the container through HTTP headers. This makes local development and testing much simpler and independent of those requirements.
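As one example of the logging approach, here is a minimal sketch that assumes the web server is nginx installed from Debian packages, with its logs in /var/log/nginx. Linking the log files to the container's standard output and error streams is the same trick the official nginx image uses, and it lets whatever runs the container decide where the logs go.

```dockerfile
# Assumes nginx from Debian packages. Paths are that package's defaults.
FROM debian:bookworm-slim

# Install nginx, clean up, and point its log files at the console so the
# runtime environment can collect them from outside the container.
RUN apt-get update \
 && apt-get install -y --no-install-recommends nginx \
 && rm -rf /var/lib/apt/lists/* \
 && ln -sf /dev/stdout /var/log/nginx/access.log \
 && ln -sf /dev/stderr /var/log/nginx/error.log

# Run in the foreground so the container stays up and logs keep streaming.
CMD ["nginx", "-g", "daemon off;"]
```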
There's always more to learn with every technology we touch. Docker's own list of best practices is very informative. From there I suggest focusing on best practices specific to your orchestration platform.
Whatever you do, don't stop reading and experimenting.
I learned many of these lessons while building SocialSentiment.io, an application which performs social media sentiment analysis of stocks. I plan on sharing more soon.