I’ve been meaning to write about Docker and CFS (the Completely Fair Scheduler) for a long time, but I’ve been busy with work.
I’m going to use Docker to limit a process’s CPU usage, and we’ll explore which metrics are available for troubleshooting an under-provisioned application. We’re going to be playing with fceux and Mario.
CFS has been the default scheduler of the Linux kernel for a while. This isn’t an attempt to explain it in depth, but there’s a lot of interesting history around it, including the fact that Con Kolivas, whose earlier fair-scheduling work inspired CFS, is an anaesthetist by trade.
Scheduling periods are set in cpu.cfs_period_us, which is simply a length of time in microseconds. You can think of it as the length of one accounting window (not a hardware CPU cycle); the important thing is that it can be made longer or shorter.
A period means nothing without cpu.cfs_quota_us, which is how much CPU time the group may use within each period: there’s no point in accounting for time if we’re not going to throttle anything based on it.
In the following example, a given process A is configured with:
cpu.cfs_period_us = 100
cpu.cfs_quota_us = 200
That might seem confusing, but the quota is counted across all CPUs (cores, as seen by the OS), so it can be larger than the period. This configuration lets process A use two periods’ worth of CPU time in every period (for example, a full period on each of two CPUs) without being throttled.
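To make the arithmetic concrete, here’s a small sketch that turns a period/quota pair into the number of CPUs’ worth of time a group gets per period. The helper name cfs_cpus is mine, not part of any tool. Also keep in mind that the 100/200 values above are only illustrative: as far as I know the kernel rejects periods and quotas below 1000 us.

```shell
#!/bin/sh
# cfs_cpus PERIOD_US QUOTA_US   (hypothetical helper, not a real tool)
# Prints how many CPUs' worth of run time a cgroup gets per period.
# In cgroup v1 a quota of -1 means "no limit".
cfs_cpus() {
    period=$1
    quota=$2
    if [ "$quota" -lt 0 ]; then
        echo "unlimited"
        return 0
    fi
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v p="$period" -v q="$quota" 'BEGIN { printf "%.2f\n", q / p }'
}

cfs_cpus 100 200      # the example above: two CPUs' worth per period
cfs_cpus 50000 1000   # 2% of a single CPU
```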
So much for the theory. In practice you can get the same result with a couple of docker run flags, for example:
docker run -ti --cpu-period=50000 --cpu-quota=1000 alpine sh
That creates a new cgroup leaf for the container. Once you have the container ID you can go to that specific leaf, since the cgroup directory is named after the full ID.
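Where exactly the leaf lives depends on the cgroup version and the cgroup driver Docker is using, so treat this as a rough sketch: the function name and the list of candidate paths are my assumptions about common layouts, not an exhaustive list. On cgroup v2 the files are also different (cpu.max instead of the cfs_* pair).

```shell
#!/bin/sh
# find_cpu_cgroup CONTAINER_ID [CGROUP_ROOT]   (hypothetical helper)
# Prints the first directory that exists among a few common Docker layouts.
find_cpu_cgroup() {
    id=$1
    root=${2:-/sys/fs/cgroup}
    for dir in \
        "$root/cpu/docker/$id" \
        "$root/cpu,cpuacct/docker/$id" \
        "$root/cpu/system.slice/docker-$id.scope" \
        "$root/system.slice/docker-$id.scope" \
        "$root/docker/$id"
    do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "no cgroup directory found for $id" >&2
    return 1
}

# Example usage; it needs the full (untruncated) container ID:
#   cid=$(docker ps --no-trunc -q | head -n 1)
#   ls "$(find_cpu_cgroup "$cid")"
```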
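There’s a lot of really interesting stuff in there, but we’re going to focus on cpu.stat, which exposes the counters nr_periods, nr_throttled and throttled_time.
For this container, the numbers were: 51 periods elapsed, 45 of them throttled, and a total throttled time of 2313838319 (throttled_time is reported in nanoseconds, so about 2.3 seconds).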
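If you’d rather not do the math by eye, something like this works. It’s a minimal sketch that assumes the cgroup v1 field names; throttle_summary is a made-up name, and the sample file is just rebuilt from the numbers quoted above.

```shell
#!/bin/sh
# throttle_summary FILE   (hypothetical helper)
# Reads a cgroup v1 cpu.stat file and prints the share of periods that were
# throttled, plus the throttled time converted from nanoseconds to seconds.
throttle_summary() {
    LC_ALL=C awk '
        $1 == "nr_periods"     { periods = $2 }
        $1 == "nr_throttled"   { throttled = $2 }
        $1 == "throttled_time" { ns = $2 }
        END {
            pct = (periods > 0) ? 100 * throttled / periods : 0
            printf "throttled %d/%d periods (%.1f%%), %.2f s throttled\n",
                   throttled, periods, pct, ns / 1e9
        }
    ' "$1"
}

# Sample file rebuilt from the numbers quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
nr_periods 51
nr_throttled 45
throttled_time 2313838319
EOF
throttle_summary "$sample"   # throttled 45/51 periods (88.2%), 2.31 s throttled
rm -f "$sample"
```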
This is very useful when profiling apps that you want to dockerize, for example: keep adjusting the limits while watching these metrics until your process is no longer throttled much.
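One thing to watch while tuning: the counters in cpu.stat are cumulative, so to see how a new limit behaves you want the difference between two snapshots rather than the raw totals. Here’s a rough sketch of that; the helper names are mine, the cgroup path in the usage comment is an assumption, and it does no error handling if a field is missing.

```shell
#!/bin/sh
# stat_field FILE NAME: print one counter from a cpu.stat snapshot.
stat_field() {
    awk -v name="$2" '$1 == name { print $2 }' "$1"
}

# throttle_delta BEFORE AFTER   (hypothetical helper)
# Prints the whole-number percentage of periods that were throttled
# between two cpu.stat snapshots.
throttle_delta() {
    p=$(( $(stat_field "$2" nr_periods) - $(stat_field "$1" nr_periods) ))
    t=$(( $(stat_field "$2" nr_throttled) - $(stat_field "$1" nr_throttled) ))
    if [ "$p" -le 0 ]; then
        echo "0"
        return 0
    fi
    echo $(( 100 * t / p ))
}

# Example usage against a live container (the path is an assumption):
#   cg=/sys/fs/cgroup/cpu/docker/<container-id>
#   cat "$cg/cpu.stat" > /tmp/before; sleep 10; cat "$cg/cpu.stat" > /tmp/after
#   throttle_delta /tmp/before /tmp/after
```

If the number stays near zero after you raise the quota, the limit is probably no longer the bottleneck.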
Well, I wanted to pick a process that would let me show this graphically, instead of a Python script finding prime numbers or something like that, so Mario it is:
Mario, period = 50000, quota = 1000 [[~7 FPS]] (almost 100% throttling)
docker run -ti --cpu-period=50000 --cpu-quota=1000 -e DISPLAY=$DISPLAY -v /home/jgarcia/Projects/games/games/:/games -v /tmp/.X11-unix:/tmp/.X11-unix -v /run/dbus/:/run/dbus/ --privileged 352b46a178cb
Mario, period = 50000, quota = 2000 [[~15 FPS]] (still bad..)
docker run -ti --cpu-period=50000 --cpu-quota=2000 -e DISPLAY=$DISPLAY -v /home/jgarcia/Projects/games/games/:/games -v /tmp/.X11-unix:/tmp/.X11-unix -v /run/dbus/:/run/dbus/ --p
Mario, period = 50000, quota = 5000 [[~40 FPS]] (very good)
docker run -ti --cpu-period=50000 --cpu-quota=5000 -e DISPLAY=$DISPLAY -v /home/jgarcia/Projects/games/games/:/games -v /tmp/.X11-unix:/tmp/.X11-unix -v /run/dbus/:/run/dbus/ --privileged 352b46a178cb
Mario running without a quota [[~62 FPS]] (perfect!!)
I’m not advocating running Mario without limits; I just wanted to show how easy it is to get data that used to be very hard to collect, and to make small adjustments based on it.
Sorry about the size of the GIFs, I did all I could.