In the last couple of weeks I’ve given talks at DockerCon and Craft Conference where I’ve shown how a container works by building one from scratch.
When I run my home-grown container it has always slightly bothered me that there are more Linux processes created than I can account for. Someone in the audience spotted it too, and asked why there are more processes than we can see in ps.
$ go run main.go run /bin/bash
Running [/bin/bash] as PID 21569
Running [/bin/bash] as PID 1
root@container:/# ps
PID TTY TIME CMD
1 ? 00:00:00 exe
4 ? 00:00:00 bash
9 ? 00:00:00 ps
root@container:/#
My code does a fork/exec to run /proc/self/exe within a new set of namespaces, which is to say, it runs the same program again within these namespaces. This explains the process with ID 1 that’s running exe.
This time the program is given a different command (child instead of run) which causes it to fork/exec to run whatever arbitrary command it has been given — in this case /bin/bash. As a child process this inherits the same set of namespaces as its parent.
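As a rough sketch of that two-stage launch (not the exact main.go from the talks), the code could look something like this. The helper names runCmd and childCmd are hypothetical, the example is Linux-only, and it leaves out everything else a real container needs, such as setting a hostname or changing the root filesystem. Creating the namespaces requires root.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// runCmd builds the first stage: re-run this same binary (via /proc/self/exe)
// with "child" in place of "run", inside new namespaces.
// Hypothetical helper name, for illustration only.
func runCmd(args []string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, args...)...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// New mount, UTS (hostname) and PID namespaces for the child.
		Cloneflags: syscall.CLONE_NEWNS | syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}
	return cmd
}

// childCmd builds the second stage: the command the user actually asked for.
// No clone flags are set, so it inherits the namespaces of its parent.
func childCmd(args []string) *exec.Cmd {
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: container run|child <cmd> [args...]")
		return
	}
	var cmd *exec.Cmd
	switch os.Args[1] {
	case "run":
		cmd = runCmd(os.Args[2:])
	case "child":
		cmd = childCmd(os.Args[2:])
	default:
		fmt.Println("unknown command:", os.Args[1])
		return
	}
	fmt.Printf("Running %v as PID %d\n", os.Args[2:], os.Getpid())
	if err := cmd.Run(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
}
```

The "Running ... as PID" line is printed once by each stage, which is why it appears twice in the output above: once with the host PID and once with PID 1 inside the new PID namespace.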
We can see bash in the process list, but why is it given process ID 4? This happens every single time. What happens to processes 2 and 3?
To find out, I’m going to run my container code under the system call tracing utility, strace. First I want to compile the code; in talks I usually invoke the code with go run main.go ... to save having a separate compile step, but I don’t want to strace the compilation.
$ go build -o container .
I can now invoke the code with ./container run <cmd> <args>. So:
$ strace ./container run echo hello
That shows quite a lot of syscalls being made! Let’s grep for clone, the syscall that creates a new process (or, depending on the flags, a new thread).
$ strace ./container run echo hello 2>&1 | grep clone
clone(child_stack=0xc820033fc0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 21928
clone(child_stack=0xc820035fc0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 21929
clone(child_stack=0, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWPID|SIGCHLD) = 21930
The clone that corresponds to my new namespace is the last of these three — I can tell that because of the CLONE_NEW* flags that I passed in. So where did the other two come from?
To find out, I built a minimal Go program that does nothing.
package main
func main() {
return
}
Building that and running it under strace…
$ go build -o minimal .
$ strace ./minimal 2>&1 | grep clone
clone(child_stack=0xc820031fc0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 22066
clone(child_stack=0xc820033fc0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD) = 22067
…so it appears these two calls to clone come from the Go runtime itself: we see them even when the program does nothing!
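One way to cross-check this from inside a program, without strace, is to ask the Go runtime how many OS threads it has created. The sketch below uses the predefined "threadcreate" profile in runtime/pprof; the function name osThreadsCreated is my own, and the number it reports will depend on the Go version and the machine, so I am not quoting an expected value.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// osThreadsCreated asks the Go runtime how many OS threads it has
// created so far, using the built-in "threadcreate" profile.
// Hypothetical helper name, for illustration only.
func osThreadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	// This program starts no goroutines of its own, so any threads
	// beyond the first were started by the runtime.
	fmt.Println("OS threads created by the runtime:", osThreadsCreated())
}
```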
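I think that the CLONE_THREAD flag explains why we don’t see these processes in the output from ps. It places each child in the same thread group as the parent, and by default ps prints one line per process rather than one per thread. Each thread still takes its own ID from the same number space as process IDs, though, which would account for the missing 2 and 3 inside the container. From the clone man page: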
Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.
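To see the thread group from the program’s own point of view, here is a small sketch that lists the entries under /proc/self/task (one directory per thread) next to the value returned by getpid. It assumes Linux with /proc mounted, and threadIDs is a name I made up for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// threadIDs lists the kernel thread IDs in this process's thread group
// by reading the directory names under /proc/self/task.
// Hypothetical helper name, for illustration only.
func threadIDs() ([]int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return nil, err
	}
	var tids []int
	for _, e := range entries {
		tid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // skip anything that is not a numeric thread ID
		}
		tids = append(tids, tid)
	}
	return tids, nil
}

func main() {
	tids, err := threadIDs()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	// Every thread reports the same getpid value (the TGID),
	// but each one has its own thread ID.
	fmt.Println("getpid (TGID):", os.Getpid())
	fmt.Println("thread IDs:   ", tids)
}
```

One of the thread IDs should match the getpid value, since the first thread in the group is the one whose ID becomes the TGID. Running ps with the -L option shows the same per-thread view from the outside.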
So I’ve proven to myself that we get these extra processes – or perhaps we should just call them threads – when we run a Go executable, but I haven’t explained why. (I did say at the top that I had only sort-of figured it out!)
The full output from strace suggests that it’s something to do with signal handling — there are a lot of calls to rt_sigaction, rt_sigprocmask and sigaltstack before we get the process that really does the work. Perhaps it’s related to Go’s concurrency handling?
Edit: Phil Pearl pointed out that one of these threads will be Go’s garbage collector.
Know more? I’d love to hear about it!