2019-12-09

Zombies

That's also a classic Unix topic that comes up during interviews. While reading The Linux Programming Interface, I realized that some of the details are subtle.

What is a zombie?

A zombie is a process that has terminated but has not yet been reaped (acknowledged) by its parent. The parent is supposed to call one of the wait() family of system calls: wait(), waitpid(), waitid(), wait3() or wait4().

This mechanism lets the parent learn that a child has finished its execution and retrieve its termination status, but also obtain some resource-usage statistics such as CPU time.

So the natural question is: is it bad to let zombies pile up? Each zombie keeps only a small amount of kernel state around (its PID, termination status and resource-usage statistics), but it still holds on to a PID, so one may want to avoid PID exhaustion, and also keep the ps output clean.

Asynchronous wait

In a previous article, I wrote (and later removed) that there was a way to prevent zombies from being created altogether: ignoring the SIGCHLD signal in the parent.

Indeed, when a child exits or is killed (and also when it is stopped or continued, unless the parent set the SA_NOCLDSTOP flag with sigaction()), SIGCHLD is sent to its parent.

It's then possible, but a bit tricky, to catch the signal and call one of the wait() functions inside the signal handler. As with any asynchronous code, one must pay attention to race conditions and to global state (think errno, which waitpid() may overwrite). Also, traditional signals are not queued on Unix systems, so a single SIGCHLD may stand for several dead children: the handler has to keep calling waitpid() with WNOHANG until there is nothing left to reap.
If a process does not want to track its children at all, there is a trick.

Automatic reap

The default action for that signal is to ignore it. However, when the parent explicitly sets its disposition to SIG_IGN, the behavior changes: terminated children are discarded right away and no zombie is created. This is not entirely portable (to old Unices), and zombies that already exist when the disposition is changed are usually not reaped.
One should also be aware that, on modern systems, the CPU time consumed by such children is not included in the values returned by getrusage(RUSAGE_CHILDREN) (or times()).

So, in short, calling wait() or one of its variants still looks like the most common way for a process to wait for its children.

The book has more details on this. For instance, it is possible to use sigaction() to catch the signal with a handler while still having child processes reaped automatically, by passing the SA_NOCLDWAIT flag.

