2019-12-09

Zombies

Zombies are another typical Unix topic that comes up in interviews. While reading The Linux Programming Interface, I realized that some of the details are a bit subtle.

What is a zombie?

A zombie is a process that has terminated but whose termination has not yet been acknowledged by its parent. The parent is supposed to call one of the wait() family of system calls: wait(), waitpid(), waitid(), wait3(), or wait4().

This mechanism lets the parent process be notified when a child has finished executing, and also lets it retrieve the child's exit status and some resource usage statistics (such as CPU time).
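
To make this concrete, here is a minimal Python sketch (Python only because its os module is a thin wrapper over the same system calls). It assumes Linux, since it reads the process state from /proc, and the helper names proc_state and zombie_demo are made up for the illustration. The child exits immediately; the parent first calls waitid() with WNOWAIT to wait for the termination without reaping, at which point the child should be in state Z.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the pid is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name is wrapped in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7):
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately

    # Block until the child has terminated, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)  # the child is a zombie at this point

    # Now acknowledge the termination: this is what removes the zombie.
    _, status = os.waitpid(pid, 0)
    state_after = proc_state(pid)

    return state_before, os.WEXITSTATUS(status), state_after
```

Calling zombie_demo() returns the state before the wait, the exit code collected by waitpid(), and the state afterwards (None once the process table entry is gone).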

So the natural question is: is it good or bad to let zombies pile up? The answer is that each zombie holds very few resources (essentially an entry in the kernel's process table), but one may want to prevent PID exhaustion and keep the ps output clean.

Asynchronous wait

In a previous article, I wrote (and then removed) that there was a way to prevent zombies from being created altogether: ignoring the SIGCHLD signal in the parent.

Indeed, when a child exits or is killed (or sometimes when it is stopped or continued), a SIGCHLD is sent to its parent.

It is then possible, but a bit tricky, to catch the signal and call one of the wait() functions inside the signal handler. As with any asynchronous code, one must pay attention to race conditions and global state (think errno). Also, traditional signals are not queued on Unix systems, so a single SIGCHLD may stand for several dead children.
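
A rough sketch of such a handler, again in Python. The names on_sigchld, spawn_and_reap and the reaped dictionary are invented for the example. Because one SIGCHLD may cover several terminated children, the handler loops on waitpid(-1, WNOHANG) until nothing is left to collect. Note that Python runs signal handlers in the main thread between bytecode instructions, so this hides the async-signal-safety and errno concerns that the equivalent C handler would have to deal with.

```python
import os
import signal
import time

reaped = {}  # pid -> exit code, filled in by the handler


def on_sigchld(signum, frame):
    # One signal may stand for several dead children: loop until none is left.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has terminated yet
        # The children below always call _exit(), so WEXITSTATUS is meaningful.
        reaped[pid] = os.WEXITSTATUS(status)


def spawn_and_reap(n=5, timeout=5.0):
    reaped.clear()
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pids = []
        for i in range(n):
            pid = os.fork()
            if pid == 0:
                os._exit(i)  # child: exit with a distinct code
            pids.append(pid)
        # Wait until the handler has collected every child (or give up).
        deadline = time.monotonic() + timeout
        while len(reaped) < n and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever disposition was in place before.
        signal.signal(signal.SIGCHLD,
                      signal.SIG_DFL if previous is None else previous)
    return pids, dict(reaped)
```

One caveat of waitpid(-1, ...) is that it collects any child of the process, including ones created by unrelated code, so this pattern suits programs that own all of their children.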

If one does not want to track one's children, there is a trick.

Automatic reap

By default, SIGCHLD is ignored. However, when we explicitly ignore the signal in the parent process (by setting its disposition to SIG_IGN), the behavior changes slightly: in that case no zombie is created for children that terminate afterwards. This is not extremely portable (to old Unices), and zombies that already exist are usually not reaped.
Also, one should be aware that on modern systems the CPU usage of such children will not be included in the values returned by getrusage() (and times()).
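
Here is a small sketch of that behavior as observed on Linux (the function name auto_reap_demo is made up). With SIGCHLD explicitly set to SIG_IGN, the child is expected to disappear on its own when it exits, so a later waitpid() finds nothing to wait for and fails with ECHILD, which Python reports as ChildProcessError.

```python
import os
import signal


def auto_reap_demo():
    """Return True if the child was auto-reaped, i.e. waitpid() failed with ECHILD."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: terminate immediately
        try:
            # Blocks until the child is gone, then fails because the kernel
            # already discarded it instead of keeping a zombie around.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True
        return False  # a status was collected, so a zombie did exist
    finally:
        signal.signal(signal.SIGCHLD,
                      signal.SIG_DFL if previous is None else previous)
```

The price is visible here too: the exit status of the child is simply lost to the parent.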

So in short, it looks like using wait() should still be the most usual way for a process to wait for its children.

There are more details about this in the book. For instance, it is possible to use sigaction() to catch the signal and call a handler, while still having the child processes auto-reaped (using the SA_NOCLDWAIT flag).


2019-12-04

Interview question: signals

10 years ago

An interrupt is usually generated by the hardware when the processor needs to stop whatever it is doing to deal with some urgent matter. Strangely enough, we also have software interrupts. On Unix, we have some high-level ones called signals.

One typical interview question is: which signals can't be caught or ignored?

When I was asked this for the first time a long time ago, my answer was... SIGBUS. I thought that there was not much that could be done about a hardware error. I made two mistakes that day.

The first one is revealed by the signal(2) man page, which gives the answer the interviewer was surely waiting for: "The signals SIGKILL and SIGSTOP cannot be caught or ignored". signal() and sigaction() will return an error if we try to change their disposition.
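
This is easy to check. The sketch below (the helper name can_change_disposition is hypothetical) tries to ignore a signal through Python's signal.signal(), which reports the underlying failure as an OSError carrying EINVAL.

```python
import errno
import signal


def can_change_disposition(signum):
    """Try to ignore signum; return False if the kernel refuses with EINVAL."""
    try:
        previous = signal.signal(signum, signal.SIG_IGN)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise
    # The change was accepted: put the previous disposition back.
    signal.signal(signum, signal.SIG_DFL if previous is None else previous)
    return True
```

For example, can_change_disposition(signal.SIGKILL) and can_change_disposition(signal.SIGSTOP) report a refusal, while an ordinary signal such as SIGUSR1 is accepted.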

But there is more

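A couple of years ago, an ex-Googler told me that SIGCONT cannot be blocked or masked when the process is sleeping. More precisely, as the book explains, a SIGCONT always resumes a stopped process, even if that process is currently blocking or ignoring the signal.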
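
A way to observe the SIGCONT behavior, as a sketch that assumes Linux (WCONTINUED is not available everywhere) and uses a made-up function name: the child ignores SIGCONT, the parent stops it with SIGSTOP, then sends SIGCONT and asks waitpid() whether the child was resumed anyway.

```python
import os
import signal
import time


def sigcont_resumes_even_if_ignored():
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            signal.signal(signal.SIGCONT, signal.SIG_IGN)  # child ignores SIGCONT
            os.write(w, b"x")  # tell the parent the disposition is in place
            while True:
                time.sleep(1)
        finally:
            os._exit(0)

    os.close(w)
    os.read(r, 1)  # wait until the child is ignoring SIGCONT
    os.close(r)

    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)

    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)
    continued = os.WIFCONTINUED(status)

    # Clean up: kill and reap the child so that no zombie is left behind.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL

    return stopped, continued, killed
```

The pipe is only there for synchronization, so that the SIGSTOP cannot arrive before the child has changed its disposition.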

All of this, and much more, is explained in the very good The Linux Programming Interface by Michael Kerrisk. It turns out that my intuition was not totally wrong. According to the book, the Single Unix Specification (SUSv3) leaves undefined what happens when we return from a signal handler for the hardware-generated signals: SIGBUS, SIGFPE, SIGSEGV and SIGEMT.
When they result from a hardware exception, they cannot effectively be blocked on modern Linux systems, and trying to ignore them will simply get the program terminated.

So when asked about anything...

...don't just answer. Say a bit more about your reasoning. And read books like Michael Kerrisk's; it's definitely worth the investment. I'll try to write some more about it in a following post.

PS:

  • Another oddity about signals a bit later in the book:

[Originally on BSD], a process could be stopped in one of two ways: as a consequence of being traced by the ptrace() system call, or by being stopped by a signal [...]. When a child is being traced by ptrace(), then delivery of any signal (other than SIGKILL) causes the child to be stopped, and a SIGCHLD signal is consequently sent to the parent. This behavior occurs even if the child is ignoring the signal. However, if the child is blocking the signal, then it is not stopped.

  • And another one, this time from the abort() man page: [when calling abort()], if the SIGABRT signal is ignored, or caught by a handler that returns, the abort() function will still terminate the process. It does this by restoring the default disposition for SIGABRT and then raising the signal for a second time.
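
The abort() oddity can be sketched as follows (the function name is invented; os.abort() calls the C library's abort()). The child ignores SIGABRT and then aborts; the parent reports which signal, if any, terminated it. The core file size limit is set to zero in the child only to avoid leaving a core dump behind.

```python
import os
import resource
import signal


def abort_with_ignored_sigabrt():
    """Return the number of the signal that killed the child, or None."""
    pid = os.fork()
    if pid == 0:
        try:
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file
            signal.signal(signal.SIGABRT, signal.SIG_IGN)     # try to ignore it
            os.abort()                                        # does not return
        finally:
            os._exit(1)  # only reached if something above failed

    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None
```

If the man page is right, the returned value is SIGABRT even though the child asked for that signal to be ignored.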

2019-12-03

Docker and Ctrl-P

Ctrl-P is annoying in Docker, as it does not do what one might expect. In bash it should scroll up through the command history, and in vim (in insert mode) it should search backward for a completion of the current word. By default, Docker uses it as the first key of some of its own commands, notably the sequence that detaches from the current container: CTRL-P CTRL-Q. It is possible to change this in the .docker/config.json file, as suggested on the codelearn blog:
"detachKeys": "ctrl-@"