Shell 1: Process creation and destruction

The shell unit is all about how to use the operating systems primitives that you built in the kernel unit. We’ll learn this in the context of one specific class of program, the shell, which is a program whose raison d’être is managing other processes. You’re already familiar with shells, because you use them constantly. Now you’ll write one!

First, we look at some of the basics system calls: fork, exec, and waitpid.

`fork`: creating processes

The fork system call is the Unix mechanism for creating a new process. It works by cloning the current state of the current process. Its signature:

pid_t fork(void);

where pid_t is a synonym for int. The fork() operation returns twice with two different values, once in the parent process and once in the child process. Specifically, if pid_t p = fork(), then:

If fork() succeeds, then in the parent process, p > 0 equals the child process’s ID.
If fork() succeeds, then in the child process, p == 0.
If fork() fails, then in the parent process, p \< 0 and the error code is stored in errno. Note that if fork() fails, there is no child.

A process can obtain its own ID by calling the getpid() system call, and any process can obtain its parent process’s ID by calling the getppid() system call.

But fork() does not clone everything. In metaphorical terms, fork() copies the parent’s internal state, but not the external world. In technical terms, fork() copies the parent process’s registers, primary memory, and internal per-process operating system structures, such as its file descriptor table; but it does not copy the structures corresponding to hardware devices, such as files or the terminal. Instead, those hardware-like structures are shared between the parent and the child.

For instance, consider fork1.c:

int main(void) {
    pid_t p1 = fork();
    printf("Hello from pid %d\n", getpid());
}

This will print two messages, one from the parent and one from the child. These two messages might appear in either order. In this case the Linux kernel typically runs the parent process first (which we can tell by noting that the line with the numerically-lower process ID appears first, or, in more a bulletproof way, by adding more code), but that’s not a guarantee: once fork() returns, there are two isolated process, and the OS can decide to run them in any order.

Also, both messages appear on the terminal (the console). The fork() does not create a new terminal! (It’s not clear how it possibly could.) Instead, the terminal is shared between the two processes.

The fork() copies all primary memory state in the parent process. This includes stdio buffers, which can lead to some weird surprises! fork2b.c:

int main(void) {
    pid_t initial_pid = getpid();
    printf("Hello from initial pid %d\n", initial_pid);
    pid_t p1 = fork();
    pid_t p2 = fork();
    if (getpid() != initial_pid) {
        printf("Hello from child pid %d\n", getpid());
    }
}

Run with output sent to the terminal, this prints pretty much what you’d expect. But redirected to a pipe (via ./fork2b \| less) or file (./fork2b \> f), the “initial pid” line is duplicated 4 times:

Hello from initial pid 3626
Hello from initial pid 3626
Hello from child pid 3629
Hello from initial pid 3626
Hello from child pid 3628
Hello from initial pid 3626
Hello from child pid 3630

When output is redirected, the “Hello from initial pid” line is being buffered in the process’s primary memory, in a stdio buffer much like the one you implemented in pset 2. And this buffer is copied along with the rest of the process’s primary memory! The buffer is flushed when each process exits, so the line is printed 4 times—once by each process. On the other hand, when output is sent to the terminal, stdio decides not to buffer output, and it emits each line as it is completed (via a write system call).

(How can stdio tell the difference between terminal and file output? When it starts up, it uses an fstat(STDOUT_FILENO...) system call to check what kind of file standard output is. If the answer is “character device”—including a terminal—stdio dials down its buffering. We figured this out using—you guessed it—strace!)

`exec`

The exec system call is used to start a new program. But, perhaps unlike your expectation, it does not start a new process (fork does that). Instead, exec replaces the current process’s image (that is, its primary memory and registers) with a fresh copy of the specified program, running with the specified arguments.

exec is really a family of system calls, not just one. Read the man execv page to learn about them. There are six varieties:

System call	Signature	Program location	Argument list	Environment?
`execl`	`int execl(const char\* path, const char\* arg, ...)`	Direct	Function parameters	Implicit
`execlp`	`int execlp(const char\* file, const char\* arg, ...)`	Search `\$PATH`	Function parameters	Implicit
`execle`	`int execle(const char\* path, const char\* arg, ..., char\* const envp[])`	Direct	Function parameters	Explicit

`execv`	`int execv(const char\* path, char\* const argv[])`	Direct	Array argument	Implicit
`execvp`	`int execvp(const char\* file, char\* const argv[])`	Search `\$PATH`	Array argument	Implicit
`execvpe`	`int execvpe(const char\* file, char\* const argv[], char\* const envp[])`	Search `\$PATH`	Array argument	Explicit

The differences among them:

The execl\* functions pass some of their parameters along to the new program as arguments. The execv\* functions instead take an explicit array of arguments to pass.
The exec\*p functions search for an executable to run, using the current \$PATH environment variable. For instance, you can say execlp("ls", "ls", NULL) to run ls; the library function will search \$PATH for the first-occurring ls executable (which is usually /bin/ls). The non-p functions instead take a full pathname for the executable; if that pathname’s invalid, the functions will fail.
The exec\*e functions take an explicit envp argument to pass environment variables to the child.

The exec system calls return a value, but that value can only be observed on failure: if exec works as intended, then it does not return—the currently running process is instead replaced with a new program image. Thus, if exec returns, then it returns -1.

Aside: `exit`

The exit system call exits the current process. But actually the name of the system call is _exit; exit is a library function that calls _exit after performing some cleanup actions, such as flushing any stdio buffers that contain data. This implicit flush behavior explains why the fork2b.c program produces any output at all.

`waitpid`

fork lets us create a new process, exec lets us run a new program, _exit quits a process. But once a process has started, how can we tell when that process dies? For this, we need a new system call, waitpid. Its signature:

pid_t waitpid(pid_t child, int* status, int flags);

waitpid has complicated semantics that depend on the flags. If child == -1 and flags == 0, then waitpid:

Returns -1, with error code ECHILD, if the current process has no outstanding children.
Otherwise, if any child has exited but has not yet been waited for, waitpid returns that child’s ID and sets \*status to its status. (status lets the parent distinguish normal exits from other kinds of death, such as segmentation faults; and status lets the parent determine the child’s exit code. See man waitpid to learn about the macros required to deal with status, such as WIFEXITED and WEXITSTATUS.)
Otherwise, at least one outstanding child has not yet exited. waitpid will block (occupying no CPU) until a child exits.

If a specific child process’s status is desired, just pass that process ID as child argument.

Note that waitpid only works on children. A waiting process

p1` will *only* receive wait reports for child processes that `p1

itself forked. It is illegal for a child to wait for a parent, or a parent to wait for a grandchild, etc.

The waitdemo.c file shows how waitpid works: the parent process waits for its child to die, who exits after 0.5sec. But what about a more complex semantic? Is it possible to wait for a child to die or 0.75sec to elapse, whichever comes first?

Arguments to waitpid make this possible. Specifically, if WNOHANG is passed for flags, then waitpid will never block: it will always return immediately, but it’ll return 0 if no outstanding child has died. The waittimeout.c file implements this idea, but this polling mechanism wastes a ton of CPU resources.

Our last try was the waitblock.c file. In the parent, a usleep call is used to block for 0.75 seconds. But if the child dies during this call, the usleep call is interrupted by a signal! This signal concept is the model of hardware interrupts implemented by Unix operating systems for processes. The signal causes the usleep system call to return early, printing the message just as we want.

Except it doesn’t work reliably. Uh oh!

Shell 1: Process creation and destruction

fork: creating processes

exec

Aside: exit

waitpid

`fork`: creating processes

`exec`

Aside: `exit`

`waitpid`