Shell 1: Process creation and destruction
The shell unit is all about how to use the operating system primitives that you built in the kernel unit. We’ll learn this in the context of one specific class of program, the shell, which is a program whose raison d’être is managing other processes. You’re already familiar with shells, because you use them constantly. Now you’ll write one!
First, we look at some of the basic system calls: fork, exec,
and waitpid.
fork: creating processes
The fork system call is the Unix mechanism for creating a new
process. It works by cloning the current state of the current process.
Its signature:
pid_t fork(void);
where pid_t is a synonym for int. The fork() operation
returns twice with two different values, once in the parent process
and once in the child process. Specifically, if pid_t p = fork(),
then:
- If fork() succeeds, then in the parent process, p > 0 and p equals the child process’s ID.
- If fork() succeeds, then in the child process, p == 0.
- If fork() fails, then in the parent process, p \< 0 and the error code is stored in errno. Note that if fork() fails, there is no child.
A process can obtain its own ID by calling the getpid() system call,
and any process can obtain its parent process’s ID by calling the
getppid() system call.
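The three return cases, together with getpid() and getppid(), can be sketched as follows. This is a minimal illustration, not code from the course; the helper name fork_roles_check is hypothetical, and waitpid (covered later in these notes) is used only so the parent can collect the child's verdict.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork once and check what each side observes.
// Returns 1 if the child saw a new pid and getppid() == the forking
// process's pid, 0 if it did not, and -1 if fork() or waitpid() failed.
int fork_roles_check(void) {
    pid_t parent_pid = getpid();
    fflush(stdout);              // don't let the child inherit unflushed output
    pid_t p = fork();
    if (p < 0) {
        perror("fork");          // errno holds the reason; no child exists
        return -1;
    }
    if (p == 0) {
        // Child: fork() returned 0, and our parent is the forking process.
        int ok = (getpid() != parent_pid && getppid() == parent_pid);
        _exit(ok ? 0 : 1);       // leave without running the parent's code
    }
    // Parent: p is the child's process ID.
    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling fork_roles_check() from main should return 1 on a system where fork() succeeds.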
But fork() does not clone everything. In metaphorical terms,
fork() copies the parent’s internal state, but not the external
world. In technical terms, fork() copies the parent process’s
registers, primary memory, and internal per-process operating system
structures, such as its file descriptor table; but it does not copy
the structures corresponding to hardware devices, such as files or the
terminal. Instead, those hardware-like structures are shared between
the parent and the child.
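One way to see the difference between copied and shared state is the sketch below. It assumes a writable /tmp directory, and the helper name fork_sharing_demo is hypothetical. The child changes a local variable and writes 5 bytes to a file that was opened before the fork; the parent then inspects its own variable and the file position.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper. Returns the file position the parent observes after
// the child wrote 5 bytes, or -1 on error. Stores the parent's copy of the
// local counter in *parent_counter.
long fork_sharing_demo(int* parent_counter) {
    char path[] = "/tmp/forkdemoXXXXXX";   // assumption: /tmp is writable
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                          // file vanishes once it is closed

    int counter = 0;
    pid_t p = fork();
    if (p < 0) {
        close(fd);
        return -1;
    }
    if (p == 0) {
        counter = 100;                     // changes only the child's memory
        ssize_t n = write(fd, "hello", 5); // advances the shared file position
        _exit(n == 5 && counter == 100 ? 0 : 1);
    }
    int status;
    if (waitpid(p, &status, 0) != p
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    *parent_counter = counter;             // parent's copy was never modified
    off_t pos = lseek(fd, 0, SEEK_CUR);    // position moved by the child's write
    close(fd);
    return (long) pos;
}
```

The parent's counter stays 0 because memory is copied, while the file position reads 5 because the open file itself is shared between the two file descriptor tables.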
For instance, consider fork1.c:
#include <stdio.h>
#include <unistd.h>
int main(void) {
    pid_t p1 = fork();   // returns in both the parent and the child
    printf("Hello from pid %d\n", getpid());
}
This will print two messages, one from the parent and one from the
child. These two messages might appear in either order. In this case
the Linux kernel typically runs the parent process first (which we can
tell by noting that the line with the numerically-lower process ID
appears first, or, in a more bulletproof way, by adding more code), but
that’s not a guarantee: once fork() returns, there are two isolated
processes, and the OS can decide to run them in any order.
Also, both messages appear on the terminal (the console). fork()
does not create a new terminal! (It’s not clear how it possibly could.)
Instead, the terminal is shared between the two processes.
fork() copies all primary memory state in the parent process.
This includes stdio buffers, which can lead to some weird surprises!
fork2b.c:
#include <stdio.h>
#include <unistd.h>
int main(void) {
    pid_t initial_pid = getpid();
    printf("Hello from initial pid %d\n", initial_pid);
    pid_t p1 = fork();
    pid_t p2 = fork();   // runs in both existing processes: 4 in total
    if (getpid() != initial_pid) {
        printf("Hello from child pid %d\n", getpid());
    }
}
Run with output sent to the terminal, this prints pretty much what you’d
expect. But redirected to a pipe (via ./fork2b \| less) or file
(./fork2b \> f), the “initial pid” line appears 4 times:
Hello from initial pid 3626
Hello from initial pid 3626
Hello from child pid 3629
Hello from initial pid 3626
Hello from child pid 3628
Hello from initial pid 3626
Hello from child pid 3630
When output is redirected, the “Hello from initial pid” line is being
buffered in the process’s primary memory, in a stdio buffer much like
the one you implemented in pset 2. And
this buffer is copied along with the rest of the process’s primary
memory! The buffer is flushed when each process exits, so the line is
printed 4 times, once by each process. On the other hand, when output is
sent to the terminal, stdio switches to line buffering: it emits
each line as it is completed (via a write system call).
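A common way to avoid the duplication is to flush stdio buffers before calling fork(). The sketch below reproduces the effect on a temporary file instead of standard output so that the result can be counted; the helper name count_copies_after_fork is hypothetical, and it assumes tmpfile() returns a fully buffered stream, as it does for regular files in common C libraries.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: write one buffered line, fork, let both processes
// flush, and return how many copies of the line reached the file
// (-1 on error).
int count_copies_after_fork(int flush_before_fork) {
    FILE* f = tmpfile();               // regular file: fully buffered
    if (!f) {
        return -1;
    }
    fprintf(f, "Hello from initial pid\n");   // sits in the stdio buffer
    if (flush_before_fork) {
        fflush(f);                     // buffer is empty when fork() copies it
    }
    pid_t p = fork();
    if (p < 0) {
        fclose(f);
        return -1;
    }
    if (p == 0) {
        fflush(f);                     // child writes its copy of the buffer
        _exit(0);
    }
    waitpid(p, NULL, 0);
    fflush(f);                         // parent writes its copy, if any
    rewind(f);
    int lines = 0;
    char buf[128];
    while (fgets(buf, sizeof buf, f)) {
        ++lines;
    }
    fclose(f);
    return lines;
}
```

Without the early flush, both processes hold a copy of the buffered line and each writes it out; with the flush, the line is written once before the fork.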
(How can stdio tell the difference between terminal and file output?
When it starts up, it uses an fstat(STDOUT_FILENO, ...) system call to
check what kind of file standard output is. If the answer is “character
device”—including a terminal—stdio dials down its buffering. We figured
this out using—you guessed it—strace!)
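The kind of check described above can be sketched with fstat and the S_ISCHR macro. This is an illustration only, not stdio's actual code, and the helper name is_char_device is hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if fd refers to a character device
// (terminals are character devices), 0 if it refers to something else
// such as a regular file or pipe, and -1 if fstat() fails.
int is_char_device(int fd) {
    struct stat st;
    if (fstat(fd, &st) < 0) {
        return -1;
    }
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

For example, is_char_device(STDOUT_FILENO) would typically return 1 when the program runs in a terminal and 0 when its output is redirected to a file or pipe.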
exec
The exec system call is used to start a new program. But, perhaps
unlike your expectation, it does not start a new process (fork
does that). Instead, exec replaces the current process’s image
(that is, its primary memory and registers) with a fresh copy of the
specified program, running with the specified arguments.
exec is really a family of system calls, not just one. Read the
man execv page to learn about them. There are six varieties:
| System call | Signature | Program location | Argument list | Environment? |
|---|---|---|---|---|
| execl | int execl(const char\* path, const char\* arg, ...) | Direct | Function parameters | Implicit |
| execlp | int execlp(const char\* file, const char\* arg, ...) | Search \$PATH | Function parameters | Implicit |
| execle | int execle(const char\* path, const char\* arg, ..., char\* const envp[]) | Direct | Function parameters | Explicit |
| execv | int execv(const char\* path, char\* const argv[]) | Direct | Array argument | Implicit |
| execvp | int execvp(const char\* file, char\* const argv[]) | Search \$PATH | Array argument | Implicit |
| execvpe | int execvpe(const char\* file, char\* const argv[], char\* const envp[]) | Search \$PATH | Array argument | Explicit |
The differences among them:
- The execl\* functions pass some of their parameters along to the new program as arguments. The execv\* functions instead take an explicit array of arguments to pass.
- The functions with p in their names (execlp, execvp, execvpe) search for an executable to run, using the current \$PATH environment variable. For instance, you can say execlp("ls", "ls", NULL) to run ls; the library function will search \$PATH for the first-occurring ls executable (which is usually /bin/ls). The non-p functions instead take a full pathname for the executable; if that pathname’s invalid, the functions will fail.
- The functions with e in their names (execle, execvpe) take an explicit envp argument to pass environment variables to the new program.
The exec system calls return a value, but that value can only be
observed on failure: if exec works as intended, then it does not
return—the currently running process is instead replaced with a new
program image. Thus, if exec returns, then it returns -1.
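Putting fork and exec together gives the usual pattern for running a program in a new process. The sketch below is one possible arrangement, not the course's code: the helper name run_and_wait is hypothetical, the exit code 127 for a failed exec is a shell convention adopted here as an assumption, and waitpid (covered later in these notes) collects the result.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run argv[0] (searched for in $PATH) in a child
// process and return its exit code. Returns -1 if fork() or waitpid()
// fails or if the child did not exit normally.
int run_and_wait(char* const argv[]) {
    fflush(stdout);              // avoid duplicating buffered output
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        execvp(argv[0], argv);   // on success, this never returns
        perror("execvp");        // reached only if exec failed (returned -1)
        _exit(127);              // assumption: 127 marks "could not exec"
    }
    int status;
    if (waitpid(p, &status, 0) != p || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

The code after execvp is the only place where the child can observe a return value from exec, so it handles the error and exits instead of falling through into the parent's logic.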
Aside: exit
The exit system call exits the current process. But actually the
name of the system call is _exit; exit is a library function
that calls _exit after performing some cleanup actions, such as
flushing any stdio buffers that contain data. This implicit flush
behavior explains why the fork2b.c program produces any output at
all.
waitpid
fork lets us create a new process, exec lets us run a new
program, and _exit quits a process. But once a process has started, how
can we tell when that process dies? For this, we need a new system call,
waitpid. Its signature:
pid_t waitpid(pid_t child, int* status, int flags);
waitpid has complicated semantics that depend on the flags. If
child == -1 and flags == 0, then waitpid:
- Returns -1, with error code ECHILD, if the current process has no outstanding children.
- Otherwise, if any child has exited but has not yet been waited for, waitpid returns that child’s ID and sets \*status to its status. (status lets the parent distinguish normal exits from other kinds of death, such as segmentation faults; and status lets the parent determine the child’s exit code. See man waitpid to learn about the macros required to deal with status, such as WIFEXITED and WEXITSTATUS.)
- Otherwise, at least one outstanding child has not yet exited. waitpid will block (occupying no CPU) until a child exits.
If a specific child process’s status is desired, just pass that process
ID as the child argument.
Note that waitpid only works on children. A waiting process
p1 will *only* receive wait reports for child processes that p1
itself forked. It is illegal for a child to wait for a parent, or a parent to wait for a grandchild, etc.
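Decoding status with the macros from man waitpid might look like the sketch below. The helper name child_outcome and its return-value encoding (exit code, or the negated signal number, or -1000 for errors) are hypothetical choices made for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that either exits with exit_code
// (when sig == 0) or kills itself with signal sig. Returns the child's
// exit code on a normal exit, minus the signal number if a signal
// terminated it, and -1000 on any other outcome or error.
int child_outcome(int exit_code, int sig) {
    pid_t p = fork();
    if (p < 0) {
        return -1000;
    }
    if (p == 0) {
        if (sig != 0) {
            kill(getpid(), sig);   // die by signal, if the signal is fatal
        }
        _exit(exit_code);
    }
    int status;
    if (waitpid(p, &status, 0) != p) {   // wait for this specific child
        return -1000;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);      // normal exit: extract the code
    }
    if (WIFSIGNALED(status)) {
        return -WTERMSIG(status);        // killed by a signal
    }
    return -1000;
}
```

For instance, child_outcome(7, 0) reports a normal exit with code 7, while child_outcome(0, SIGKILL) reports death by SIGKILL.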
The waitdemo.c file shows how waitpid works: the parent process
waits for its child, which exits after 0.5 sec. But what about
more complex semantics? Is it possible to wait until either the child dies or
0.75 sec elapses, whichever comes first?
Arguments to waitpid make this possible. Specifically, if
WNOHANG is passed for flags, then waitpid will never
block: it will always return immediately, but it’ll return 0 if no
outstanding child has died. The waittimeout.c file implements this
idea, but this polling mechanism wastes a ton of CPU resources.
Our last try was the waitblock.c file. In the parent, a usleep
call is used to block for 0.75 seconds. But if the child dies during
this call, the usleep call is interrupted by a signal! Signals are
the Unix model of hardware interrupts, delivered to processes rather
than to the kernel. The signal causes the usleep system
call to return early, so the parent prints its message just as we want.
Except it doesn’t work reliably. Uh oh!