Shell 1: Process creation and destruction
The shell unit is all about how to use the operating systems primitives that you built in the kernel unit. We’ll learn this in the context of one specific class of program, the shell, which is a program whose raison d’être is managing other processes. You’re already familiar with shells, because you use them constantly. Now you’ll write one!
First, we look at some of the basics system calls: fork
, exec
,
and waitpid
.
fork
: creating processes
The fork
system call is the Unix mechanism for creating a new
process. It works by cloning the current state of the current process.
Its signature:
pid_t fork(void);
where pid_t
is a synonym for int
. The fork()
operation
returns twice with two different values, once in the parent process
and once in the child process. Specifically, if pid_t p = fork()
,
then:
- If
fork()
succeeds, then in the parent process,p
> 0 equals the child process’s ID. - If
fork()
succeeds, then in the child process,p == 0
. - If
fork()
fails, then in the parent process,p \< 0
and the error code is stored inerrno
. Note that iffork()
fails, there is no child.
A process can obtain its own ID by calling the getpid()
system call,
and any process can obtain its parent process’s ID by calling the
getppid()
system call.
But fork()
does not clone everything. In metaphorical terms,
fork()
copies the parent’s internal state, but not the external
world. In technical terms, fork()
copies the parent process’s
registers, primary memory, and internal per-process operating system
structures, such as its file descriptor table; but it does not copy
the structures corresponding to hardware devices, such as files or the
terminal. Instead, those hardware-like structures are shared between
the parent and the child.
For instance, consider fork1.c
:
int main(void) {
pid_t p1 = fork();
printf("Hello from pid %d\n", getpid());
}
This will print two messages, one from the parent and one from the
child. These two messages might appear in either order. In this case
the Linux kernel typically runs the parent process first (which we can
tell by noting that the line with the numerically-lower process ID
appears first, or, in more a bulletproof way, by adding more code), but
that’s not a guarantee: once fork()
returns, there are two isolated
process, and the OS can decide to run them in any order.
Also, both messages appear on the terminal (the console). The fork()
does not create a new terminal! (It’s not clear how it possibly could.)
Instead, the terminal is shared between the two processes.
The fork()
copies all primary memory state in the parent process.
This includes stdio buffers, which can lead to some weird surprises!
fork2b.c
:
int main(void) {
pid_t initial_pid = getpid();
printf("Hello from initial pid %d\n", initial_pid);
pid_t p1 = fork();
pid_t p2 = fork();
if (getpid() != initial_pid) {
printf("Hello from child pid %d\n", getpid());
}
}
Run with output sent to the terminal, this prints pretty much what you’d
expect. But redirected to a pipe (via ./fork2b \| less
) or file
(./fork2b \> f
), the “initial pid” line is duplicated 4 times:
Hello from initial pid 3626
Hello from initial pid 3626
Hello from child pid 3629
Hello from initial pid 3626
Hello from child pid 3628
Hello from initial pid 3626
Hello from child pid 3630
When output is redirected, the “Hello from initial pid” line is being
buffered in the process’s primary memory, in a stdio buffer much like
the one you implemented in pset 2. And
this buffer is copied along with the rest of the process’s primary
memory! The buffer is flushed when each process exits, so the line is
printed 4 times—once by each process. On the other hand, when output is
sent to the terminal, stdio decides not to buffer output, and it emits
each line as it is completed (via a write
system call).
(How can stdio tell the difference between terminal and file output?
When it starts up, it uses an fstat(STDOUT_FILENO...)
system call to
check what kind of file standard output is. If the answer is “character
device”—including a terminal—stdio dials down its buffering. We figured
this out using—you guessed it—strace!)
exec
The exec
system call is used to start a new program. But, perhaps
unlike your expectation, it does not start a new process (fork
does that). Instead, exec
replaces the current process’s image
(that is, its primary memory and registers) with a fresh copy of the
specified program, running with the specified arguments.
exec
is really a family of system calls, not just one. Read the
man execv
page to learn about them. There are six varieties:
System call | Signature | Program location | Argument list | Environment? |
---|---|---|---|---|
execl |
int execl(const char\* path, const char\* arg, ...) |
Direct | Function parameters | Implicit |
execlp |
int execlp(const char\* file, const char\* arg, ...) |
Search \$PATH |
Function parameters | Implicit |
execle |
int execle(const char\* path, const char\* arg, ..., char\* const envp[]) |
Direct | Function parameters | Explicit |
execv |
int execv(const char\* path, char\* const argv[]) |
Direct | Array argument | Implicit |
execvp |
int execvp(const char\* file, char\* const argv[]) |
Search \$PATH |
Array argument | Implicit |
execvpe |
int execvpe(const char\* file, char\* const argv[], char\* const envp[]) |
Search \$PATH |
Array argument | Explicit |
The differences among them:
- The
execl\*
functions pass some of their parameters along to the new program as arguments. Theexecv\*
functions instead take an explicit array of arguments to pass. - The
exec\*p
functions search for an executable to run, using the current\$PATH
environment variable. For instance, you can sayexeclp("ls", "ls", NULL)
to runls
; the library function will search\$PATH
for the first-occurringls
executable (which is usually/bin/ls
). The non-p
functions instead take a full pathname for the executable; if that pathname’s invalid, the functions will fail. - The
exec\*e
functions take an explicitenvp
argument to pass environment variables to the child.
The exec
system calls return a value, but that value can only be
observed on failure: if exec
works as intended, then it does not
return—the currently running process is instead replaced with a new
program image. Thus, if exec
returns, then it returns -1.
Aside: exit
The exit
system call exits the current process. But actually the
name of the system call is _exit
; exit
is a library function
that calls _exit
after performing some cleanup actions, such as
flushing any stdio buffers that contain data. This implicit flush
behavior explains why the fork2b.c
program produces any output at
all.
waitpid
fork
lets us create a new process, exec
lets us run a new
program, _exit
quits a process. But once a process has started, how
can we tell when that process dies? For this, we need a new system call,
waitpid
. Its signature:
pid_t waitpid(pid_t child, int* status, int flags);
waitpid
has complicated semantics that depend on the flags
. If
child == -1
and flags == 0
, then waitpid
:
- Returns -1, with error code
ECHILD
, if the current process has no outstanding children. - Otherwise, if any child has exited but has not yet been waited for,
waitpid
returns that child’s ID and sets\*status
to its status. (status
lets the parent distinguish normal exits from other kinds of death, such as segmentation faults; andstatus
lets the parent determine the child’s exit code. Seeman waitpid
to learn about the macros required to deal withstatus
, such asWIFEXITED
andWEXITSTATUS
.) - Otherwise, at least one outstanding child has not yet exited.
waitpid
will block (occupying no CPU) until a child exits.
If a specific child process’s status is desired, just pass that process
ID as child
argument.
Note that waitpid
only works on children. A waiting process
p1` will *only* receive wait reports for child processes that `p1
itself forked. It is illegal for a child to wait for a parent, or a parent to wait for a grandchild, etc.
The waitdemo.c
file shows how waitpid
works: the parent process
waits for its child to die, who exits after 0.5sec. But what about a
more complex semantic? Is it possible to wait for a child to die or
0.75sec to elapse, whichever comes first?
Arguments to waitpid
make this possible. Specifically, if
WNOHANG
is passed for flags
, then waitpid
will never
block: it will always return immediately, but it’ll return 0 if no
outstanding child has died. The waittimeout.c
file implements this
idea, but this polling mechanism wastes a ton of CPU resources.
Our last try was the waitblock.c
file. In the parent, a usleep
call is used to block for 0.75 seconds. But if the child dies during
this call, the usleep
call is interrupted by a signal! This
signal concept is the model of hardware interrupts implemented by Unix
operating systems for processes. The signal causes the usleep
system
call to return early, printing the message just as we want.
Except it doesn’t work reliably. Uh oh!