Processes 2: Inter-process communication

Overview

In this lecture, we discuss inter-process communication in general, concentrating on two kinds: termination notification and stream communication.

Inter-process communication

Process coordination requires communication among processes
Many forms of communication
1. Termination notification: parent process informed when child terminates
2. Streaming communication: data written by one process, read by another
3. Interruption of normal processing

Metrics for communication

Bandwidth
Latency
Preemption (voluntary vs. involuntary communication)
- Voluntary (non-preemptive): process controls when it receives communication
- Involuntary (preemptive): process can be interrupted for communication
- Like device access vs. timer interrupts
Blocking
- Blocking: process waits for communication
- Nonblocking: process does not wait for communication (e.g., short read)

Termination notification question

Parent process informed when child terminates
Which metrics matter?
- Bandwidth?
- Latency?
- Preemption?
- Blocking?

Voluntary termination: Exit

exit (= return from main)
- Process transcends the earthly sphere, leaving with its last breath a message for its parent
- exit(STATUS) library function performs cleanup actions, such as flushing stdio files
- _exit(STATUS) system call exits without cleanup
The STATUS is the process’s ‘last words’
- Remembered by kernel
- Can be collected by the process’s parent
Status convention
- 0 (EXIT_SUCCESS) means success, non-zero (1-255) means failure

Involuntary termination

A process can transcend involuntarily (segmentation fault; being killed)
- This is called termination by signal
The exit status indicates the reason for transcendence (exit vs. signal)
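
A minimal sketch of termination by signal, assuming a POSIX system. The helper name `kill_child_and_get_signal` is hypothetical, and the status is collected with waitpid, which the notes cover under "Collecting a termination notification". The parent kills its child with SIGKILL, then uses WIFSIGNALED and WTERMSIG to recover the reason from the status.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that sleeps forever, kill it with SIGKILL, and report
// which signal terminated it. Returns the signal number, 0 if the
// child somehow exited normally, or -1 on failure.
// (Hypothetical helper, for illustration only.)
int kill_child_and_get_signal(void) {
    pid_t p = fork();
    if (p < 0) {
        return -1;                      // fork failed
    }
    if (p == 0) {
        for (;;) {
            pause();                    // child: wait for a signal
        }
    }
    if (kill(p, SIGKILL) < 0) {         // involuntary termination
        return -1;
    }
    int status;
    if (waitpid(p, &status, 0) != p) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);        // reason: terminated by signal
    }
    return 0;                           // reason: terminated by exit
}
```

Here `kill_child_and_get_signal()` should return SIGKILL, since SIGKILL cannot be caught or ignored.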

Collecting a termination notification

waitpid(pid, &status, options)
- Complex system call! Read the manual page! Check the return values!
- options == 0 means blocking, options == WNOHANG means nonblocking
- pid == -1 means wait for any child process, pid > 0 means wait for that specific process
- Returns the terminated process’s ID, 0 (with WNOHANG, when no child has terminated yet), or -1 (on error)
- Can only wait for children
On success, int status filled in with the status
- WIFEXITED(status) ≠ 0 iff process terminated by exit
- If WIFEXITED, then WEXITSTATUS(status) is the exit status
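
The calls above can be sketched as follows, assuming a POSIX system. The helper name `run_child_and_wait` is hypothetical: it forks a child that exits at once with a chosen status, blocks in waitpid until the child terminates, and decodes the status with WIFEXITED and WEXITSTATUS.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that immediately exits with `code` (0-255), then
// collect its termination notification. Returns the child's exit
// status, or -1 if fork/waitpid failed or the child did not
// terminate by exit. (Hypothetical helper, for illustration only.)
int run_child_and_wait(int code) {
    pid_t p = fork();
    if (p < 0) {
        return -1;                      // fork failed
    }
    if (p == 0) {
        _exit(code);                    // child: exit without stdio cleanup
    }
    int status;
    pid_t r = waitpid(p, &status, 0);   // options == 0: blocking wait
    if (r != p) {
        return -1;                      // always check the return value
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);     // the child's "last words"
    }
    return -1;                          // e.g., terminated by signal
}
```

For example, `run_child_and_wait(42)` should return 42.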

Example: `waitmyecho`

What’s weird about Unix termination notification

Only the parent can collect a termination notification
The parent should collect a termination notification
- Otherwise the child becomes a zombie process
- makezombies
Can you do better?
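
A parent that cannot afford to block can still collect its notifications by polling with WNOHANG. The following is a minimal sketch, assuming a POSIX system. The helper names `sleep_ms` and `poll_until_child_exits` and the sleep intervals are illustrative choices, not part of the lecture code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Sleep for `ms` milliseconds (hypothetical helper).
static void sleep_ms(long ms) {
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

// Fork a short-lived child and poll for its termination without
// blocking. Returns the number of polls that found the child still
// running, or -1 on failure. (Hypothetical helper.)
int poll_until_child_exits(void) {
    pid_t p = fork();
    if (p < 0) {
        return -1;
    }
    if (p == 0) {
        sleep_ms(50);                   // child: pretend to work
        _exit(0);
    }
    int polls = 0;
    for (;;) {
        int status;
        pid_t r = waitpid(p, &status, WNOHANG);
        if (r == p) {
            return polls;               // child reaped: no zombie remains
        }
        if (r < 0) {
            return -1;                  // waitpid failed
        }
        ++polls;                        // r == 0: child still running
        sleep_ms(10);                   // parent does other work here
    }
}
```

Between the child's exit and the successful waitpid, the child is a zombie: the kernel keeps its status until the parent collects it.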

Speed of exit notification

storage1/r-exitbyte 🤡🤪

Streaming communication question

Goal: Open a private high-throughput communication stream among processes
- Private: Only accessible to processes set up in advance (unlike, say, a broadly-accessible file)
- High-throughput
What API do you suggest?

The `pipe` system call

pipe(int pfd[2]) system call
- Creates a pipe, which is a high-throughput communication stream
- The pipe is associated with two descriptors, pfd[0] and pfd[1]
- Data written to pfd[1] (the “write end”) may be read from pfd[0] (the “read end”)
- Returns 0 on success, -1 on failure
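
A minimal sketch of the call in use, assuming a POSIX system and a message short enough to fit in the pipe buffer. The helper name `selfpipe_roundtrip` is hypothetical: a single process writes into pfd[1] and reads the same bytes back from pfd[0].

```c
#include <string.h>
#include <unistd.h>

// Write `msg` into a fresh pipe and read it back in the same process.
// Returns the number of bytes read into `buf` (at most `cap`), or -1
// on failure. Assumes `msg` fits in the pipe buffer; a longer message
// would block in write(). (Hypothetical helper.)
ssize_t selfpipe_roundtrip(const char* msg, char* buf, size_t cap) {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    ssize_t nr;
    ssize_t nw = write(pfd[1], msg, strlen(msg));   // write end
    if (nw > 0) {
        nr = read(pfd[0], buf, cap);                // read end
    } else {
        nr = nw;        // nothing written: reading would block forever
    }
    close(pfd[0]);
    close(pfd[1]);
    return nr;
}
```

Calling `selfpipe_roundtrip("hello", buf, sizeof buf)` with a 16-byte `buf` should return 5 and leave the bytes `hello` in `buf`.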

`pipe` illustration

`selfpipe`

About pipes

Why are pipes private?
- Only accessible to processes that inherit one or more ends from a parent
- A pipe has no file-system name, so a process that closes its descriptor for a pipe end generally cannot get it back!
Why are pipes high-throughput?
- The in-kernel buffer supports batching

Pipe buffering

Bytes written to a pipe are buffered in the kernel until read
How many bytes?
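
One way to find out experimentally is to fill a pipe that nobody reads. The sketch below assumes a POSIX system and uses a hypothetical helper name, `pipe_capacity`. It makes the write end nonblocking so that write fails with EAGAIN instead of waiting when the buffer is full. The answer is system-dependent.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Count how many bytes can be written to a fresh pipe before the
// kernel buffer fills up. Returns the byte count, or -1 on setup
// failure. (Hypothetical helper.)
long pipe_capacity(void) {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    // Make the write end nonblocking so a full buffer yields EAGAIN.
    int flags = fcntl(pfd[1], F_GETFL);
    if (flags < 0 || fcntl(pfd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    long total = 0;
    char c = 'x';
    for (;;) {
        ssize_t n = write(pfd[1], &c, 1);
        if (n == 1) {
            ++total;                    // byte buffered in the kernel
        } else if (n < 0 && errno == EINTR) {
            continue;                   // interrupted: retry
        } else {
            break;                      // EAGAIN: buffer is full
        }
    }
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```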

`childpipe`

Pipe end-of-file

A read from a pipe returns end-of-file when all file descriptors for the write end are closed
A write to a pipe terminates the writer by default (with the SIGPIPE signal) when all file descriptors for the read end are closed (more later)
Pipe hygiene: Clean up pipe ends when no longer needed
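
A minimal sketch of end-of-file handling and pipe hygiene, assuming a POSIX system; the helper name `read_all_from_child` is hypothetical. The child writes a message and exits. The parent closes its own copy of the write end before reading, because otherwise read would never return end-of-file.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that writes `msg` to a pipe, then read in the parent
// until end-of-file or until `buf` is full. Returns the number of
// bytes read, or -1 on failure. (Hypothetical helper.)
ssize_t read_all_from_child(const char* msg, char* buf, size_t cap) {
    int pfd[2];
    if (pipe(pfd) < 0) {
        return -1;
    }
    pid_t p = fork();
    if (p < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (p == 0) {
        close(pfd[0]);                  // child does not read
        ssize_t nw = write(pfd[1], msg, strlen(msg));
        _exit(nw < 0 ? 1 : 0);          // exiting closes the write end
    }
    close(pfd[1]);      // hygiene: without this, read never sees EOF
    size_t total = 0;
    ssize_t n = 0;
    while (total < cap
           && (n = read(pfd[0], buf + total, cap - total)) > 0) {
        total += (size_t) n;            // n == 0 means end-of-file
    }
    close(pfd[0]);
    waitpid(p, NULL, 0);                // collect the child
    return n < 0 ? -1 : (ssize_t) total;
}
```

Calling `read_all_from_child("61\n", buf, sizeof buf)` with a 32-byte `buf` should return 3.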

Pipes and the shell

Have a shell process, sh
Want to set up a pipeline, /bin/echo 61 | wc -c

Problem: How to move file descriptors around?

`dup2`

dup2(oldfd, newfd): Make newfd point to the same file description as oldfd
- If oldfd is not open, returns -1
- If newfd == oldfd, does nothing; else closes newfd first
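
Both effects of dup2 can be observed without touching standard output. The sketch below assumes a POSIX system; the helper name `dup2_demo` and the use of two scratch pipes `a` and `b` are illustrative choices. After dup2(a[1], b[1]), a byte written to b[1] arrives in pipe a, and pipe b reports end-of-file because its only write end was closed.

```c
#include <unistd.h>

// Returns 1 if dup2 behaves as described, 0 if not, and -1 on setup
// failure. (Hypothetical helper.)
int dup2_demo(void) {
    int a[2], b[2];
    if (pipe(a) < 0) {
        return -1;
    }
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int ok = 0;
    char c = 0;
    if (dup2(a[1], b[1]) == b[1]        // b[1] now names a's write end
        && write(b[1], "x", 1) == 1     // so this byte goes into pipe a
        && read(a[0], &c, 1) == 1
        && c == 'x'
        && read(b[0], &c, 1) == 0) {    // b's old write end was closed: EOF
        ok = 1;
    }
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok;
}
```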

Pipe dance

Reach this goal using the system calls you know!

Pipe dance explained

Here’s the initial state of the shell.
Child processes inherit file descriptors from their parents only at fork time. Since pipes must be accessible to children in the pipeline, the pipe must be created first, before the child processes.
Now that the pipe exists, we can create the left-hand child. It is still running shell code.
We switch our attention to the left-hand child. Inside the code for the child (if (result_of_fork == 0) { ... }), we must reconfigure the process environment to match the requirements of the pipeline. We start by setting up the standard output file descriptor with dup2(pfd[1], 1) (recall that standard output’s file descriptor number is 1). It’s important not to do this in the parent shell—the parent needs its stdout!
That done, we no longer need the pfd[1] file descriptor…
…or the pfd[0] file descriptor. (Not all these steps are order-sensitive—we could have closed pfd[0] first.)
Now that the child’s environment matches all requirements, it is safe to replace its program image via execvp. We must set up the environment before calling execvp because the new program, /bin/echo, can run in many environments and simply accepts its initial environment as a given. An advantage of this separation of concerns is that the shell does not rely on /bin/echo (a program it does not control) to perform the dup2/close system calls correctly.
The left-hand child is on its merry way and is no longer running shell code. (It may already be executing, writing its output into the pipe buffer!) We switch our attention back to the parent shell, which has been continuing in the meantime.

The pipe’s write end is no longer needed once the child on the left-hand side of that pipe is forked. Once a file descriptor is no longer needed by the parent shell or any of its future children, it’s safe for the parent to close it with close(pfd[1]). (You could close that file descriptor later, too, but that might affect future child processes: remember that all redundant file descriptors to pipe ends require closing.)
The parent can now create the right-hand child. In your code, this will typically happen with another call to command::run. You’ll need to figure out how to pass the read end of the pipe to that call.
The right-hand child now proceeds to reconfigure its environment, first with dup2, …
…and then close.
The child’s environment matches our goal, so the child calls execvp to replace its program image with the new program.
Meanwhile, the parent shell closes its now-redundant read end of the pipe with close.
And we have arrived at our goal.
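
The steps above can be sketched as one function, assuming a POSIX system on which the shell's file descriptors 0 and 1 are open. The name `run_pipeline` and the exit codes 126 and 127 for setup failures are illustrative choices. This is not the course's command::run.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `left | right`, where each argument is a NULL-terminated argv
// array. Returns the right-hand command's exit status, or -1 on
// failure. (Hypothetical helper.)
int run_pipeline(char* const left[], char* const right[]) {
    int pfd[2];
    if (pipe(pfd) < 0) {                // create the pipe before forking
        return -1;
    }
    pid_t lp = fork();                  // left-hand child
    if (lp < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (lp == 0) {
        if (dup2(pfd[1], 1) < 0) {      // stdout becomes the write end
            _exit(126);
        }
        close(pfd[1]);                  // redundant descriptors
        close(pfd[0]);
        execvp(left[0], left);
        _exit(127);                     // reached only if execvp failed
    }
    close(pfd[1]);                      // parent no longer needs write end

    pid_t rp = fork();                  // right-hand child
    if (rp < 0) {
        close(pfd[0]);
        waitpid(lp, NULL, 0);
        return -1;
    }
    if (rp == 0) {
        if (dup2(pfd[0], 0) < 0) {      // stdin becomes the read end
            _exit(126);
        }
        close(pfd[0]);
        execvp(right[0], right);
        _exit(127);
    }
    close(pfd[0]);                      // parent's now-redundant read end

    int status;
    waitpid(lp, NULL, 0);               // collect both children
    if (waitpid(rp, &status, 0) != rp || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

With `left` set to `{"/bin/echo", "61", NULL}` and `right` set to `{"wc", "-c", NULL}`, this runs the pipeline from the notes and writes the output of wc to the shell's standard output.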

This presentation has shown the steps required to set up a two-process pipeline. Can you work out how to set up a three-process pipeline on your own?
