Process Section 1 and Storage Exercises

College students: Attendance at section is required and recorded for all college students. All college students should fill out this attendance form when attending section. For more on the attendance policy and what to do if you miss section refer to the syllabus here and here.

Extension School students: Extension School students are required to self-report weekly section participation using this participation form. There are two ways to participate in section: (1) attend section live (for most students, this is one of the zoom sections, but local students are welcome to attend in-person sections) OR (2) watch a section recording (Canvas > Zoom > Published Recordings). Both participation methods are equally valid towards your participation grade.

In today’s section, we'll investigate the behavior of a shell to kick off the process control unit and problem set 5. Then we’ll walk through a selection of our storage exercises, which are problems taken from prior exams. Also take the opportunity to ask questions about the problem set and class material.

Command line breakdown

The pset in this unit introduces a bunch of process control system calls through the medium of a shell implementation. A shell is a program responsible for running other programs; it’s the kind of program behind the command line terminal.

You’ll implement many interesting shell features, including complex command lines. So it’s important to understand how complex command lines are built.

A command is a basic request to execute a program with some arguments. echo foo, for example, is a command. Pipelines, conditionals, and command lines combine commands to build more complex behaviors.

A precise way to explain command lines is with a BNF grammar. Here’s the shell grammar:

commandline ::= (empty)
          |  conditional
          |  conditional ";" commandline
          |  conditional "&" commandline

conditional ::=  pipeline
          |   conditional "&&" pipeline
          |   conditional "||" pipeline

pipeline ::=  command
          |   pipeline "|" command

command  ::=  word
          |   redirection
          |   command word
          |   command redirection

redirection  ::=  redirectionop filename
redirectionop  ::=  "<"  |  ">"  |  "2>"

The grammar says, for instance, that a commandline is zero or more conditionals, separated (and optionally terminated) by ; or &. It also defines conditionals in terms of pipelines, and pipelines in terms of commands.

EXERCISE. Test your understanding of the command line grammar. What are the conditionals in the following command line?

echo foo & echo bar | grep b && echo hello, there ; echo done < /dev/null

(It might be helpful to underline them.)

EXERCISE. What are the pipelines in that command line?

EXERCISE. What are the commands in that command line?

EXERCISE. What are the redirections in that command line, if any?

EXERCISE. Can any pipeline span across more than one conditional? Why or why not?

Command line scientists

That helps show how command lines are built, but how do they behave? It’s possible to learn a lot by constructing “experiments”, namely command lines, and observing their behavior on a real shell!

Unix offers a toolbox of utilities useful for command line science.

Program Description
true Silently succeed (exit with status 0).
false Silently fail (exit with status 1).
cat Write file(s) to standard output.
wc Count lines, words, and characters in standard input.
head -n N Print first N lines of standard input.
tail -n N Print last N lines of standard input.
yes Print y (or any arguments) forever.
sleep N Pause for N seconds.
exit [STATUS] Exit the shell with status STATUS (default 0).
sh -c "COMMANDLINE" Run another shell executing "COMMANDLINE".

For example, the behaviors of the && and || conditional operators can be explored with different combinations of programs like false, true, and echo.

# Hypothesis: `A && B` runs `B` iff `A` exits with status 0.
$ true && echo X
X
$ false && echo X
$ sh -c "exit 2" && echo X
# Hypothesis not falsified!


# Hypothesis: `A || B` runs `B` iff `A` does not exit with status 0.
$ true || echo X
$ false || echo X
X
$ sh -c "exit 2" || echo X
X
# Hypothesis not falsified!


# Hypothesis: The `sleep` command exits with status 1.
$ sleep 10 && echo Done
[10 seconds go by, then:] Done
# Hypothesis falsified 😢: it looks like `sleep` exits with status 0.

Indeed, it is true that A && B executes B only when A exits with status 0, and A || B executes B only when A does not exit with status 0!

In these exercises, you’ll attempt to explore more complex shell properties.

EXERCISE. Design command lines that test whether commands that are part of a single conditional are grouped left to right. That is, try and determine whether, given the chain A || B && C, commands are left associative:

  1. A || B runs first.
    1. So A runs first.
    2. Then, depending on A’s status, B runs or is skipped.
  2. Then, depending on A || B’s status, C runs or is skipped.

Or right associative:

  1. A runs first.
  2. Then, depending on A’s status, B && C runs or is skipped.

Or something entirely different.

EXERCISE. Design command lines that test whether two-process pipelines, such as A | B, execute A and B concurrently (at the same time) or serially (B does not start until A has completed).

EXERCISE. Design command lines that test whether the exit status of a conditional equals the exit status of the last-executed command in that conditional.

Hint. The $? shell variable equals the exit status of the most-recently-executed foreground conditional. For instance:

$ true
$ echo $?
0
$ false
$ echo $?
1
$ sh -c "exit 2"
$ echo $?
2

EXERCISE. Design command lines that test whether the exit status of a pipeline equals the exit status of its rightmost command. (Try to avoid using $?.)

Storage Exercises

All exercises assume a 64-bit x86-64 architecture unless otherwise stated.

The main subjects of these exercises are:

IO-19. File system calls

QUESTION IO-19A. A program makes these system calls:

int fd = open("f.txt", O_WRONLY | O_CREAT | O_TRUNC);
ssize_t nw = write(fd, "CS121 is awesome!", 17); // returned 17

What following series of system calls would ensure that, after all system calls complete, the file f.txt contains the text “CS 61 is terrible” (without the quotation marks)? Minimize the number of bytes written.

QUESTION IO-19B. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”

  1. Sequential byte writes using stdio
  2. Sequential byte writes using mmap
  3. Sequential byte writes using system calls

QUESTION IO-19C. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”

  1. Sequential byte writes using stdio
  2. Sequential block writes using stdio
  3. Sequential byte writes using system calls
  4. Sequential block writes using system calls

QUESTION IO-19D. Which of the following file access patterns might have similar output from the strace utility? List all that apply or say “none.”

  1. Reverse-sequential byte writes using stdio
  2. Reverse-sequential block writes using stdio
  3. Reverse-sequential byte writes using system calls
  4. Reverse-sequential block writes using system calls

IO-20. Caches Big, Fast, and Cheap

QUESTION IO-20A. We discussed several kinds of computer storage in class, including:

  1. Drives (disks and SSDs)
  2. primary Memory
  3. processor Cache memory
  4. Registers

Put those storage technologies in order by latency, slowest first. (An answer might be “DMCR”.)

QUESTION IO-20B. Put those storage technologies in order by cost per byte, cheapest first.

QUESTION IO-20C. Put those storage technologies in order by capacity in bytes on a typical computer, smallest first.

QUESTION IO-20D. Which storage technology serves as the underlying storage for each of the following caches? If unsure, explain briefly.

  1. Buffer cache
  2. Stdio cache
  3. Processor cache

QUESTION IO-20E. True or false? Given a cache with two or more blocks, implementing it as a fully associative cache would always produce as good or better hit rates than a direct-mapped implementation.

QUESTION IO-20F. True or false? Prefetching bytes from a file on disk into the buffer cache can cause the buffer cache to become incoherent.

IO-4. I/O caching and strace

Elif Batuman is investigating several program executables left behind by her ex-roommate Fyodor. She knows these executables perform byte-at-a-time I/O, but she’s not sure what kinds of caches they use. To find out, she runs each executable under strace in the following way:

strace -o strace.txt ./EXECUTABLE files/text1meg.txt > files/out.txt

Help her figure out properties of these programs based on their system call traces.

QUESTION IO-4A. Program ./mysterya:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x8193000
brk(0x81b5000)                          = 0x81b5000
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "\n", 1)                        = 1
write(1, "\n", 1)                       = 1
read(3, "A", 1)                         = 1
write(1, "A", 1)                        = 1
read(3, "'", 1)                         = 1
write(1, "'", 1)                        = 1
read(3, "s", 1)                         = 1
write(1, "s", 1)                        = 1
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

QUESTION IO-4B. Program ./mysteryb:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x96c5000
brk(0x96e6000)                          = 0x96e6000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 2048) = 2048
read(3, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
write(1, "kad\nAkron\nAkron's\nAl\nAl's\nAla\nAl"..., 2048) = 2048
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

QUESTION IO-4C. Program ./mysteryc:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9064000
brk(0x9085000)                          = 0x9085000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1044480, SEEK_SET)             = 1044480
read(3, "Quinton\nQuinton's\nQuirinal\nQuisl"..., 2048) = 2048
write(1, "ehR\neehR\naehR\ns'hR\nhR\nsdlonyeR\ns"..., 2048) = 2048
lseek(3, 1042432, SEEK_SET)             = 1042432
read(3, "emyslid's\nPrensa\nPrensa's\nPrenti"..., 2048) = 2048
write(1, "\ns'nailitniuQ\nnailitniuQ\nnniuQ\ns"..., 2048) = 2048
lseek(3, 1040384, SEEK_SET)             = 1040384
read(3, "Pindar's\nPinkerton\nPinocchio\nPin"..., 2048) = 2048
write(1, "rP\ndilsymerP\ns'regnimerP\nregnime"..., 2048) = 2048
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

QUESTION IO-4D. Program ./mysteryd:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9a0e000
brk(0x9a2f000)                          = 0x9a2f000
fstat64(3, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lseek(3, 1048575, SEEK_SET)             = 1048575
read(3, "o", 2048)                      = 1
lseek(3, 1048574, SEEK_SET)             = 1048574
read(3, "Ro", 2048)                     = 2
lseek(3, 1048573, SEEK_SET)             = 1048573
read(3, "\nRo", 2048)                   = 3
...
lseek(3, 1046528, SEEK_SET)             = 1046528
read(3, "ingau\nRheingau's\nRhenish\nRhianno"..., 2048) = 2048
write(1, "oR\ntlevesooR\ns'yenooR\nyenooR\ns't"..., 2048) = 2048
lseek(3, 1046527, SEEK_SET)             = 1046527
read(3, "eingau\nRheingau's\nRhenish\nRhiann"..., 2048) = 2048
lseek(3, 1046526, SEEK_SET)             = 1046526
read(3, "heingau\nRheingau's\nRhenish\nRhian"..., 2048) = 2048
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

QUESTION IO-4E. Program ./mysterye:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x93e5000
brk(0x9407000)                          = 0x9407000
read(3, "A", 1)                         = 1
read(3, "\n", 1)                        = 1
read(3, "A", 1)                         = 1
...
read(3, "A", 1)                         = 1
read(3, "l", 1)                         = 1
write(1, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 1024) = 1024
read(3, "t", 1)                         = 1
read(3, "o", 1)                         = 1
read(3, "n", 1)                         = 1
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

QUESTION IO-4F. Program ./mysteryf:

open("files/text1meg.txt", O_RDONLY)    = 3
brk(0)                                  = 0x9281000
brk(0x92a3000)                          = 0x92a3000
read(3, "A\nA's\nAA's\nAB's\nABM's\nAC's\nACTH'"..., 4096) = 4096
write(1, "A", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, "A", 1)                        = 1
...
write(1, "A", 1)                        = 1
write(1, "l", 1)                        = 1
read(3, "ton's\nAludra\nAludra's\nAlva\nAlvar"..., 4096) = 4096
write(1, "t", 1)                        = 1
write(1, "o", 1)                        = 1
write(1, "n", 1)                        = 1
...

List at least one option in each column.

  1. Sequential reads
  2. Reverse sequential reads
  3. Strided reads
  1. No read cache
  2. Unaligned read cache
  3. Aligned read cache
  1. No write cache
  2. Write cache
  1. Cache size 4096
  2. Cache size 2048
  3. Cache size 1024
  4. Other or no cache

IO-14. Cache code

Several famous musicians have just started working on the standard I/O problem set. They share the following code for their read-only, sequential, single-slot cache:

struct io61_file {
    int fd;
    unsigned char buf[4096];
    size_t pos;    // position of next character to read in `buf`
    size_t sz;     // number of valid characters in `buf`
};

int io61_readc(io61_file* f) {
    if (f->pos >= f->sz) {
        f->pos = f->sz = 0;
        ssize_t nr = read(f->fd, f->buf, sizeof(f->buf));
        if (nr <= 0) {
            f->sz = 0;
            return -1;
        } else {
            f->sz = nr;
        }
    }
    int ch = f->buf[f->pos];
    ++f->pos;
    return ch;
}

But they have different io61_read implementations. Donald (Lambert)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    return read(f->fd, buf, sz);
}

Solange (Knowles)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    for (size_t pos = 0; pos < sz; ++pos, ++buf) {
        *buf = io61_readc(f);
    }
    return sz;
}

Caroline (Shaw)’s is:

ssize_t io61_read(io61_file* f, char* buf, size_t sz) {
    if (f->pos >= f->sz) {
        return read(f->fd, buf, sz);
    } else {
        int ch = io61_readc(f);
        if (ch < 0) {
            return 0;
        }
        *buf = ch;
        return io61_read(f, buf + 1, sz - 1) + 1;
    }
}

You are testing each of these musicians’ codes by executing a sequence of io61_readc and/or io61_read calls on an input file and printing the resulting characters to standard output. There are no seeks, and your test programs print until end of file, so your tests’ output should equal the input file’s contents.

You should assume for these questions that no read system call ever returns -1.

QUESTION IO-14A. Describe an access pattern—that is, a sequence of io61_readc and/or io61_read calls (with lengths)—for which Donald’s code can return incorrect data.

QUESTION IO-14B. Which of these musicians’ codes can generate an output file with incorrect length?

QUESTION IO-14C. Give a sequence of io61 calls for which Solange’s code will outperform Donald’s, or vice versa, and explain briefly.

QUESTION IO-14D. Suggest a small change (≤10 characters) to Caroline’s code that could make it perform at least as well as both Solange’s and Donald’s codes on all access patterns. Explain briefly.

IO-17. Reference strings and hit rates

QUESTION IO-17A. Write a purely-sequential reference string containing at least five accesses.

QUESTION IO-17B. What is the hit rate for this reference string? Tell us the eviction algorithm and number of slots you’ve chosen.

Parts C and D concern this ten-element reference string:

1 2 1 2 3 4 1 5 1 1

We consider executing this reference string starting with different cache contents.

QUESTION IO-17C. A three-slot LRU cache processes this reference string and observes a 70% hit rate. What are the initial contents of the cache?

QUESTION IO-17D. A three-slot FIFO cache with initial contents 4, 1, and 2 (in slots A, B, and C, respectively) processes the reference string and observes a 60% hit rate. Which slot was next up for eviction when the reference string began? (Assume the slots were initially filled in alphabetical order.)

The eviction algorithms we saw in class are entirely reactive: they only insert a block when that block is referenced. This limits how well the cache can perform. A read cache can also be proactive by inserting blocks before they’re needed, possibly speeding up later accesses. This is the essence of prefetching.

In a proactive caching model, the cache can evict and load two or more blocks per access in the reference string. A prefetching policy decides which additional, non-accessed blocks to load.

QUESTION IO-17E. Describe an access pattern for which the following prefetching policy would be effective: When filling a cache with block A, also load block A+1 into another slot.

QUESTION IO-17F. Write a reference string and name an eviction policy and/or cache size for which this prefetching policy would be less effective (have a lower hit rate) than no prefetching at all.