Synchronization 1: Bounded buffer implementation
Synchronization is a general term for the set of techniques we use to make parallel code correct.
What makes synchronization difficult is the exponential increase in possible behaviors that happens when multiple programs are interacting or, worse, operating on the same data. Coordination is a hard problem! This makes intuitive, or even proverbial, sense: idioms like “too many cooks in the kitchen”, “too many cooks spoil the broth”, and “herding cats” remind us of the difficulty of coordinating parallelism and the importance of synchronization in all aspects of life.
Our study of synchronization covers multithreaded parallelism: the system calls, library functions, and operating system features that allow a single process to use the resources of multiple processors. It also covers networking: the system calls and operating system features that allow programs to communicate with one another over a network. Many programs that perform networking tasks use threads to do it. But we begin with another instance of synchronization, one that we have not necessarily thought of as a synchronization problem: kernel code that handles inter-process communication.
WeensyOS pipes and bounded buffers
The pipe mechanism for IPC is generally implemented with a data structure called a bounded buffer. This data structure is interesting both for pipes and for synchronization, since a full implementation must solve several classic synchronization challenges.
We start with a simple OS that supports a pipe.
WeensyOS pipe kernel
Our OS supports these system calls:
- `ssize_t sys_pipewrite(const char* buf, size_t sz)`

  Write the `sz` bytes of data starting at `buf` into the pipe. Return the number of characters written, which can be `sz`; a number less than `sz`, indicating a partial write; or `-1`, indicating that the kernel’s buffer is full and the writer should try again later. Only returns `0` if `sz == 0`.

- `ssize_t sys_piperead(char* buf, size_t sz)`

  Read up to `sz` bytes of data from the pipe into `buf`. Return the number of characters read, which can be `sz`; a number greater than zero and less than `sz`, indicating a partial read; `0`, meaning either that `sz == 0` or that the pipe is closed; or `-1`, indicating that the kernel’s buffer is empty and the reader should try again later.
These are like `read` and `write` system calls on Unix, but rather than taking a file descriptor argument, our system calls apply to the OS’s single pipe. (It would be easy to generalize to multiple pipes.) But the kernel’s implementation of these system calls differs from Linux’s in important ways.
- Linux, as we’ve seen, has a default pipe buffer of 65536 characters. Our initial pipeos has a pipe buffer of one character!
- Linux blocks `read` and `write` system calls as appropriate: a `read` system call will block on an empty pipe buffer, and a `write` system call will block on a full pipe buffer. Our pipeos system calls never block, requiring programs that call them to poll instead.
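To make the cost of polling concrete, here is a minimal user-level sketch of a polling write loop. The `sys_pipewrite` here is a stub standing in for the real system call (it fails twice, then accepts one character, mimicking the one-character kernel buffer); the real wrapper and failure pattern would differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

// Stub standing in for the real `sys_pipewrite` system call: it fails
// twice with -1 ("buffer full"), then accepts one character per call.
static int fake_calls = 0;
static ssize_t sys_pipewrite(const char* buf, size_t sz) {
    (void) buf;
    if (sz == 0) {
        return 0;
    }
    ++fake_calls;
    if (fake_calls % 3 != 0) {
        return -1;     // buffer full: caller must try again later
    }
    return 1;          // accepted one character
}

// Polling writer: retry until every byte of `buf` has been written.
size_t write_all(const char* buf, size_t sz) {
    size_t pos = 0;
    while (pos < sz) {
        ssize_t r = sys_pipewrite(buf + pos, sz - pos);
        if (r > 0) {
            pos += (size_t) r;
        }
        // r == -1 means the buffer was full; just loop (poll) and retry
    }
    return pos;
}
```

Every `-1` return costs a full system call that accomplishes nothing, which is exactly what the syscall counts below will show.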
We can see both of these features by looking at the kernel’s pipe implementation, which looks like this:
// GLOBALS
// memory representing a pipe buffer
static char pipebuffer[1];
// number of characters currently in pipe buffer
static size_t pb_len; /* always ≥0 and ≤1 */
// CASES IN `exception`
case INT_SYS_PIPEWRITE: {
const char* buf = (const char*) current->p_registers.reg_rdi;
size_t sz = current->p_registers.reg_rsi;
if (!check_memory(kernel_pagetable, (uintptr_t) buf, sz,
PTE_P | PTE_U)) {
// bad memory
current->p_registers.reg_rax = -1;
} else if (sz == 0) {
// nothing to write
current->p_registers.reg_rax = 0;
} else if (pb_len == sizeof(pipebuffer)) {
// no room, try again later
current->p_registers.reg_rax = -1;
} else {
// write 1 character
pipebuffer[0] = buf[0];
pb_len = 1;
current->p_registers.reg_rax = 1;
}
break;
}
case INT_SYS_PIPEREAD: {
char* buf = (char*) current->p_registers.reg_rdi;
size_t sz = current->p_registers.reg_rsi;
if (!check_memory(kernel_pagetable, (uintptr_t) buf, sz,
PTE_P | PTE_W | PTE_U)) {
// bad memory
current->p_registers.reg_rax = -1;
} else if (sz == 0) {
// empty read buffer
current->p_registers.reg_rax = 0;
} else if (pb_len == 0) {
// nothing to read, try again later
current->p_registers.reg_rax = -1;
} else {
// read 1 character
buf[0] = pipebuffer[0];
pb_len = 0;
current->p_registers.reg_rax = 1;
}
break;
}
User-level programs
The OS runs two programs, `p-pipewriter` and `p-pipereader`, that behave as follows.

- `p-pipewriter` repeatedly writes a randomly-chosen newline-terminated message to the pipe, then spins (waits) for about 1.25 seconds.
- `p-pipereader` repeatedly reads a (newline-terminated) line from the pipe and prints it.
On Linux, we’d expect that each message should be transferred using just two system calls: one to write the message and one to read it. WeensyOS will require more. But how many more? A lot more:
16498 syscalls: wrote I am a plumber.
15940 syscalls: read I am a plumber.
32100 syscalls: wrote Hello, friend!
163173 syscalls: read Hello, friend!
48391 syscalls: wrote I am a plumber.
308110 syscalls: read I am a plumber.
It appears to take roughly 1000 pipe system calls per character to write a message, and many more to read a message. That’s many, many system calls to accomplish a single unit of work! Polling is often inefficient, leading to much wasted work.
In the remainder of class, we improved the operating system to reduce the number of pipe system calls per operation. This requires several changes, including making the bounded buffer larger than one character (not really a synchronization-related change) and making system calls block (absolutely a synchronization-related change).
Improvement 1: Scheduling order
The minimum number of system calls required to write an N-character message, using a one-character buffer, is N. The pipewriter is making many more than N system calls per write.
The reason is that the kernel is making a bad scheduling decision. Check out the `SYS_PIPEWRITE` code. When there’s no room in the pipe, the kernel returns -1 to the calling process—and then runs the calling process again (there’s a `run(current)` call at the end of `exception`). Though correct—it is always correct to run any runnable process—this is an inefficient scheduling order. The pipewriter’s `sys_pipewrite` function will always fail until the pipereader runs to free up some space. In general, when a system call fails on some pipe-like object that’s used for inter-process communication, it often makes sense to run the other process.

Here’s the revised `SYS_PIPEWRITE` code:
case INT_SYS_PIPEWRITE: {
const char* buf = (const char*) current->p_registers.reg_rdi;
size_t sz = current->p_registers.reg_rsi;
if (!check_memory(kernel_pagetable, (uintptr_t) buf, sz,
PTE_P | PTE_U)) {
// bad memory
current->p_registers.reg_rax = -1;
} else if (sz == 0) {
// nothing to write
current->p_registers.reg_rax = 0;
} else if (pb_len == sizeof(pipebuffer)) {
// no room, try again later
current->p_registers.reg_rax = -1;
schedule();
} else {
// write 1 character
pipebuffer[0] = buf[0];
pb_len = 1;
current->p_registers.reg_rax = 1;
}
break;
}
A similar change can be made in `SYS_PIPEREAD`. The result is many fewer system calls per write and for the first read:
31 syscalls: wrote I am a plumber.
31 syscalls: read I am a plumber.
60 syscalls: wrote Hello, friend!
72422 syscalls: read Hello, friend!
91 syscalls: wrote I am a plumber.
145328 syscalls: read I am a plumber.
Improvement 2: Buffer size
We can’t reduce the number of system calls per write much further without a larger pipe buffer. Linux uses a 65536-character pipe buffer by default. We don’t need that big a buffer; let’s start with a 16-character buffer:
static char pipebuffer[16];
static size_t pb_len; /* always ≥0 and ≤16 */
But coding this buffer isn’t trivial. We want a “sliding window” of at most 16 characters. An initial attempt might work like this, with each read moving any remaining characters down to the beginning of the buffer:
pipebuffer pb_len
1. empty buffer ................ 0
2. write("A", 1) A............... 1
3. write("Hello", 5) AHello.......... 6
4. read(buf, 2) ello............ 4
5. write("What", 4) elloWhat........ 8
6. write("Abracadabra", 11) [returns 8]
elloWhatAbracada 16
7. read(buf, 12) cada............ 4
8. read(buf, 12) [returns 4]
................ 0
This move-based implementation is, however, quite expensive. In the worst case, reading N characters will do O(N²) units of work!
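The move-based scheme can be sketched in a few lines of user-space C (the `mb_` names are invented for illustration; the kernel version would live in the system call handlers). Note the `memmove` on every read, which slides the surviving characters back to index 0 and is the source of the O(N²) cost:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Move-based bounded buffer: data always starts at index 0.
static char mb_buf[16];
static size_t mb_len = 0;

// Write up to `sz` characters; return the number actually written.
size_t mb_write(const char* buf, size_t sz) {
    size_t n = sz < sizeof(mb_buf) - mb_len ? sz : sizeof(mb_buf) - mb_len;
    memcpy(mb_buf + mb_len, buf, n);
    mb_len += n;
    return n;
}

// Read up to `sz` characters; return the number actually read.
size_t mb_read(char* buf, size_t sz) {
    size_t n = sz < mb_len ? sz : mb_len;
    memcpy(buf, mb_buf, n);
    memmove(mb_buf, mb_buf + n, mb_len - n);  // slide remainder to the front
    mb_len -= n;
    return n;
}
```

Running the example sequence from the table above against this sketch reproduces the same lengths and partial-write results.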
A smart bounded buffer implementation can read N characters with the minimum O(N) units of work. What we need is a new variable, the position. Our implementation will call this `pb_pos`.
Infinite buffer
First off, let’s pretend that `pipebuffer` is stored in an array of infinite size. An infinite-sized array is a good match for the I/O stream abstraction! However, we’ll store at most 16 characters in the array at a time.

In this model, the `pb_pos` variable represents the number of characters read so far. Write system calls will add new characters to the end of the pipebuffer and increase `pb_len`. Read system calls, however, will consume characters from the beginning of the pipe buffer. The read system call will copy data out of the buffer and shrink `pb_len`. It will also increase `pb_pos`, thus representing the read. Pictorially, this looks like the following. Since characters disappear from the buffer after being read, we mark the read places with “#”:
pipebuffer pb_pos pb_len
1. empty buffer ........................ ... 0 0
2. write("A", 1) A....................... ... 0 1
3. write("Hello", 5) AHello.................. ... 0 6
4. read(buf, 2) ##ello.................. ... 2 4
5. write("What", 4) ##elloWhat.............. ... 2 8
6. write("Abracadabra", 11) [returns 8]
##elloWhatAbracada...... ... 2 16
7. read(buf, 12) ##############cada...... ... 14 4
8. read(buf, 12) [returns 4]
##################...... ... 16 0
The code for such a buffer is not too difficult to write; all we need is infinite memory.
char pipebuffer[∞];
size_t pb_pos, pb_len;
case INT_SYS_PIPEWRITE: { ...
} else /* sz > 0 && pb_len < 16 */ {
// write characters
size_t ncopy = sz;
if (ncopy > 16 - pb_len) {
ncopy = 16 - pb_len;
}
memcpy(&pipebuffer[pb_pos + pb_len], buf, ncopy);
pb_len += ncopy;
current->p_registers.reg_rax = ncopy;
}
break; ...
case INT_SYS_PIPEREAD: { ...
} else /* sz > 0 && pb_len > 0 */ {
// read characters
size_t ncopy = sz;
if (ncopy > pb_len) {
ncopy = pb_len;
}
memcpy(buf, &pipebuffer[pb_pos], ncopy);
pb_pos += ncopy;
pb_len -= ncopy;
current->p_registers.reg_rax = ncopy;
}
break; ...
Unlike the inefficient implementation above, which touches all characters in the buffer on read, this implementation only touches the number of characters actually read. That’s key to achieving O(N) read performance.
Some things to note: Both functions use all four components of the pipe state (the buffer, the capacity [here, 16 bytes], the position, and the length). Write puts data at the end of the buffer, while read takes data from the beginning. The buffer starts at index `pb_pos`.
Circular bounded buffer
But of course we don’t have infinite memory. And we don’t need infinite memory—the buffer has at most 16 characters at a time. The insight behind the circular bounded buffer implementation technique is that modular arithmetic can easily model this infinite array. We map the infinite array onto a finite array with 16 characters by storing infinite-array index `i` at character position `i % 16`. Since the buffer contains at most 16 contiguous characters at a time, collisions never occur: every valid pipe buffer uses each character position at most once. Note, here, how the active characters are always at different positions:
pipebuffer positions (mod 16):
0 1 2 3 4 5 6 7 8 9101112131415 0 1 2 3 4 5 6 7 8 ...
1. empty buffer . . . . . . . . . . . . . . . . . . . . . . . . . ...
2. write("A", 1) A . . . . . . . . . . . . . . . . . . . . . . . . ...
3. write("Hello", 5) A H e l l o . . . . . . . . . . . . . . . . . . . ...
4. read(buf, 2) # # e l l o . . . . . . . . . . . . . . . . . . . ...
5. write("What", 4) # # e l l o W h a t . . . . . . . . . . . . . . . ...
6. write("Abracadabra", 11) [returns 8]
# # e l l o W h a t A b r a c a d a . . . . . . . ...
7. read(buf, 12) # # # # # # # # # # # # # # c a d a . . . . . . . ...
However, this technique does make reading and writing more complex, because a single write or read may take up to 2 calls to `memcpy`. In our running example, `write("Abracadabra")` must perform two `memcpy` calls, one to put `Abraca` at the end of the buffer and one to put `da` at the beginning. Similarly, the last `read` must first read `ca` from the end of the buffer and then read `da` from the beginning.
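The wraparound condition can be stated precisely: an operation on `n` contiguous characters starting at position `pos` needs one `memcpy` if it fits before the end of the array, and two if it wraps. A tiny hypothetical helper (not part of the kernel code) makes this checkable:

```c
#include <assert.h>
#include <stddef.h>

#define PB_SZ 16

// How many memcpy calls does a circular-buffer read of `n` characters
// starting at position `pos` need? One if the data is contiguous,
// two if it wraps past the end of the PB_SZ-character array.
size_t read_copies(size_t pos, size_t n) {
    if (n == 0) {
        return 0;
    }
    return (pos + n <= PB_SZ) ? 1 : 2;
}
```

In the running example, the final read of `cada` starts at position 14 with 4 characters, so it wraps and needs two copies; the earlier read of 12 characters starting at position 2 is contiguous and needs one.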
Our circular bounded buffer code ensures that `0 ≤ pb_pos < 16` at all times. Rather than `if` statements to distinguish different copy amounts, we use a loop; the result is cleaner. Here’s the write code:
// write characters
size_t pos = 0;
while (pos < sz && pb_len < sizeof(pipebuffer)) {
size_t pb_index = (pb_pos + pb_len) % sizeof(pipebuffer);
size_t ncopy = sz - pos;
if (ncopy > sizeof(pipebuffer) - pb_index) {
ncopy = sizeof(pipebuffer) - pb_index;
}
if (ncopy > sizeof(pipebuffer) - pb_len) {
ncopy = sizeof(pipebuffer) - pb_len;
}
memcpy(&pipebuffer[pb_index], &buf[pos], ncopy);
pos += ncopy;
pb_len += ncopy;
}
current->p_registers.reg_rax = pos;
And the read code:
// read characters
size_t pos = 0;
while (pos < sz && pb_len > 0) {
size_t ncopy = sz - pos;
if (ncopy > sizeof(pipebuffer) - pb_pos) {
ncopy = sizeof(pipebuffer) - pb_pos;
}
if (ncopy > pb_len) {
ncopy = pb_len;
}
memcpy(&buf[pos], &pipebuffer[pb_pos], ncopy);
pb_pos = (pb_pos + ncopy) % sizeof(pipebuffer);
pb_len -= ncopy;
pos += ncopy;
}
current->p_registers.reg_rax = pos;
Things to note:

- Write sets `ncopy` to `min(sz - pos, sizeof(pipebuffer) - pb_index, sizeof(pipebuffer) - pb_len)`. Make sure you understand the three branches of this `min`. The first branch, `sz - pos`, limits the write to the number of characters requested by the user. The second branch, `sizeof(pipebuffer) - pb_index`, handles circular writes—where a write would fall off the end of the `pipebuffer` array. The third branch, `sizeof(pipebuffer) - pb_len`, limits the write to the space remaining in the buffer. Similar logic applies to read.
- Make sure you understand what `pb_index` is doing in the write case, and why the read case doesn’t need an equivalent.
The results:
1 syscalls: wrote I am a plumber.
1 syscalls: read I am a plumber.
2 syscalls: wrote Hello, friend!
80506 syscalls: read Hello, friend!
3 syscalls: wrote I am a plumber.
155472 syscalls: read I am a plumber.
6 syscalls: wrote I'm talking through a pipe.
231383 syscalls: read I'm talking through a pipe.
7 syscalls: wrote I am a plumber.
307154 syscalls: read I am a plumber.
One system call per write! (Except for the fourth message, which is longer than 16 characters.)
Improvement 3: Blocking
But reading still takes many, many system calls. During the pipewriter’s delay, the reader just polls the empty pipe, waiting for data to appear. This is an efficiency problem (or would be, if the OS had anything else to do).
To solve this problem, we need some way to put the reader process to sleep—and then, critically, we must wake it up when data arrives on the pipe.
Real operating systems solve this problem with data structures called wait queues. A wait queue can be associated with any condition on which a process can block. When a process makes a blocking system call, the OS finds the wait queue associated with the blocking condition and enqueues the blocked process on that queue. When the condition is satisfied, the OS dequeues the blocked processes and lets them run again.
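The pattern can be sketched as a data structure. This is a hypothetical, simplified wait queue (a fixed array of blocked PIDs; real kernels use linked lists threaded through process structures), meant only to show the block/signal protocol:

```c
#include <assert.h>
#include <stddef.h>

#define WQ_MAX 16

// A simplified wait queue: the set of PIDs blocked on one condition.
typedef struct {
    int pids[WQ_MAX];
    size_t n;
} waitqueue;

// Called from a blocking system call: record that `pid` is waiting.
// (A real kernel would also mark the process blocked and call schedule().)
void wq_block(waitqueue* wq, int pid) {
    assert(wq->n < WQ_MAX);
    wq->pids[wq->n++] = pid;
}

// Called when the condition becomes true: wake all waiters.
// Returns the number of processes woken; 0 if the queue was empty,
// in which case signaling does nothing.
size_t wq_signal(waitqueue* wq) {
    size_t woken = wq->n;
    // (A real kernel would mark each wq->pids[i] runnable here.)
    wq->n = 0;
    return woken;
}
```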
For example, every process has a “wait queue” corresponding to the `waitpid` system call. If the parent calls blocking `waitpid()`, the parent will be enqueued on the relevant children’s wait queues. When a child dies, the operating system signals its wait queue. This does nothing if the queue is empty. But if the queue is nonempty, then the parent is blocked waiting for the child’s status. The kernel, now working on behalf of the parent, will mark the parent as runnable, remove the parent from all other wait queues, and (eventually) complete its `waitpid()` system call.
In our OS, though, we don’t need a general wait queue structure. There’s only one process that needs to block—the pipereader. So we can implement the “wait queue” as a single integer: the process (if any) blocked on read.
// PID of process waiting to read
static pid_t pb_waiter;
When a read system call has nothing to read, we enqueue the calling process on the wait queue:
... else if (pb_len == 0) {
// nothing to read, try again later
current->p_registers.reg_rax = -1;
pb_waiter = current->p_pid;
current->p_state = P_BLOCKED;
schedule();
} ...
And when a writer successfully writes some data, it must signal the wait queue:
// write characters ...
current->p_registers.reg_rax = pos;
if (pos > 0 && pb_waiter > 0) {
processes[pb_waiter].p_state = P_RUNNABLE;
pb_waiter = 0;
}
The result:
1 syscalls: wrote I am a plumber.
1 syscalls: read I am a plumber.
2 syscalls: wrote Hello, friend!
3 syscalls: read Hello, friend!
3 syscalls: wrote I am a plumber.
5 syscalls: read I am a plumber.
6 syscalls: wrote I'm talking through a pipe.
9 syscalls: read I'm talking through a pipe.
7 syscalls: wrote I am a plumber.
11 syscalls: read I am a plumber.
One system call per write, two per read (for short messages)!
(Comprehension question: What would go wrong if you left off the `current->p_registers.reg_rax = 1;` line from the read system call implementation above?)
Synchronization
Let’s step back for a moment and reconnect with synchronization. WeensyOS kernel synchronization is built on the foundation of interrupt disabling. WeensyOS configures the hardware to temporarily disable interrupts whenever a system call is running. When all system calls are polling system calls, this suffices to implement system call atomicity. But once we introduce blocking, things get more complicated. The kernel must be aware of multiple processes at once—a system call made by one process can affect the state of another. That’s the essence of synchronization.
Improvement 4: Kernel contexts
The pipereader makes at least two system calls per message (after the first). In our simple wakeup code, even though the reader blocks until there’s data in the pipe, the unblocked reader is set up to immediately return -1 to the process. But we can fix this too. The writer can operate on behalf of the reader, effectively switching to the reading process’s context to restart the read system call!
Our implementation of this uses a helper function, `complete_read`, which is called by both the write and read system call implementations. The writer’s call looks like this:
if (pos > 0 && pb_waiter > 0) {
processes[pb_waiter].p_state = P_RUNNABLE;
complete_read(&processes[pb_waiter],
(char*) processes[pb_waiter].p_registers.reg_rdi,
(size_t) processes[pb_waiter].p_registers.reg_rsi);
pb_waiter = 0;
}
The result is one system call per write or read.
1 syscalls: wrote I am a plumber.
1 syscalls: read I am a plumber.
2 syscalls: wrote Hello, friend!
2 syscalls: read Hello, friend!
3 syscalls: wrote I am a plumber.
3 syscalls: read I am a plumber.
5 syscalls: wrote I'm talking through a pipe.
5 syscalls: read I'm talking through a pipe.
6 syscalls: wrote I am a plumber.
6 syscalls: read I am a plumber.
7 syscalls: wrote Hello, friend!
7 syscalls: read Hello, friend!
Extensions
- How could you handle a blocked writer? (You’d need another wait queue.)
- How could you handle a pipe that could be read by more than one process? (You’d need a different wait queue structure—an integer would not suffice.)
Multithreaded bounded buffers
We now take the bounded buffer out of the WeensyOS context and move it into a situation with true multiprocessing, and thus, more epic synchronization problems.
Imagine two or more processors—say, multiple cores in a kernel, or multiple threads in a user-level process—that are sharing the same bounded buffer memory: the same `pipebuffer`, `pb_sz` (that is, `sizeof(pipebuffer)`), `pb_pos`, and `pb_len`. The `pipebuffer` and `pb_sz` variables are constant: the address of the buffer memory and the size of the buffer never change. However, the contents of the buffer, and the `pb_pos` and `pb_len` variables, change on every operation.
Our goal is to provide operation atomicity for pipe operations. Every read should read a contiguous range of characters from the pipe. Every write should write a contiguous range of characters to the pipe. And, finally, no written characters should be dropped: every successfully-written character can be read.
How can an implementation ensure correctness, including operation atomicity, even though these variables are being accessed by different processors simultaneously?
Synchronization problems
Concretely, imagine a 16-character buffer with the following write implementation:
...
while (pos < sz && pb_len < pb_sz) {
size_t pb_index = (pb_pos + pb_len) % pb_sz;
size_t ncopy = /* calculate ncopy */;
memcpy(&pipebuffer[pb_index], &buf[pos], ncopy);
pb_len += ncopy;
pos += ncopy;
}
...
And imagine the following write calls being executed on different processors, starting from an empty buffer:
1. pipewrite("AA", 2);
2. pipewrite("BB", 2);
These operations on this state should lead to one of two outcomes, a buffer containing `AABB` or a buffer containing `BBAA`. But without further synchronization, we can get illegal outcomes.

For example, we can get `ABBA`—an outcome that violates the atomicity constraint on write. Imagine the following series of events:
- Initially, `pb_pos == 15` and `pb_len == 0`.
- The `AA` writer runs through the loop once. It copies one character into the buffer, for `pb_len == 1` and contents `A`.
- The `BB` writer runs through the loop once. It copies two characters into the buffer, for `pb_len == 3` and contents `ABB`. This completes the `BB` write (since all characters were written).
- The `AA` writer runs through the loop a second time. It copies the remaining character into the buffer.
And we can get `AA` followed by two characters of garbage.

- Initially, `pb_pos == 0` and `pb_len == 0`.
- The `BB` writer runs through the loop once, but stops before updating `pb_len`. That copies two characters into the buffer, for contents `BB`—but `pb_len` is still 0.
- The `AA` writer runs through the loop. That overwrites the `B`s, leaving `pb_len == 2` and contents `AA`.
- The `BB` writer completes the loop. This sets `pb_len == 4`—but the 3rd and 4th characters in the buffer are garbage!
We can even get `AA` on its own—the `BB` write disappears altogether. This is because the line `pb_len += ncopy` is actually implemented in multiple steps.

- Initially, `pb_pos == 0` and `pb_len == 0`.
- The `BB` writer runs through the loop once, but stops before updating `pb_len`. That copies two characters into the buffer, for contents `BB`—but `pb_len` is still 0.
- The `BB` writer loads the old value of `pb_len` (0) into a register.
- The `AA` writer runs through the loop. That overwrites the `B`s, leaving `pb_len == 2` and contents `AA`.
- The `BB` writer completes the loop. It first increments the register holding the old version of `pb_len`. But that sets the register to 2, not 4, because the `BB` writer missed the `AA` writer’s update. It then writes the register out to memory, setting `pb_len == 2`.
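This lost update can be replayed deterministically by spelling out the load, add, and store steps by hand (a sketch; the variable names are invented, and a real race would of course be nondeterministic):

```c
#include <assert.h>
#include <stddef.h>

// Deterministic replay of the lost-update interleaving described above.
// `pb_len += ncopy` compiles to a load, an add, and a store; we execute
// those steps explicitly, in the order the two writers ran.
size_t lost_update_demo(void) {
    size_t pb_len = 0;

    // BB writer: copies its 2 characters, then loads pb_len into a
    // "register"—and is interrupted before adding and storing.
    size_t bb_reg = pb_len;            // load: bb_reg == 0

    // AA writer: runs its whole update uninterrupted.
    pb_len += 2;                       // pb_len == 2, contents "AA"

    // BB writer resumes: add and store using the stale register value.
    bb_reg += 2;                       // 0 + 2 == 2, not 4
    pb_len = bb_reg;                   // store: pb_len == 2; BB's write is lost

    return pb_len;
}
```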
To fix this, we need a new property: mutual exclusion.
Mutual exclusion in general
Mutual exclusion is the foundation of atomicity. Its most general definition is this:
Mutual exclusion holds when no more than one thread of execution can simultaneously execute in a region of code with respect to a region of shared storage.
Here, a “thread of execution” refers to a generic instruction pointer: a user-level thread, a process, or a processor. A “region of code” refers to instructions; we might refer to one or more C functions, in whole or in part, or to machine instructions. Finally, a “region of shared storage” refers abstractly to shared memory accessible to the threads of execution. When memory is isolated (e.g., in isolated processes), mutual exclusion generally becomes less important. But we refer to “storage,” rather than shared memory, because in some synchronization situations, we need mutual exclusion with respect to other objects, such as files or pipe buffers.
When mutual exclusion holds, the restricted region of code is called the critical section. When we’re trying to be precise, we call it the induced critical section, since it is enforced by whatever mechanism was used to ensure mutual exclusion. That is, a critical section is a region of code subject to mutual exclusion. The induced critical section may differ from the minimal sufficient critical section, which is the smallest critical section that suffices for correctness (for instance, the smallest critical section required to enforce pipe operation atomicity).
If mutual exclusion holds, and one thread’s instruction pointer is located in the critical section, then no other thread can enter the critical section until the first thread’s instruction pointer leaves it.
Mutual exclusion can be enforced by many different mechanisms. For example, if a process has only one thread, then its entire code is an induced critical section, since no other thread can enter the process with respect to its memory! (Note that even `fork` does not violate this, since `fork` creates a new, isolated process with different memory.) The WeensyOS kernel is a critical section, whose mutual exclusion is enforced by disabling interrupts on kernel entry (and by running WeensyOS only on uniprocessor machines).
Sufficient and insufficient critical sections
What critical sections suffice to enforce pipe operation atomicity?
First, let’s return to the pipewrite operation. Say that, through magic, we were able to enforce mutual exclusion on the starred lines:
...
while (pos < sz && pb_len < pb_sz) {
size_t pb_index = (pb_pos + pb_len) % pb_sz;
size_t ncopy = /* calculate ncopy */;
*** memcpy(&pipebuffer[pb_index], &buf[pos], ncopy); ***
*** pb_len += ncopy; ***
pos += ncopy;
}
...
That is, when one thread begins executing `memcpy`, no other threads will execute `memcpy` until the first thread updates `pb_len`. Does this induced critical section suffice?

This critical section does prevent some bugs. For instance, it prevents the bug leading to a 2-character `AA` buffer. Since the highlighted critical section includes the `pb_len += ncopy` line, it is impossible for multiple threads of execution to update `pb_len` simultaneously, and every `pb_len` update will see its most up-to-date value. However, there are other bugs that it does not prevent. For instance, it doesn’t prevent the `AA[garbage]` bug.
What about this larger critical section?
while (pos < sz && pb_len < pb_sz) {
*** size_t pb_index = (pb_pos + pb_len) % pb_sz; ***
*** size_t ncopy = /* calculate ncopy */; ***
*** memcpy(&pipebuffer[pb_index], &buf[pos], ncopy); ***
*** pb_len += ncopy; ***
*** pos += ncopy; ***
}
That prevents `AA`, and also prevents `AA[garbage]`, but it does not prevent `ABBA`. To prevent all write/write synchronization errors, we must include the whole while loop in the critical section:
*** while (pos < sz && pb_len < pb_sz) { ***
*** size_t pb_index = (pb_pos + pb_len) % pb_sz; ***
*** size_t ncopy = /* calculate ncopy */; ***
*** memcpy(&pipebuffer[pb_index], &buf[pos], ncopy); ***
*** pb_len += ncopy; ***
*** pos += ncopy; ***
*** } ***
No smaller critical section suffices.
Note that critical sections are not always contiguous ranges of code. For instance, a corresponding range of code in the pipe read operation’s implementation would be part of the same sufficient critical section. This is because the read and write code both access and modify the same shared state (here, the pipe buffer).
Locks
Mutual exclusion is often enforced, and critical sections induced, by synchronization objects, which are libraries designed as building blocks for synchronization.
The most basic synchronization object is the mutual exclusion lock, or the lock for short. A lock is a small piece of state (often an integer) associated with two operations, `lock` and `unlock`. The lock has two states, `locked` and `unlocked`; it starts in `unlocked` state. The two operations, `lock` and `unlock`, behave like the following:
void lock(lock_type* l) {
// The following executes in one atomic step
while (l is in locked state) {
/* do nothing */
}
*l := locked;
}
void unlock(lock_type* l) {
assert(l is in locked state);
*l := unlocked;
}
Sometimes a third operation, `trylock`, is provided:
int trylock(lock_type* l) {
if (l is in locked state) {
return 0;
} else {
*l := locked;
return 1;
}
}
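On real hardware, the atomicity that the pseudocode hand-waves ("executes in one atomic step") comes from atomic instructions. Here is one possible concrete sketch using C11's `atomic_flag`, whose `atomic_flag_test_and_set` atomically sets the flag and returns its previous value; the pseudocode above does not commit to this implementation.

```c
#include <stdatomic.h>

// A minimal spinlock built on C11's atomic_flag.
typedef atomic_flag lock_type;
#define LOCK_INIT ATOMIC_FLAG_INIT

void lock(lock_type* l) {
    // test_and_set returns the flag's previous value, so a true result
    // means someone else holds the lock: keep spinning.
    while (atomic_flag_test_and_set(l)) {
        /* do nothing */
    }
}

void unlock(lock_type* l) {
    atomic_flag_clear(l);
}

int trylock(lock_type* l) {
    // Returns 1 if we acquired the lock, 0 if it was already held.
    return !atomic_flag_test_and_set(l);
}
```

Because test-and-set is a single atomic hardware operation, two threads can never both observe the lock as free and acquire it simultaneously.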
Lock operations directly map onto the requirements of mutual exclusion. If:

- Every entry into a region of code is guarded by a `lock` operation on a shared lock;
- The body of the region doesn’t modify the lock; and
- Every exit from the region is guarded by `unlock`,

then that region of code obeys mutual exclusion. For instance, we can provide mutual exclusion for our pipe’s minimal critical section like so:
static lock_type pb_lock;
...
lock(&pb_lock);
while (pos < sz && pb_len < pb_sz) {
size_t pb_index = (pb_pos + pb_len) % pb_sz;
size_t ncopy = /* calculate ncopy */;
memcpy(&pipebuffer[pb_index], &buf[pos], ncopy);
pb_len += ncopy;
pos += ncopy;
}
unlock(&pb_lock);
This provides mutual exclusion by construction. The only entry to the region is via the `lock()` line. If any thread gets past the `lock()` line, then every *other* thread must block in `lock()` until the first thread calls `unlock()`.