Notes
Assume a Linux operating system running on the x86-64 architecture unless otherwise stated.
If you get stuck, move on to the next question. When unsure, briefly explain your thinking for potential partial credit.
1. Our governing body (15 points)
The Harvard Corporation has attempted to complete pset 6. Their implementation has a lot of problems, but they refuse to admit they’ve done anything wrong. Help prove that they have problems by attacking their implementation.
QUESTION 1A. Phase 1 of the problem set advises students to implement the io61_try_lock, io61_lock, and io61_unlock functions “using a single std::recursive_mutex per file.” The Harvard Corporation likes to go its own way, and instead implemented their own coarse-grained lock using atomics. Here are the relevant excerpts of their code:
struct io61_file { ...
std::atomic<int> lock;
};
int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
if (len == 0
|| f->lock.exchange(1) == 0) {
return 0;
}
return -1;
}
int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
while (io61_try_lock(f, off, len, locktype) != 0) {
}
return 0;
}
int io61_unlock(io61_file* f, off_t off, off_t len) {
if (len != 0) {
f->lock = 0;
}
return 0;
}
List the true statements.
- This implementation has undefined behavior caused by data races on the f->lock field.
- This implementation is blocking (rather than polling).
- This implementation correctly provides mutual exclusion (i.e., while one thread has acquired an exclusive range lock using io61_lock(f, off, len, LOCK_EX), no other thread can acquire an overlapping range lock on the same file).
- The f->lock field implements a recursive mutex.
- None of the above.
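For reference, a minimal sketch of the advised Phase 1 approach, one std::recursive_mutex per file, might look like the following (it would need <mutex>; how it treats len == 0 and the unused off and locktype arguments mirrors the Corporation’s code above and is an assumption, not part of the handout):
struct io61_file { ...
    std::recursive_mutex mtx;   // one coarse-grained lock per file
};
int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len == 0 || f->mtx.try_lock()) {
        return 0;
    }
    return -1;
}
int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    if (len != 0) {
        f->mtx.lock();          // blocks until the mutex is available
    }
    return 0;
}
int io61_unlock(io61_file* f, off_t off, off_t len) {
    if (len != 0) {
        f->mtx.unlock();
    }
    return 0;
}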
QUESTION 1B. Phase 1 is tested using the ftxxfer program. When the Harvard Corporation runs ./ftxxfer on their code, they observe deterministic behavior: the exact same result every time, regardless of scheduling order, random number choice, or number of threads. What result do they see and why? Explain briefly.
QUESTION 1C. The Corporation decides to forge ahead to Phase 2, fine-grained locking. They want to implement fine-grained locking with an array of atomic lock bytes, one lock byte per byte of file data.
struct io61_file { ...
off_t size; // size of file
std::atomic<unsigned char>* locks; // array of `size` lock bytes
};
int io61_try_lock(io61_file* f, off_t off, off_t len, int locktype) {
for (off_t x = off; x < off + len; ++x) {
if (f->locks[x].exchange(1) == 1) {
return -1;
}
}
return 0;
}
int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
/* ... same as above ... */
}
int io61_unlock(io61_file* f, off_t off, off_t len) {
for (off_t x = off; x < off + len; ++x) {
f->locks[x] = 0;
}
}
List the true statements; assume that f->size and f->locks are initialized correctly.
- This implementation uses fine-grained locking.
- This implementation behaves correctly when len == 0.
- This implementation has the same behavior on ./ftxxfer as the previous implementation.
- An application can cause undefined behavior by passing unexpected arguments to io61_try_lock. (Expected arguments always obey off >= 0, off <= f->size, len >= 0, and off + len <= f->size.)
- None of the above.
QUESTION 1D. The Corporation’s io61_try_lock can behave incorrectly even when passed expected arguments (off >= 0 && off <= f->size && len >= 0 && off + len <= f->size). Correct the code, or, for partial credit, describe what’s wrong with the Corporation’s version.
QUESTION 1E. In some cases, even the unfixed io61_try_lock from Question 1C always behaves correctly. Write an assertion involving off, len, and f that defines those cases. For full credit your assertion should be as broad as possible.
QUESTION 1F. The Corporation moves on to Phase 3, blocking. It’s gotten this far (not far):
std::mutex file_mutex;
struct io61_file { ...
off_t size; // size of file
std::atomic<unsigned char>* locks; // array of `size` lock bytes
std::condition_variable_any cv;
};
int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
while (io61_try_lock(f, off, len, locktype) != 0) {
}
return 0;
}
void io61_unlock(io61_file* f, off_t off, off_t len) {
for (off_t x = off; x < off + len; ++x) {
f->locks[x] = 0;
}
f->cv.notify_all();
}
Change the Corporation’s io61_lock to correctly block. Do not add any synchronization objects. You may assume that all previous problems with io61_try_lock have been fixed.
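One possible shape for a blocking io61_lock, sketched using the file_mutex and cv members shown above (a sketch only, assuming the fixed io61_try_lock, not the only acceptable answer):
int io61_lock(io61_file* f, off_t off, off_t len, int locktype) {
    std::unique_lock<std::mutex> guard(file_mutex);
    while (io61_try_lock(f, off, len, locktype) != 0) {
        f->cv.wait(guard);   // sleep until some io61_unlock calls notify_all
    }
    return 0;
}
Because the given io61_unlock notifies without holding file_mutex, a fully airtight version would also need io61_unlock to acquire file_mutex around clearing the lock bytes and notifying.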
QUESTION 1G. Briefly describe one way in which the Corporation’s synchronization plan uses fine-grained locking, and one way in which it uses coarse-grained locking.
2. Server synchronization (10 points)
A web server is a process that accepts connections from clients, reads data from those connections, performs computations on that data, and sends responses back on those connections. This isn’t unlike the weensy database we saw in the synchronization unit. A dirt-simple server might look like this:
int main() {
while (true) {
int fd = accept_incoming_request(); // accept a new connection, return its socket file descriptor
handle_client(fd); // read data, write response
close(fd);
}
return 0;
}
In this code, both accept_incoming_request() and handle_client() can block.
QUESTION 2A. How many concurrent connections can the dirt-simple server handle at a time?
QUESTION 2B. Change the main loop to handle each connection in a new child process. This is the multi-process model commonly seen in early web servers. We’ve started the code for you; you may assume that fork() always succeeds.
int main() {
while (true) {
int fd = accept_incoming_request();
// YOUR CODE HERE, including at least one `fork()`
// clean up zombies
int status;
while (waitpid(-1, &status, WNOHANG) > 0) {
}
}
}
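One possible completion of the loop body (a sketch; it reuses the accept_incoming_request and handle_client helpers from the question and would need <unistd.h> and <sys/wait.h>):
int main() {
    while (true) {
        int fd = accept_incoming_request();
        pid_t p = fork();
        if (p == 0) {
            // child: handle this connection, then exit
            handle_client(fd);
            close(fd);
            _exit(0);
        }
        close(fd);   // parent no longer needs its copy of the socket

        // clean up zombies
        int status;
        while (waitpid(-1, &status, WNOHANG) > 0) {
        }
    }
}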
QUESTION 2C. Change the main loop to handle each connection in a separate thread. This is the multi-threaded model also often seen in early web servers. We’ve started the code for you.
Hint: Remember that in some situations it is illegal for a std::thread object to go out of scope!
void handle_client_and_close(int fd) {
handle_client(fd);
close(fd);
}
int main() {
while (true) {
int fd = accept_incoming_request();
// YOUR CODE HERE
}
}
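A sketch of one possible completion (it would need <thread>; detaching lets the std::thread object go out of scope safely, since a joinable thread must be joined or detached before destruction):
int main() {
    while (true) {
        int fd = accept_incoming_request();
        std::thread th(handle_client_and_close, fd);
        th.detach();   // the thread keeps running after `th` is destroyed
    }
}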
QUESTION 2D. True or false? An advantage of the multi-process model is that bugs in handle_client will only crash the child process handling a single connection. In contrast, in the multi-threaded model, a bug in handle_client will often crash the whole server.
QUESTION 2E. True or false? An advantage of the multi-threaded model is that different connections can easily share state. In contrast, if the weensy database used a simple multi-process model, then changes made by one client would be hidden from all other clients.
QUESTION 2F. A simple multi-threaded server can create an unbounded number of threads, which is dangerous. In this version, we create a thread pool instead. The main thread creates 30 worker threads and passes incoming connections off to these threads; at most 30 connections can be handled concurrently.
int incoming_fd = -1;
void worker() {
while (true) {
// poll for connection, then claim it
int fd = incoming_fd;
while (fd < 0) {
fd = incoming_fd;
}
incoming_fd = -1;
// handle claimed connection
handle_client(fd);
close(fd);
}
}
int main() {
for (int i = 0; i != 30; ++i) {
std::thread th(worker);
th.detach();
}
while (true) {
incoming_fd = accept_incoming_request();
// wait until some worker claims the connection
while (incoming_fd >= 0) {
}
}
}
Does this code violate the Fundamental Law of Synchronization (that is, does it have any undefined behavior due to data races)? Explain briefly.
QUESTION 2G. Describe how to add correct synchronization to this thread-pool server. List specific synchronization objects you’d add, and explain briefly how they should be used in worker and main.
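One common design, sketched below, pairs a std::mutex protecting incoming_fd with a std::condition_variable for the hand-off; the structure and names here are illustrative, not the only acceptable answer, and handle_client and accept_incoming_request are the question’s helpers:
#include <condition_variable>
#include <mutex>
#include <thread>
#include <unistd.h>

std::mutex m;                    // protects incoming_fd
std::condition_variable cv;      // signaled whenever incoming_fd changes
int incoming_fd = -1;

void worker() {
    while (true) {
        int fd;
        {
            std::unique_lock<std::mutex> guard(m);
            while (incoming_fd < 0) {
                cv.wait(guard);      // block until main() publishes a connection
            }
            fd = incoming_fd;
            incoming_fd = -1;
            cv.notify_all();         // tell main() the slot is free again
        }
        handle_client(fd);
        close(fd);
    }
}

int main() {
    for (int i = 0; i != 30; ++i) {
        std::thread(worker).detach();
    }
    while (true) {
        int fd = accept_incoming_request();
        std::unique_lock<std::mutex> guard(m);
        while (incoming_fd >= 0) {
            cv.wait(guard);          // wait until the previous connection is claimed
        }
        incoming_fd = fd;
        cv.notify_all();             // wake a worker to claim it
    }
}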
3. System calls (10 points)
QUESTION 3A. Which of the following system calls are likely to block? List all that apply.
- read on a pipe whose pipe buffer is empty
- write to a pipe whose pipe buffer is full
- fork when there are no available process slots
- waitpid when the current process has no children yet
- None of the above
QUESTION 3B. Which of the following statements about execv are true? List all that apply.
- It must always be paired with fork
- It does not return unless it encounters an error
- It replaces the current process’s program image with a new program
- It closes the current process’s file descriptors except for standard input, standard output, and standard error
- None of the above
QUESTION 3C. When will the return value of a read system call equal 0? List all that apply.
- When the byte of data at the file position equals '\0'
- When the file descriptor is the read end of a pipe and either the pipe buffer is empty or the write end of the pipe is closed
- When the file descriptor is the read end of a pipe and both the pipe buffer is empty and the write end of the pipe is closed
- When the file descriptor is the write end of a pipe
- When the read system call is interrupted by a signal before it can read any data
- None of the above
QUESTION 3D. In class, we explored some racer programs whose goal is to wait until a child process exits or a timeout has elapsed, whichever comes first. The racer-block program attempted to use system call interruption to solve this problem. Here’s a version of its code, compressed to omit error checking.
static volatile sig_atomic_t got_signal = 0;
void signal_handler(...) {
S1: got_signal = 1;
}
int main() {
M1: set_signal_handler(SIGCHLD, signal_handler);
M2: pid_t p = fork();
M3: if (p == 0) { /* child process code */ }
M4: usleep(timeout_in_microseconds);
M5: if (elapsed_time_in_microseconds() >= timeout_in_microseconds) {
M6: printf("SLOW\n"); /* child did not exit before timeout */
M7: } else {
M8: printf("quick\n"); /* child exited before timeout */
M9: }
}
Which line contains a system call that the programmer expects to be interrupted whenever a child process exits before the timeout?
QUESTION 3E. In what circumstance might this program print an incorrect result (SLOW when the child exited meaningfully before the timeout, or quick when the child exited meaningfully after the timeout)? Explain briefly, referring to specific lines of code.
QUESTION 3F. The racer race condition can be solved by designing new system calls. In this question, we add system calls that can temporarily mask signals, causing them to be delayed until a future time. The system calls are:
- sigmask(int signum): Mask signal signum. If the process is sent this signal, the kernel records the signal as pending, but does not interrupt the process.
- sigunmask(int signum): Unmask signal signum and immediately deliver any pending signum to the process.
- pusleep(useconds_t timeout): Unmask all signals and sleep for timeout. This happens effectively atomically, with no sleep–wakeup race conditions, so if the process has a pending signal when pusleep is called, the kernel immediately delivers the pending signal and pusleep returns with error code EINTR. The original set of masked signals is restored before return.
Use these system calls to fix the race condition in the racer-block code above, or explain briefly how this could be done.
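A sketch of one way these calls could be applied to the racer-block code (it mirrors the compressed code above and keeps the question’s set_signal_handler, timeout_in_microseconds, and elapsed_time_in_microseconds helpers):
int main() {
    set_signal_handler(SIGCHLD, signal_handler);
    sigmask(SIGCHLD);                  // from here on, SIGCHLD is only recorded as pending
    pid_t p = fork();
    if (p == 0) { /* child process code */ }
    pusleep(timeout_in_microseconds);  // atomically unmasks signals and sleeps; returns
                                       // immediately with EINTR if SIGCHLD is already pending
    if (elapsed_time_in_microseconds() >= timeout_in_microseconds) {
        printf("SLOW\n");
    } else {
        printf("quick\n");
    }
}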
4. Testing a shell (15 points)
Your shell problem set was tested with a check.pl script that runs shell commands and examines their output (including both standard output and standard error). In this problem you will reason about some of check.pl’s checks.
QUESTION 4A. Check REDIR17 and its expected output are:
- Command: echo > out.txt the redirection can occur anywhere && cat out.txt
- Expected output: the redirection can occur anywhere
Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.
(When a feature is “not understood”, that means the implementation treats the feature’s tokens like normal command arguments; for instance, an implementation that does not understand conditionals would treat && tokens like normal arguments. The actual outputs translate newlines into spaces, which is what check.pl does.)
Actual outputs:
- (empty)
- > out.txt the redirection can occur anywhere && cat out.txt
- > out.txt the redirection can occur anywhere cat: out.txt: No such file or directory
- > out.txt the redirection can occur anywhere
Implementations:
- Conditionals incorrectly implemented, redirection correctly implemented
- Conditionals incorrectly implemented, redirection not understood
- Conditionals correctly implemented, redirection not understood
- Conditionals and redirection not understood
- None of the above
QUESTION 4B. Check PIPE5 runs the command yes | head -n 5.
Match the actual outputs, on the left, with the implementations that might generate those outputs, on the right. You might use an implementation more than once or not at all.
Actual outputs:
- y y y y y y y y … forever
- yes: invalid option -- 'n' Try 'yes --help' for more information.
- y y y y y, but shell hangs rather than printing another prompt
- y y y y y
Implementations:
- Pipes correctly implemented
- Pipes not understood
- Pipes incorrectly hooked up to command standard input and/or output
- Pipes correctly hooked up to command standard input and/or output, but parent shell process does not implement pipe hygiene
- None of the above
QUESTION 4C. Check BG7 runs this command:
sh -c "sleep 0.1; /bin/echo Second" & sh -c "sleep 0.05; /bin/echo First" & sleep 0.15
List the true statements about this command.
- sleep 0.15 is executed in the foreground.
- sleep 0.15 ensures that there is enough time for both First and Second to be printed before the parent sh61 returns to the prompt.
- sh -c "sleep 0.1; /bin/echo Second" will finish executing before sh -c "sleep 0.05; /bin/echo First" begins executing.
- sleep 0.05 will finish executing before /bin/echo First begins executing.
- This command line should take a minimum of 0.30 seconds to execute.
- None of the above.
QUESTION 4D. How many different sh61 descendant processes (children, children of children, etc.) will be created as sh61 executes check BG7? Explain briefly.
QUESTION 4E. This code is from a student’s incorrect implementation of background commands.
// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);
void run_list(shell_parser sec) {
shell_parser condsec = sec.first_conditional();
while (condsec) {
if (condsec.op() != TYPE_BACKGROUND) {
run_conditional(condsec);
} else {
pid_t p = fork();
assert(p >= 0);
if (p == 0) {
run_conditional(condsec);
_exit(0);
}
[[maybe_unused]] int status;
waitpid(p, &status, 0);
}
condsec.next_conditional();
}
}
What would the associated shell print on check BG7, and how long will it take to print it? Explain briefly.
QUESTION 4F. How could the code in the previous question be fixed?
QUESTION 4G. This similar code has a different problem.
// This function runs the conditional `sec` to completion.
void run_conditional(shell_parser sec);
void run_list(shell_parser sec) {
shell_parser condsec = sec.first_conditional();
while (condsec) {
if (condsec.op() != TYPE_BACKGROUND) {
run_conditional(condsec);
} else {
pid_t p = fork();
assert(p >= 0);
if (p == 0) {
run_conditional(condsec);
}
}
condsec.next_conditional();
}
}
What will this code print on check BG7? Explain briefly.
5. Storage patterns (10 points)
QUESTION 5A. In the third lecture in Unit 1, we saw some strange performance anomalies when sorting a list of integers via insertion sort using an intsort program.
What are the best explanations for the anomalies we saw? List all that apply.
- The system calls required to allocate memory are inherently expensive.
- Processor caches handle sequential memory accesses more efficiently than random memory accesses.
- Dynamically-allocated objects that are allocated in rapid sequence tend to receive addresses that are nearby in virtual address space.
- Insertion sort is an inefficient sorting algorithm.
- None of the above.
QUESTION 5B. In which situations might a stdio cache be flushed? List all that apply.
- When a new write request would overflow the configured cache size.
- When the underlying file data changed because a different process wrote it.
- When a seek request moves the file position outside the range covered by the current cache.
- When the corresponding file is closed.
- Upon encountering Bélády’s anomaly.
- None of the above.
QUESTION 5C. The remaining questions ask you to analyze strace output and characterize the behavior of the programs that produced those traces. Your answer should refer to the following reference strings; an answer might say “A for reads, B for writes” or “A then B for reads, no writes”.

| | Reference string | Description |
|---|---|---|
| A | [0–4095], [4096–8191], [8192–12287], … | 4096-byte blocks, sequential |
| B | 0, 1, 2, 3, 4, … | 1-byte blocks, sequential |
| C | 0, 4096, … | 1-byte blocks, every 4096th byte |
| D | …, 4, 3, 2, 1, 0 | 1-byte blocks, reverse sequential |
| None | (empty) | |
Trace 1:
...
read(0, "C", 1) = 1
fstat(0, {st_mode=S_IFREG|0644, st_size=8188, ...}) = 0
lseek(0, 4096, SEEK_SET) = 4096
read(0, "a", 1) = 1
lseek(0, 0, SEEK_SET) = 0
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "atisfies the Programming 2 and Sys"..., 4096) = 4092
...
- Which reference strings (A–D or None) best describe the reads in this trace?
- Which reference strings best describe the writes?
- Could this program be using a typical stdio cache for reads and/or writes?
QUESTION 5D. Trace 2:
...
lseek(0, 8187, SEEK_SET) = 8187
read(0, "\n", 1) = 1
write(1, "\n", 1) = 1
lseek(0, 8186, SEEK_SET) = 8186
read(0, "t", 1) = 1
write(1, "t", 1) = 1
lseek(0, 8185, SEEK_SET) = 8185
read(0, "c", 1) = 1
write(1, "c", 1) = 1
lseek(0, 8184, SEEK_SET) = 8184
read(0, "\205", 1) = 1
write(1, "\205", 1) = 1
lseek(0, 8183, SEEK_SET) = 8183
read(0, ";", 1) = 1
write(1, ";", 1) = 1
lseek(0, 8182, SEEK_SET) = 8182
read(0, "k", 1) = 1
write(1, "k", 1) = 1
lseek(0, 8181, SEEK_SET) = 8181
read(0, "e", 1) = 1
write(1, "e", 1) = 1
...
- Which reference strings (A–D or None) best describe the reads in this trace?
- Which reference strings best describe the writes?
- Could this program be using a typical stdio cache for reads and/or writes?
QUESTION 5E. Trace 3:
...
read(0, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "may be used as one of the four CS "..., 4096) = 4094
write(1, "CS 61 is an introduction to the fu"..., 4096) = 4096
read(0, "", 4096) = 0
write(1, "may be used as one of the four CS "..., 4094) = 4094
- Which reference strings (A–D or None) best describe the reads in this trace?
- Which reference strings best describe the writes?
- Could this program be using a typical stdio cache for reads and/or writes?
6. Kernel (10 points)
QUESTION 6A. List the true statements about user processes on x86-64 operating systems.
- A user process cannot read or write memory containing kernel code or data unless the kernel explicitly allows it.
- When a user process executes an invalid instruction, such as dividing by zero, the kernel receives control through an exception and can decide to terminate the process.
- The %cs register’s lowest 2 bits determine the privilege level of a process, and user processes operate at the highest privilege level ((%cs & 3) == 0).
- A user process can execute the syscall instruction to safely raise the processor’s privilege level and transfer control to the kernel.
- User processes can configure the computer’s timer interrupt frequency to optimize their own CPU usage.
- If a user process enters an infinite loop, timer interrupts ensure that the kernel regains control and can schedule other processes.
- None of the above.
QUESTION 6B. List the true statements about system calls on WeensyOS.
- System calls have a different calling convention than normal function calls.
- The syscall instruction transfers control to the kernel, which starts executing in privileged mode at the virtual address defined by the %rdx register.
- User processes pass arguments to the kernel’s system call implementation by loading those arguments into registers before executing the syscall instruction.
- The kernel can return results from a system call by loading them into specific registers in current->regs.
- The kernel can return results from a system call by executing a return statement from its syscall function.
- None of the above.
QUESTION 6C. The next three questions present partially-obscured address-space loops using vmiter objects. The original loops were taken from correct and complete code on the WeensyOS problem set.
vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);
for (; it1.va() < PROC_START_ADDR && it2.va() < PROC_START_ADDR; it1 += PAGESIZE, it2 += PAGESIZE) {
int r = it2.try_map(it1.pa(), it1.perm());
if (r < 0) {
/* do something sensible */
}
}
List the true statements about this loop.
- A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
- A loop like this could be used in process_setup to initialize some or all of an initial process address space.
- A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
- A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
- Within the loop body, it1.va() == it2.va().
- None of the above.
QUESTION 6D.
vmiter it(/* some page table */, 0);
for (; it.va() < MEMSIZE_PHYSICAL; it += PAGESIZE) {
int perm;
if (it.va() == 0) {
perm = 0;
} else if (it.va() < PROC_START_ADDR && it.va() != CONSOLE_ADDR) {
perm = PTE_P | PTE_W;
} else {
perm = PTE_P | PTE_W | PTE_U;
}
int r = it.try_map(it.va(), perm);
if (r < 0) {
/* do something sensible */
}
}
List the true statements about this loop.
- A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
- A loop like this could be used in process_setup to initialize some or all of an initial process address space.
- A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
- A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
- Within the loop body, it.va() < MEMSIZE_VIRTUAL.
- None of the above.
QUESTION 6E.
vmiter it1(/* some page table */, 0);
vmiter it2(/* some page table */, 0);
for (; it1.va() < MEMSIZE_VIRTUAL && it2.va() < MEMSIZE_VIRTUAL; it1 += PAGESIZE, it2 += PAGESIZE) {
if (!it1.user() || !it1.writable() || it1.va() == CONSOLE_ADDR) {
/* do something sensible */
} else {
/* do something sensible */
}
}
List the true statements about this loop.
- A loop like this could be used in kernel_start to create kernel_pagetable, an identity-mapped page table.
- A loop like this could be used in process_setup to initialize some or all of an initial process address space.
- A loop like this could be used in a SYSCALL_FORK implementation to initialize some or all of a child process’s address space.
- A loop like this could be used in a SYSCALL_EXIT implementation to free process memory.
- None of the above.
7. Assembly (10 points)
Intel is trying to sell the next generation of its Fartium Core Trio processor. On this still-terrible processor, the only instructions that work are the ones on this list:
1. addq %rsi, %rax
2. cmpq %rsi, %rdi
3. decq %rsi
4. incq %rax
5. je L1
6. jl L2
7. jmp L3
8. leaq (%rax,%rdi,2), %rsi
9. movl (%rdi,%rax,4), %edi
10. retq
11. xchgq %rax, %rdi
12. xorq %rax, %rax
The instructions, registers, and arguments must match exactly. For example, addq %rsi, %rax works fine, but addq %rdi, %rax will explode the processor.
QUESTION 7A. Which Fartium Core Trio instructions examine the condition flags (i.e., behave differently based on condition flags)? List all that apply (by number or name) or say “none.”
QUESTION 7B. Which Fartium Core Trio instructions read primary memory (not including memory used to store instructions)? List all that apply or say “none.”
QUESTION 7C. In the remaining questions, your job is to correctly implement C++ functions using only the instructions on this list. Your implementations must have the same semantics as the C++ functions, but may perform much worse than one might expect. Your response can be a list of instruction numbers (“2, 11, 9, 10”) or names (“cmpq; xchgq; movl; retq”), intermixed with labels L1–L3 as necessary. Assume that the Fartium Core Trio uses the normal x86-64 calling convention.
int return_one() {
return 1;
}
QUESTION 7D.
long return_minus_one() {
return -1;
}
QUESTION 7E.
long max(long a, long b) {
if (a > b) {
return a;
} else {
return b;
}
}
8. Data representation (15 points)
QUESTION 8A. Given this code:
int array[10000];
int* p1 = &array[1];
Provide initializations of a new variable, p2, so that p2 - p1 has the given values according to the C++ abstract machine, or say that doing so is impossible. Your answer will comprise five initializations, one per value.
- p2 - p1 == 0
- p2 - p1 == 3
- p2 - p1 == -3
- p2 - p1 causes undefined behavior
- p2 - p1 returns a value dependent on the processor architecture, but does not cause undefined behavior
QUESTION 8B. Which of the marked printf lines in the following code contain violations of the live object rule? List all that apply or say “none.”
#include <cstdio>
int globalArray[5];
int* return_pointer_to_parameter(int x) {
return &x;
}
int main() {
int* ptr = &globalArray[5];
A: printf("%d\n", *ptr);
B: printf("%p\n", ptr);
ptr = return_pointer_to_parameter(10);
C: printf("%d\n", *ptr);
ptr = new int(12);
D: printf("%d\n", *ptr);
delete ptr;
E: printf("%p\n", ptr);
ptr = nullptr;
F: printf("%d\n", *ptr);
return 0;
}
QUESTION 8C. The next three questions concern an implementation of pset 1 that allocates memory from a single default_buffer of size 256 bytes.
Write an m61_malloc call that should always fail on this implementation.
QUESTION 8D. Write two or more m61_malloc calls where at least one should always fail on this implementation, even though the total user-accessible memory requested is less than 256 bytes.
QUESTION 8E. Write a sequence of m61_malloc and m61_free calls where at least one of the m61_malloc calls should fail unless the implementation correctly coalesces freed allocations.
QUESTION 8F. The UTF-8/Ethics lecture presented four design ideas for Unicode encodings. They were:
- Idea #1: Unary encoding
- Idea #2: Byte pairs
- Idea #3: Byte triples
- Idea #4: Self-synchronizing byte quadruplets
Which of these encodings would cause clear economic harm to some group if it were the only available text encoding, and which would cause clear allocative harm? Explain briefly.
QUESTION 8G. The UTF-8 encoding is defined as follows:
- Encode code points 0–0x7F in the single byte with value ⟨u⟩.
- Encode code points 0x80–0x7FF in two bytes:
  - Let a equal bits 0–5 of u, and b equal bits 6–10.
  - Use bytes ⟨0b11000000 + b⟩ ⟨0b10000000 + a⟩.
- Encode code points 0x800–0xFFFF in three bytes:
  - Let a equal bits 0–5 of u, b equal bits 6–11, and c equal bits 12–15.
  - Use bytes ⟨0b11100000 + c⟩ ⟨0b10000000 + b⟩ ⟨0b10000000 + a⟩.
- Encode code points 0x10000–0x10FFFF in four bytes:
  - Let a equal bits 0–5 of u, b equal bits 6–11, c equal bits 12–17, and d equal bits 18–20.
  - Use bytes ⟨0b11110000 + d⟩ ⟨0b10000000 + c⟩ ⟨0b10000000 + b⟩ ⟨0b10000000 + a⟩.
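For example, u = 0xE9 (“é”) falls in the two-byte range: bits 0–5 of u are 0b101001 = 41 and bits 6–10 are 0b00011 = 3, so the encoding is ⟨0b11000000 + 3⟩ ⟨0b10000000 + 41⟩, i.e. the bytes 0xC3 0xA9.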
Does the UTF-8 encoding resemble big-endian or little-endian integer representation?
QUESTION 8H. This function intends to encode a Unicode character u into UTF-8 in the character buffer buf. Complete the missing case so that the function works for any u between 0 and 0xFFFF.
// Returns the number of bytes encoded
int encode_utf8(unsigned u, unsigned char* buf) {
if (u <= 0x7F) {
buf[0] = u;
return 1;
} else if (u <= 0xFFFF) {
// YOUR CODE HERE
} else {
assert(false);
}
}
9. The end (5 points)
QUESTION 9A. Is there any TF or fellow student you’d like to shout out? Any answer but no answer will receive full credit.