This problem set will teach you some useful strategies for synchronization and networking using a game called pong.
- Extended deadline: Wed 12/12 at 11:59pm for college students (1 day
later for extension).
- NO CLASSWORK WILL BE ACCEPTED AFTER THE EXTENDED DEADLINE REGARDLESS OF LATE HOURS. That is, you cannot use late hours to delay the extended deadline.
About Pong
Pong, one of the earliest arcade video games, features a ball that moves diagonally on a rectangular playing field.
In CS 61, a pong board is a rectangular grid of cells with one or more diagonally-moving balls. Each cell can contain at most one ball. Balls bounce off the edges of the playing field and off each other. Board cells can be normal or sticky; when a ball hits a sticky cell, it stops moving until dislodged by another ball.
The logic for pong boards and balls is implemented in pongboard.hh
.
The
pong_celltype
enum represents a cell type (empty, sticky, or obstacle).The
pong_cell
struct represents the state of a cell, which is a cell type and an optional ball.The
pong_board
struct represents the state of a board. Its main component iscells_
, an array ofwidth_ * height_
pong_cell
s that represents the state of the board. Cells are stored in row-major order.The
pong_ball
struct represents the state of a ball, which is a position (x_
,y_
) and direction (dx_
,dy_
). Its member functionmove()
implements the logic for ball movement, including collisions and sticky cells.move()
returnstrue
if and only if the ball moved.
Getting started
Merge our problem set 6 code into your repository with git pull handout
master
. This will merge our code with
your previous work. If you have any “conflicts” from problem set 5,
resolve them before continuing further.
Run git push
to save your work.
Please use an explicit merge to create your repository. If you copy code by hand, our automated scripts will have trouble analyzing your code, and it’ll be harder for you to incorporate our updates.
You may also create a new cs61-psets
repository for this assignment.
Tell us if you do.
Once you have merged, edit the pset6/serverinfo.h
file so that
PONG_USER
is defined to a unique string that only you know.
Note. We currently do not plan to implement authentication on the pong server. This means that, if you learn another person’s string, you can mess with their heads. Please do not abuse this system. Aggressive abuse will be reported to The Authorities.
Type make
, and then run the ./pong61
program. It will print out a
URL. Visit that URL in your browser. You should see a bouncing ball in a
rectangular field:
Goals
Your goal for this problem set is to:
Support thread safety with efficient synchronization.
Support network robustness by handling loss, delay, congestion, and unexpected inputs.
Two test programs are provided, one for each goal. The ./pong61
program
tests network robustness (and some synchronization), while the ./simpong61
program tests thread safety. These goals can be tackled
independently and in either order.
Thread safety: simpong61
The simpong61
program simulates a Pong board with multiple balls and
sticky cells. Each ball runs in its own thread.
simpong61
takes the following arguments:
-b N
: Sets the number of balls (default 24).-s N
: Sets the number of sticky cells (default 12).-d DELAY
: Sets the delay in seconds between moves for each ball (default 0).-p DELAY
: Causes the board to be printed everyDELAY
seconds.-w WIDTH
: Sets the width of the board (default 100).-h HEIGHT
: Sets the height of the board (default 31).-1
: Run all balls in a single thread for testing purposes.
Try the command ./simpong61 -d0.1 -p0.1
. Your terminal should fill up
with board pictures like this:
....................................................................................................
...................._....................................................O......O........O..........
..........O.........................................................................................
................................................................................................._..
..........................................................._........................................
........................................._..........................................................
....................................................................................................
................................_...............................................OO..................
........O..........................................O................................................
......................O.............................................................................
.............................................O......................................................
....................................................................................................
....................................................................................................
....................................................O...............................................
......_.............................................................................................
............................................................................................O.......
................................O...................................................................
....................................................................................................
.............O..................................................................._..................
....................................................................................................
....................................................................................................
..................O.......................................O....................._...................
...................................................................O................................
.................................................O..................................................
...............................................O....................................................
....................................................................................................
.........................................................._.........................................
..O.........O................................................_............._........................
..............................................O..............................O......................
...........O........................................................................................
......................_.............................................................................
You should be able to see the balls bounce off the board edges and even off each other.
Now try the same command without delay: ./simpong61
. We get an
almost immediate assertion failure!
Or try the command with delay and thread sanitization: make TSAN=1
simpong61 && ./simpong61 -d0.1
. The thread sanitizer is very unhappy and
reports many “data races.”
The handout code is not thread-safe; you must make it thread-safe. Given
any combination of arguments not including -p
:
There must be no data races.
simpong61
should run indefinitely, without reporting errors or deadlocking, with or without the thread sanitizer.There must be fine-grained parallelism. It must be possible for more than one ball to move at a time, provided the balls are in different regions of the board. (You can define “region” however you like, so long as larger boards have more regions.)
Your code must block. Specifically, in the handout code, if a ball hits a sticky cell, the thread managing that ball spins (runs without blocking) until the ball is dislodged. In your code, the thread managing a stuck ball should block until the ball is dislodged.
The signal handler that supports
-p
is inherently thread-unsafe. You don’t need to make it safe, and don’t use-p
when the thread sanitizer is on.
Hints
Our solution code is in pong_ball::move
(in pongboard.hh
) and
ball_thread
(in simpong61.cc
), with initialization code elsewhere (we
added members and/or constructor code to pong_board
and/or pong_ball
). But
you may change anything.
It is possible to avoid all data races with two or three lines of code. Do that first. Fine-grained parallelism and blocking require more work.
Think carefully about your fine-grained parallelism strategy. If you’re not careful, your code will deadlock. Consider reducing the amount of parallelism you support in favor of ease of programming.
You may assume that cell types do not change as the board evolves, so there
are no data races involving pong_cell::type_
.
The C++ std::mutex
and friends have some restrictions that may surprise you.
For example, you cannot have a std::vector
of mutex objects; instead, you
must create a plain old dynamically-allocated array using new std::mutex[SIZE]
.
Use a command such as ./simpong61 -b24 -s18 -d0.1
to test for blocking. If
your laptop fan starts running, your code is polling rather than blocking. (Or
use Linux top
or Mac OS X top -o cpu
in a terminal window; a polling
simpong61
thread will appear at the top of the list.)
Our full solutions require less than 20 lines of code.
Networking: pong61
pong61
works by sending HTTP messages to a Web server we run. HTTP is the
application protocol on which the Web is built. Read more about HTTP before
you continue. When it starts up, pong61
makes a single request
to a URL like this:
http://cs61.seas.harvard.edu:6168/PONG_USER/reset
This tells the server to reset the pong board’s state. The server clears the board and returns a simple response containing the board’s width and height:
18 23
Then pong61
makes many requests to URLs like this:
http://cs61.seas.harvard.edu:6168/PONG_USER/move?x=XPOS&y=YPOS&style=on
This request causes a new ball to appear at position (XPOS
,YPOS
).
The server responds with a numeric code and explanation. If everything
goes well, it says:
0 OK
If there’s a problem, the numeric code is negative, as in:
-1 x and y parameters must be numbers
After each request, pong61
waits 0.1 seconds before making the next
request.
Our handout code runs each HTTP request in its own thread. The main thread spins to wait for each thread to complete before going on to the next. This works just fine on Phase 0. To do the problem set, you must change the code so it works on the other phases too, and introduce synchronization along the way. Use the web page’s phase buttons to change phases.
Phase 1: Loss
In Phase 1, the server starts to lose messages. It will occasionally
go offline for a short period. During that time, every move
request is
rejected by closing down the connection. The
http_connection::receive_response_headers()
function sets conn->cstate
to
cstate_broken
and conn->status_code
to -1
when this happens, but
the pong thread ignores this problem and continues as if everything was
fine. That position in the pong trail never gets filled in. Our server
shows this mistake by drawing black marks in the spaces.
Your job in this phase is to detect lost messages and retry. When the server drops a connection, your code should close that connection (to free its resources) and make a new connection attempt at the same position. It shouldn’t move to the next position until the server responds for the current position.
However, you must be careful not to bombard the server while it is offline. The server will notice this and explode. Instead, you must implement a form of exponential backoff. This is a simple, powerful concept.
- The first time a connection attempt fails, wait for K seconds before trying again. A good initial value for K is 0.01 sec; it shouldn’t be too long—1 sec is way too large.
- If the retry also fails, wait 2K seconds before trying again.
- If that retry also fails, wait 4K seconds before trying again.
- In general, after N failed retries, wait 2NK seconds before trying again. (You may want to introduce a maximum backoff; perhaps you would wait min(2NK, 128) seconds before trying again.)
Exponential backoff is awesome because it responds to short outages quickly, but imposes only logarithmic overhead (i.e., the number of messages sent during the outage is logarithmic in the length of the outage). It’s ubiquitous: Ethernet is built on it, and the next time your Gmail goes offline, check out the numbers that appear after “Not connected. Trying again in...”.
Hint: Implement one phase at a time, always thinking how you could accomplish the task in the simplest correct way. Avoid overengineering! Our solution set implements all phases, without race conditions, in less than 60 lines of code.
Slow networks: The
pong61
server may report spurious errors if you are doing the problem set remotely or over a slow network. Use the grading server to check for errors if you think your errors are spurious.
Phase 2: Delay
In Phase 2, the server delays its responses. It will send you the full header for its response, but delay the body. Since the handout loop waits for the body before sending the next request, the pong ball will move extremely slowly in Phase 2. Too slowly, in fact: the server enforces a minimum ball speed, and when your code is slower than that speed, you’ll see some black marks on the display.
You might think solving this problem would be easy: just close the connection before the response arrives. But the server is too clever for this. If you close a connection prematurely, it explodes.
To support delay, your pong61
must handle multiple concurrent
connections to the server. Now, the main thread may need to spawn a new
thread before the response arrives! But watch out. If you leak
connections, the server will explode.
Your Phase 2 code must also work in Phase 1. We suggest you make Phase 2 work first on its own, then go back and make Phase 1 work again.
Phase 3: Utilization and Synchronization
So far, your pong61
client opens a new network connection for every
ball. This is wasteful and slow and in Phase 3 the server will not allow
it. You should instead reuse valid HTTP connections for new ball
positions.
An http_connection
object is available for reuse if and only if
conn->cstate == cstate_idle
. This means that the server sent a
complete response and is waiting for another request.
Reusing connections would be really easy—except that in Phase 3 the
server also drops some connections (as in Phase 1) and delays some
connections (as in Phase 2). Your pong61
client must handle it all,
and you must use synchronization primitives correctly.
The key function you’ll need to add is a connection table of available
connections. This can be a linked list, an array, a std::list
or
std::deque
, or whatever you’d like. When a connection reaches state
cstate_idle
, add it to the table. When you need to call a new RPC, check the
table first, and use that connection if one exists.
Make sure that you protect your connection table from concurrent access!
There should be no race condition bugs. Use synchronization objects to
handle this. You should simultaneously remove the unsafe global move_done
variable and replace it with a synchronization object.
Phase 4: Congestion
In Phase 4, the server sometimes behaves as if it is congested. A
congested server responds to a request not with 0 OK
, but with a
positive number, such as this:
+1948 STOP
This means that the server is overloaded. pong61
is not allowed to
send any more requests for (in this case) 1948 milliseconds—not even
from concurrent threads. This should give the server enough time to
cool down. The display will show a stop sign during the cool-down
period, and if pong61
ignores the stop request and sends a message
anyway, the server will explode. But after the cool-down period, the
client should go right back to sending requests. A congestion response
can sometimes be delayed, as in Phase 2; your client threads should
proceed during the delay, entering the cool-down period only once the
whole response is available.
Phases 1 through 3 are still active in Phase 4. Phase 4 may catch some race conditions in your code from Phase 3.
The best solutions will avoid all race conditions for non-delayed
responses, meaning that the main
thread will remain blocked during an
immediate congestion signal (i.e., a complete congestion signal included
in the server’s initial response). This may require changes to your
Phase 2 code.
Phase 5: Evil
Phase 5 is a mystery, but run it and you’ll figure out the problem soon enough.
Hints and advice
For full credit, your code must not suffer from race condition bugs. You’ll
need to think this through carefully, as race conditions may not show up
during normal testing. Both ./pong61
and ./simpong61
should run cleanly
with thread sanitizers. The ./pong61 -f
flag, which runs pong61
faster
than normal, might be useful as you look for race conditions.
We are only concerned with race conditions inside your client (i.e., between different client threads). We are not concerned with rare issues with scheduling between the server and the client, such as network reordering. It is impossible to avoid all client–server race conditions in this pset. But as usual, your code should never execute undefined behavior.
Extra credit
If you have extra time, implement something fun. For example, two teams could get together and implement Space Invaders (one team programming the monsters, and one team programming the spaceship)! Here are some RPCs the server implements that might be useful.
- Your fun mode should run when
pong61
is given the-n
flag (inmain
, this is indicated bynocheck == 1
). This flag turns off the server’s checking facilities. Without the-n
flag, your client should run in normal mode. - You can query the state of a cell with a
query?x=XPOS&y=YPOS
RPC. - You can set a cell to a new state for a given duration with a
move?x=XPOS&y=YPOS&style=STYLE&duration=MILLISECONDS
RPC. This only works innocheck
mode. You can also append&fade=MILLISECONDS
to the RPC to control how quickly the cell fades out (the fadeout defaults to 4 seconds). STYLE
has a number of interesting possibilities. See if you can figure out what they are!- You can query multiple cells with a
query?coords=STR
RPC, and set multiple cells’ states with amove?coords=STR
RPC.STR
should consist of X,Y coordinate pairs separated by spaces or commas. - There is also a
cmpxchg?x=XPOS&y=YPOS&expected=STYLE&style=STYLE...
RPC.
Or add features to simpong61
. Implement obstacles on the board, or paddles
(moving obstacles, each controlled by their own thread).
Turnin
You will turn in your code by pushing your git repository and informing the grading server. Inform us ASAP if you have changed partner or repository from pset 5.
Remember to fill out README.md
and AUTHORS.md
.
This pset was created for CS61.