Overview
In this last lecture, we build a correctly-synchronized networked database.
Key-value database
- A map that’s connected to the network
- Operations:
get
,set
,delete
get(K)
: Return the value stored for keyK
set(K, V)
: Change the value stored for keyK
toV
delete(K)
: Delete any value stored for keyK
- Example:
memcached
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
memcache
. … We provision hundreds ofmemcached
servers in a cluster” [ref]
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
Network API
- A network connection is a streaming connection, like a pipe, between two different computers
- Because the “pipe” is between unrelated computers, there’s no common ancestor—no “parent process” to inherit the pipe from
Clients and servers
- Server
- A program that opens a listening port that can accept new connections
- Passive
- Client
- A program that opens a new connection to a preexisting listening port on some server
- Active
WeensyDB server loop
int fd = open_listen_socket(port);
// `fd` doesn’t receive *data*—it receives *new connections*
while (true) {
struct sockaddr addr;
socklen_t addrlen = sizeof(addr);
// Accept connection on listening socket
int cfd = accept(fd, &addr, &addrlen);
// `cfd` is a new connection from the computer at `addr`
// Handle connection
handle_connection(cfd, unparse_sockaddr(&addr, addrlen));
// `handle_connection` closes `cfd` when done
}
WeensyDB internals
- Hash table
- Hash function: bucket number = first character of key
WeensyDB protocol
./weensydb01
starts servertelnet localhost 6162
(ortelnet cs61.seas.harvard.edu 6162
)
kohler@unshare$ telnet cs61 6162
Trying 34.193.34.22...
Connected to cs61.seas.harvard.edu.
Escape character is '^]'.
$get hello
END
$set hello 3 [3 = size of value]
$Hi
STORED #1
$get hello
VALUE hello 3 #1
Hi
END
$delete hello
DELETED #1
$set hello 9
$Hi again
STORED #2
$quit
Connection closed by foreign host.
Synchronization issues
- Parallelism
- Only one client can access the database at a time
- Data races
- Multiple threads can conflict on concurrent memory access
- Lock granularity
- Unnecessarily strict locking reduces parallelism
- Denial of service
- A misbehaving client can block out all other clients indefinitely
- Deadlock
- Circular locking: a client block itself (and all other clients)
- Lock ordering
- Resource exhaustion
- Too many clients can crash the server
dbmap
API
db.bucket(KEY)
- Return bucket number containing
KEY
- Return bucket number containing
db.bbegin(B)
,db.bend(B)
- Iterators to beginning and end of bucket
B
- Iterators to beginning and end of bucket
db.bfind(B, KEY)
- Return iterator to item with key
KEY
- Return iterator to item with key
db.binsert(B, KEY)
- Insert new item with key
KEY
, return iterator to it
- Insert new item with key
db.berase(B, IT)
- Erase item
*IT
- Erase item
Summary
- weensydb01
can handle only one connection at a time
- Put each connection in its own thread
- weensydb02
has data races
- Add a coarse-grained mutex protecting the database
- weensydb03
has coarse-grained locking and therefore little parallelism between connections
- Implement fine-grained synchronization: one mutex per hash bucket
- weensydb04
is vulnerable to a (perhaps unintentional) denial-of-service attack: if a connection doesn’t
supply a value in
set
, then that thread will block forever with the corresponding mutex locked- Don’t block with a mutex acquired if possible
- Read the value before acquiring the mutex
- weensydb05
is vulnerable to a (perhaps unintentional) denial-of-service attack: if a connection doesn’t
supply a value in
set
, then that thread will block forever with the corresponding mutex locked- Don’t block with a mutex acquired if possible
- Read the value before acquiring the mutex
- Also applies to writing data back to the user
- weensydb06
introduces the
exch
command, which swaps two values: it’s vulnerable to deadlock!- Don’t acquire the same mutex twice
- weensydb07 is perfect…?
Principles of synchronization
- Atomic operations
- Modern synchronization rests on atomic operations that read and write shared state in one atomic step
- Example: Atomic increment, decrement
- Data races
- Concurrent unsynchronized access to the same memory is undefined behavior unless all accesses are reads
- Mutual exclusion
- Avoid data races with mutex synchronization objects
- A mutex protects access to shared state
- Guard synchronization objects ensure mutual exclusion throughout a scope
- Relationship between mutex and state is implicit (marked by code)
- Condition variables
- Threads may need to wait for a condition to become true
- This can cause polling, which is expensive in CPU time
- Condition variables allow safe blocking on a condition (no lost wakeups)
- Code must notify cv when condition changes
- Granularity
- Coarse-grained synchronization object covers large state/broad condition
- Fine-grained synchronization object covers small state/narrow condition
Fairness
- Who wins the race to acquire a mutex? Is it fair? Can a thread starve?
- Ticket lock
Atomicity and security
- Time-of-check-to-time-of-use (TOCTTOU) vulnerabilities (link)
Research!
Next
-
CS 161: Operating Systems
-
CS 165: Data Systems (fall)
-
CS 260r: Topics in Computer Systems (this year: parallel and distributed execution models)
-
CS 263: Systems Security (fall 2023)
-
CS 265: Big Data Systems (spring 2023)
-
CS 153: Compilers (fall)
-
CS 141: Computing Hardware
-
CS 148/248: Design of VLSI Circuits and Systems
-
CS 145/245: Cloud Networking and Computing (2022/2023)
-
CS 96: System Design Projects: Machine Learning for Social Impact
-
CS 124: Data Structures and Algorithms
-
CS 143: Networks
-
CS 175: Computer Graphics
-
CS 181: Machine Learning
-
CS 191: Classics of Computer Science
-
CS 222: Algorithms at the Ends of the Wire (future)
-
CS 286: Multi-Robot Systems: Control, Communication, and Security