Overview
In this lecture, we build a correctly-synchronized networked database.
Key-value database
- A map that’s connected to the network
- Operations:
get
,set
,delete
get(K)
: Return the value stored for keyK
set(K, V)
: Change the value stored for keyK
toV
delete(K)
: Delete any value stored for keyK
- Example:
memcached
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
memcache
. … We provision hundreds ofmemcached
servers in a cluster” [ref]
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
Network API
- A network connection is a streaming connection, like a pipe, between two different computers
- Because the “pipe” is between unrelated computers, there’s no common ancestor—no “parent process” to inherit the pipe from
Clients and servers
- Server
- A program that opens a listening port that can accept new connections
- Passive
- Client
- A program that opens a new connection to a preexisting listening port on some server
- Active
WeensyDB server loop
int fd = open_listen_socket(port);
// `fd` doesn’t receive *data*—it receives *new connections*
while (true) {
struct sockaddr addr;
socklen_t addrlen = sizeof(addr);
// Accept connection on listening socket
int cfd = accept(fd, &addr, &addrlen);
// `cfd` is a new connection from the computer at `addr`
// Handle connection
handle_connection(cfd, unparse_sockaddr(&addr, addrlen));
// `handle_connection` closes `cfd` when done
}
WeensyDB internals
- Hash table
- Hash function: bucket number = first character of key
WeensyDB protocol
./weensydb-01
starts servertelnet localhost 6162
(ortelnet cs61.seas.harvard.edu 6162
)
kohler@unshare$ telnet cs61 6162
Trying 34.193.34.22...
Connected to cs61.seas.harvard.edu.
Escape character is '^]'.
$get hello
END
$set hello 3 [3 = size of value]
$Hi
STORED #1
$get hello
VALUE hello 3 #1
Hi
END
$delete hello
DELETED #1
$set hello 9
$Hi again
STORED #2
$quit
Connection closed by foreign host.
Synchronization issues
- Parallelism
- Only one client can access the database at a time
- Data races
- Multiple threads can conflict on concurrent memory access
- Lock granularity
- Unnecessarily strict locking reduces parallelism
- Denial of service
- A misbehaving client can block out all other clients indefinitely
- Deadlock
- Circular locking: a client block itself (and all other clients)
- Lock ordering
- Resource exhaustion
- Too many clients can crash the server
dbmap
API
db.bucket(KEY)
- Return bucket number containing
KEY
- Return bucket number containing
db.bbegin(B)
,db.bend(B)
- Iterators to beginning and end of bucket
B
- Iterators to beginning and end of bucket
db.bfind(B, KEY)
- Return iterator to item with key
KEY
- Return iterator to item with key
db.binsert(B, KEY)
- Insert new item with key
KEY
, return iterator to it
- Insert new item with key
db.berase(B, IT)
- Erase item
*IT
- Erase item