Overview
In this lecture, we start building a networked database.
Database
- An organized collection of data
- Responsible for storing, querying, and modifying records
- Records might be structured (divided into predefined fields) or unstructured (byte strings)
- Queries and modifications might be general or limited
- General query: “Return a list of the current grades of all students with 47 or more late hours remaining”
- Limited query: “Return the single record with key
hello
”
Key-value database
- Limited queries, unstructured data
- Keys are strings, values are strings
- Operations:
get
,set
,delete
get(K)
: Return the value stored for keyK
set(K, V)
: Change the value stored for keyK
toV
delete(K)
: Delete any value stored for keyK
- Could be implemented by
std::map
orstd::unordered_map
!
- Example:
memcached
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
memcache
. … We provision hundreds ofmemcached
servers in a cluster” [ref] - Still in use at massive scale, alongside other key/value caches
- At Facebook in 2013, “loading one of our popular pages results in an average of 521 distinct items fetched from
Databases and networks
- A database is more useful when it can respond to queries from all over the world
- Add a network interface
Representing networking in system calls
- A network connection is a streaming connection, like a pipe, between two different computers
- Because the “pipe” is between unrelated computers, there’s no common ancestor—no “parent process” to inherit the pipe from
Clients and servers
- Server
- A program that opens a listening port that can accept new connections
- Passive
- Client
- A program that opens a new connection to a preexisting listening port on some server
- Active
- Commands are respresented as byte streams
- Data representation!
- Requires serialization (to change a command into a stream of bytes) and deserialization (to parse a stream of bytes into a command)
WeensyDB protocol
./weensydb01
starts servertelnet localhost 6162
(ortelnet cs61.seas.harvard.edu 6162
)
kohler@unshare$ telnet cs61 6162
Trying 34.193.34.22...
Connected to cs61.seas.harvard.edu.
Escape character is '^]'.
$GET hello
VALUE hello 5 #1
world
END
$SET hello 3 [3 = size of value]
$Hi
STORED #1
$GET hello
VALUE hello 3 #1
Hi
END
$DELETE hello
DELETED #1
$SET hello 9
$Hi again
STORED #2
$Q
Connection closed by foreign host.
System calls?
Unix networking system calls
socket
: Create a new file descriptor for networking- Server system calls
bind
: Connect the socket to a local portlisten
: Declare the socket as a server socketaccept
: Block until there is a new connection to the server socket, then return it as a new file descriptor
- Client system calls
connect
: Connect the socket to a server port, usually on another machine
WeensyDB server loop
int fd = open_listen_socket(port);
// `fd` doesn’t receive *data*—it receives *new connections*
while (true) {
struct sockaddr addr;
socklen_t addrlen = sizeof(addr);
// Accept connection on listening socket
int cfd = accept(fd, &addr, &addrlen);
// `cfd` is a new connection from the computer at `addr`
// Handle connection
handle_connection(cfd, unparse_sockaddr(&addr, addrlen));
// `handle_connection` closes `cfd` when done
}
WeensyDB internals
- Hash table
- Hash function: bucket number = first character of key
dbmap
API
db.bucket(KEY)
- Return bucket number containing
KEY
- Return bucket number containing
db.bbegin(B)
,db.bend(B)
- Iterators to beginning and end of bucket
B
- Iterators to beginning and end of bucket
db.bfind(B, KEY)
- Return iterator to item with key
KEY
- Return iterator to item with key
db.binsert(B, KEY)
- Insert new item with key
KEY
, return iterator to it
- Insert new item with key
db.berase(B, IT)
- Erase item
*IT
- Erase item
Synchronization issues
- Parallelism
- Only one client can access the database at a time
- Data races
- Multiple threads can conflict on concurrent memory access
- Lock granularity
- Unnecessarily strict locking reduces parallelism
- Denial of service
- A misbehaving client can block out all other clients indefinitely
- Deadlock
- Circular locking: a client block itself (and all other clients)
- Lock ordering
- Resource exhaustion
- Too many clients can crash the server