Here’s a brief introduction to file descriptors for CS 61.
For another presentation of this material, see CS:APP3e chapter 10, particularly through section 10.5. Section 10.4.2 may be particularly interesting for Problem Set 4!
A file descriptor is the Unix abstraction for an open input/output stream: a file, a network connection, a pipe (a communication channel between processes), a terminal, etc.
A Unix file descriptor thus fills a similar niche to a stdio FILE*.
However, whereas a FILE* (like stdin or stdout) is a pointer to
some object structure, a file descriptor is just an integer. For
example, 0, 1, and 2 are the file descriptor versions of stdin, stdout, and stderr, respectively.
(Integers are used because they’re easier for the operating system kernel to
verify than arbitrary pointers. Although the kernel has objects somewhat
like FILE*s, it doesn’t give applications direct access to those
objects. Instead, a kernel structure called the file descriptor table stores an
array of such objects. The file descriptors that applications manipulate are
indexes into this table. It’s very easy to check that an integer is in bounds.)
Logically, a file descriptor comprises a file reference, which represents
the underlying data (such as
/home/kohler/grades.txt), and a file
position, which is an offset into the file. There can be many file
descriptors simultaneously open for the same file reference, each with a
different position. For disk files, the position can be explicitly changed: a
process can rewind and re-read part of a file, for example, or skip around, as
we saw with strided I/O patterns. These files are called seekable. However,
not all types of file descriptor are seekable. Most communication channels
between processes aren’t, and neither are network channels.
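To make the file reference/file position split concrete, here is a minimal sketch that opens the same file twice and checks that reading from one descriptor leaves the other's position alone. The function name `positions_are_independent` is made up for illustration, and the sketch assumes `path` names a readable disk file.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Hypothetical demo (not part of any handout): open `path` twice and
// report whether the two descriptors keep independent file positions.
bool positions_are_independent(const char* path) {
    assert(path != nullptr);
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0) {
        if (fd1 >= 0) close(fd1);
        if (fd2 >= 0) close(fd2);
        return false;
    }
    char buf[4];
    ssize_t n = read(fd1, buf, sizeof(buf));  // advances fd1's position only
    off_t pos1 = lseek(fd1, 0, SEEK_CUR);     // query fd1's position
    off_t pos2 = lseek(fd2, 0, SEEK_CUR);     // query fd2's position
    close(fd1);
    close(fd2);
    // fd1 moved forward by the bytes read; fd2 is still at the start.
    return n >= 0 && pos1 == n && pos2 == 0;
}
```

Both descriptors share the same file reference, so data written through one would be visible through the other; only the positions differ.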
File descriptor system calls
These are the most common system calls relating to file descriptors. You may
read about them in detail using
man: for instance,
man 2 open,
man 2 read,
man 2 lseek. The “
2” means “tell me about the system call.” Or you
can check the book.
int open(const char* pathname, int flags, [mode_t mode])
Open the file pathname according to flags, which is a set of flags containing
exactly one of
O_RDONLY (open for reading),
O_WRONLY (open for writing), or
O_RDWR (open for both reading and writing), as well as other optional
flags. Returns a file descriptor for the open file, or -1 on error.
Other important flags include:
O_CREAT: Create the file if it does not exist, using the mode argument to set the file’s initial permissions. (Typically the mode argument will be S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP, which allows the current user and group to read or write the file.)
O_CREAT | O_EXCL: Create the file and fail if the file already exists.
O_APPEND: Open the file in append mode: every write automatically jumps to the end of the file and makes the file longer.
O_TRUNC: Truncate the file to length 0.
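The flag combinations above might be used as in the following sketch. The helper names `create_new_file` and `open_for_append` are hypothetical, chosen only for this example; the flags and mode bits are the ones described above.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: create `path` for writing, failing if it exists.
// Returns a file descriptor, or -1 (with errno == EEXIST if the file
// was already there).
int create_new_file(const char* path) {
    assert(path != nullptr);
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP;
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}

// Hypothetical helper: open `path` so every write lands at the end of
// the file, creating the file first if necessary.
int open_for_append(const char* path) {
    assert(path != nullptr);
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP;
    return open(path, O_WRONLY | O_CREAT | O_APPEND, mode);
}
```

Note that the permissions actually applied to a new file are also filtered by the process's umask, so the group bits may not survive.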
ssize_t read(int fd, void* buf, size_t sz)
Read at most sz bytes from file descriptor fd into buffer buf.
Returns the number of bytes read, if any. Returns 0 at end of file and
-1 on error.
Normally read returns sz, but it can return less. For instance,
there might be just sz - 2 bytes left in the file, or there might only
be sz - 10 bytes available to read at the moment. A read that
returns less than the requested number of bytes is called a short
read. If sz > 0, then the return value 0 is a reliable end-of-file
indicator. For instance, when reading a pipe, 0 means the other end of
the pipe has closed. Other short reads are not reliable end-of-file
indicators. For instance, when reading from the terminal, a read of 1024
bytes might return 1 byte because the user has only typed 1 byte so far;
the user might still type more bytes in the future.
The return value
-1 indicates an error (possibly a restartable error)
and means that no bytes were read. If any bytes were read, the return value
will be greater than 0. However, not all errors are equally serious.
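Because short reads are normal, code that needs a full buffer usually calls read in a loop. Here is a minimal sketch under that assumption; `read_until_full` is a made-up name, and this version deliberately does not retry after errors.

```cpp
#include <unistd.h>
#include <cassert>

// Hypothetical helper: keep calling read until `sz` bytes have arrived
// or end of file is reached. Returns the number of bytes read (less
// than `sz` only at end of file or after an error that follows some
// successful reads), or -1 if an error occurs before any byte is read.
ssize_t read_until_full(int fd, char* buf, size_t sz) {
    assert(buf != nullptr || sz == 0);
    size_t nread = 0;
    while (nread < sz) {
        ssize_t n = read(fd, buf + nread, sz - nread);
        if (n == 0) {
            break;                       // reliable end of file (sz - nread > 0)
        }
        if (n < 0) {
            // Error: report bytes already read, if any; otherwise -1.
            return nread > 0 ? static_cast<ssize_t>(nread) : -1;
        }
        nread += static_cast<size_t>(n); // short read: loop for the rest
    }
    return static_cast<ssize_t>(nread);
}
```

On a terminal or pipe this loop blocks until the buffer fills or the other end closes, which is not always what an interactive program wants.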
Permanent and restartable errors
The read and write system calls, as well as some other system calls, are
so-called “slow” system calls that can return different classes of error.
Some errors indicate problems with the underlying file. For instance, the
EIO error indicates disk corruption, and
ENOSPC indicates that the disk is
full. These errors, which we’ll call permanent errors, should be returned
to the user.
Other, restartable errors indicate a temporary blip, and retrying the slow
system call will likely succeed. These errors are EINTR and EAGAIN. The
kernel uses these error codes to indicate an interruption or condition that
the process may want to check.[1] I/O libraries must sometimes mask
these errors by retrying until the errors go away. For example, the stdio
fflush function retries its writes until restartable errors go
away; and in pset 4, your
io61_flush function must do the same.
Error codes like EINTR are defined in the <cerrno> header (#include <cerrno>). When a
system call returns an error, it generally returns -1; the error code is
returned in a special global variable called errno.
Each system call’s manual page lists all errors that can occur for that system
call. For read, see read(2). (That notation means
“the page for read in section 2 of the manual”; run
man 2 read.)
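Masking restartable errors can be sketched as a thin wrapper around read. The name `read_retrying` is hypothetical; the sketch assumes a blocking file descriptor, since retrying EAGAIN on a non-blocking descriptor spins until data arrives.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical wrapper: call read, and call it again whenever it fails
// with a restartable error (EINTR or EAGAIN). Permanent errors and
// successful reads (including short reads and end of file) are returned
// to the caller unchanged.
ssize_t read_retrying(int fd, void* buf, size_t sz) {
    assert(buf != nullptr || sz == 0);
    while (true) {
        ssize_t n = read(fd, buf, sz);
        if (n >= 0) {
            return n;                    // success, short read, or EOF
        }
        if (errno != EINTR && errno != EAGAIN) {
            return -1;                   // permanent error; errno is set
        }
        // Restartable error: no bytes were read, so just try again.
    }
}
```

The same pattern applies to write; only the system call and the buffer's constness change.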
ssize_t write(int fd, const void* buf, size_t sz)
Write at most sz bytes to file descriptor fd from buffer buf.
Returns the number of bytes written, if any. Returns
-1 on error.
Normally write returns sz, but as with read, it might return less: a
short write. Short writes are less common than short reads, but they can
happen; for instance, the drive storing the file might not have space for all
sz bytes, or a write attempt might be interrupted by a signal.[2]
The return value
-1 indicates an error (possibly a restartable error)
and means that no bytes were written. If any bytes were written, the return
value will be greater than 0. The return value
0 is possible only if
sz == 0.
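A caller that must get every byte out has to handle both short writes and restartable errors. The following is one possible sketch, not the required pset 4 design; `write_all` is a made-up name.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: write all `sz` bytes of `buf` to `fd`,
// continuing after short writes and retrying restartable errors.
// Returns `sz` on success, or -1 on a permanent error (in which case
// some prefix of the data may already have been written).
ssize_t write_all(int fd, const char* buf, size_t sz) {
    assert(buf != nullptr || sz == 0);
    size_t nwritten = 0;
    while (nwritten < sz) {
        ssize_t n = write(fd, buf + nwritten, sz - nwritten);
        if (n > 0) {
            nwritten += static_cast<size_t>(n);  // possibly a short write
        } else if (n < 0 && errno != EINTR && errno != EAGAIN) {
            return -1;                           // permanent error
        }
        // Otherwise: restartable error, so loop and try again.
    }
    return static_cast<ssize_t>(nwritten);
}
```

Returning -1 after a partial write loses the count of bytes already written; a real library might track that separately.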
off_t lseek(int fd, off_t pos, int whence)
Change file descriptor fd’s position and return the resulting position
relative to the beginning of the file. There are three important values for whence:
SEEK_SET: Set the file position to pos. pos == 0 sets the position to the beginning of the file, pos == 1 sets it one byte in, and so forth.
SEEK_CUR: Change the file position relative to the current position. pos == 0 leaves the position unchanged, pos == 10 skips over the next 10 bytes, and so forth.
SEEK_END: Set the file position relative to the file size. pos == 0 sets the position to the end of the file, pos == -1 sets it to the last byte in the file, and so forth.
lseek(fd, 0, SEEK_CUR) returns the current position without changing it.
Returns -1 on error, which can happen, for example, if the file is not
seekable or the new file position is out of range for the file.
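The three whence values combine naturally. As a sketch, the hypothetical helper `file_size` below measures a file with SEEK_END and then restores the original position with SEEK_SET; `path_size` is an equally hypothetical convenience wrapper. A return value of -1 doubles as a "not seekable" test.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: return the size of the file behind `fd` without
// disturbing its current position. Returns -1 if `fd` is not seekable
// (a pipe, for example) or another error occurs.
off_t file_size(int fd) {
    off_t cur = lseek(fd, 0, SEEK_CUR);      // remember where we are
    if (cur == (off_t) -1) {
        return -1;
    }
    off_t end = lseek(fd, 0, SEEK_END);      // position == file size
    if (end == (off_t) -1) {
        return -1;
    }
    if (lseek(fd, cur, SEEK_SET) == (off_t) -1) {  // put position back
        return -1;
    }
    return end;
}

// Hypothetical usage: size of the file named `path`, or -1 on error.
off_t path_size(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    off_t size = file_size(fd);
    close(fd);
    return size;
}
```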
int close(int fd)
Close the file descriptor.
The Unix error convention is that system calls return
-1 on error. A global variable,
int errno, is then set so the program can tell what
kind of error occurred.[3] The
<cerrno> header file defines symbolic
names for specific error conditions. Each name starts with E. For
example, the system calls above “return EBADF if
fd is not an open
file descriptor.” This actually means that the system call returns the
value -1 (cast to the appropriate type), and the global
variable is set to the constant EBADF. The
const char* strerror(int errnum) library function returns a
textual string describing an error constant. For instance,
strerror(EINVAL) returns "Invalid argument". This might be useful.
A system call’s manual page will list the errors it might return.
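Putting the convention together, a checked system call might look like the sketch below. `close_and_report` is a hypothetical name; the point is the pattern of testing for -1, saving errno right away, and passing the saved value to strerror.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: close `fd`; on failure, print a readable message
// to stderr. Returns 0 on success, or the errno code on failure.
int close_and_report(int fd) {
    if (close(fd) == 0) {
        return 0;
    }
    int err = errno;   // save immediately: later calls may overwrite errno
    assert(err != 0);  // a failed system call sets errno to a nonzero code
    std::fprintf(stderr, "close(%d) failed: %s\n", fd, std::strerror(err));
    return err;
}
```

For example, `close_and_report(-1)` returns EBADF, because -1 is never an open file descriptor.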
Additional system calls
The following system calls might also be useful for problem set 4, depending on your implementation strategy. Read their manual pages, consult CS:APP3e or our handout code for more.
void* mmap(void* addr, size_t len, int prot, int flags, int fd, off_t offset)
Memory-map a portion of a file, returning the mapped address. Returns
MAP_FAILED == (void*) -1 on error. Doesn’t work for all file types.
int munmap(void* addr, size_t len)
Unmap a previously-mapped memory region.
int madvise(void* addr, size_t len, int advice)
Provide prefetching advice for a portion of a memory-mapped region.
int posix_fadvise(int fd, off_t pos, off_t len, int advice)
Provide prefetching advice for a portion of a file descriptor.
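As a rough illustration of how mmap, madvise, and munmap fit together, the sketch below maps a whole file read-only and scans it. The function name `count_newlines_mmap` is made up, and this is only one of many ways to structure memory-mapped I/O; it assumes a seekable, regular file.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: count '\n' bytes in the file `path` by mapping
// it into memory. Returns the count, or -1 on error (for example, if
// the file cannot be opened, is not seekable, or cannot be mapped).
long count_newlines_mmap(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    off_t size = lseek(fd, 0, SEEK_END);     // file size, or -1
    if (size <= 0) {
        close(fd);
        return size == 0 ? 0 : -1;           // mmap rejects length 0
    }
    size_t len = static_cast<size_t>(size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                               // the mapping outlives the fd
    if (p == MAP_FAILED) {
        return -1;
    }
    madvise(p, len, MADV_SEQUENTIAL);        // a hint only; result ignored
    const char* data = static_cast<const char*>(p);
    long count = 0;
    for (size_t i = 0; i < len; ++i) {
        if (data[i] == '\n') {
            ++count;
        }
    }
    munmap(p, len);
    return count;
}
```

Since mmap doesn’t work for all file types, code like this generally needs a read-based fallback for pipes and terminals.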
[1] EINTR means a signal was delivered to the process before any reading or writing could occur, and EAGAIN means the system call would normally block, but the file descriptor is in non-blocking mode.
[2] Different types of file have different behavior around short writes. When writing to a Linux pipe, for example, writes of 4096 or fewer bytes cannot be short. Such writes will happen either completely or not at all (i.e., the return value from write will either be the number of bytes requested or -1). (Ref; confirmed in the code)
[3] The errno variable is actually thread-local: in a multithreaded program, each thread has its own errno.