File Descriptors
Here’s a brief introduction to file descriptors for CS 61.
For another presentation of this material, see CS:APP2e chapter 10, particularly through section 10.5. Section 10.4.2 may be particularly interesting for Assignment 4!
A file descriptor is the Unix abstraction for an open input/output stream: a file, a network connection, a pipe (a communication channel between processes), a terminal, etc.
A Unix file descriptor thus fills a niche similar to that of a stdio “FILE *”. However, whereas a FILE * (like stdin or stdout) is a pointer to some object structure, a file descriptor is just an integer. For example, 0, 1, and 2 are the file descriptor versions of stdin, stdout, and stderr, respectively.
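As a small illustration of the integer/stream correspondence, the sketch below uses the standard fileno function to ask a stdio stream for its underlying descriptor, and write to send bytes to a descriptor directly. The helper names say and descriptor_of are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: write a string straight to a file descriptor,
// bypassing stdio and its buffering.
// Returns whatever write() returns: bytes written, or -1 on error.
ssize_t say(int fd, const char *msg) {
    return write(fd, msg, strlen(msg));
}

// Hypothetical helper: report the descriptor number behind a stdio stream.
int descriptor_of(FILE *f) {
    return fileno(f);
}
```

For instance, descriptor_of(stdout) yields 1, and say(STDOUT_FILENO, "hi\n") writes to the same place printf would, just without stdio's buffer in between.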
(Why use integers? Because of process isolation! The kernel must verify
every object passed to it by a user program. Otherwise, a process might
be able to construct a malformed object that, when used by the kernel,
could screw up isolation. This is another version of the page_alloc restriction in WeensyOS and OS02, where the kernel had to prevent
processes from allocating memory in the kernel’s address space. And
integers are much easier to verify than arbitrary pointers. The kernel
gives each process its own file descriptor table, a simple array that
maps integers to valid file descriptors. It’s very easy to check that an
integer is in the array bounds.)
Logically, a file descriptor comprises a file object, which represents the underlying data (such as /home/kohler/grades.txt), and a position, which is an offset into the file. There can be many file descriptors simultaneously open for the same file object, each with a different position. We saw this with the l17/r10-stridestdiomulti.c program (see Lecture 18 just after the break). For disk files, the
position can be explicitly changed: a process can rewind and re-read
part of a file, for example, or skip around, as we saw with strided I/O
patterns. These files are called seekable. However, not all types of
file descriptor are seekable. Most communication channels between
processes aren’t, and neither are network channels.
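To make the "separate positions" and "not seekable" points concrete, here is a minimal sketch. It assumes the same file is opened twice with open, and it uses a pipe as the non-seekable example. The function names are invented for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Open `path` twice, read a few bytes through the first descriptor, then
// report the second descriptor's position. Each open() gets its own
// position, so the answer should be 0. Returns -1 if anything fails.
off_t second_position_after_first_read(const char *path) {
    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    off_t pos_b = -1;
    if (a >= 0 && b >= 0) {
        char buf[4];
        if (read(a, buf, sizeof buf) >= 0) {   // moves a's position only
            pos_b = lseek(b, 0, SEEK_CUR);     // query b's position without moving it
        }
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    return pos_b;
}

// Returns 1 if seeking on a pipe fails with ESPIPE, 0 otherwise.
int pipe_is_unseekable(void) {
    int fds[2];
    if (pipe(fds) < 0) {
        return 0;
    }
    off_t r = lseek(fds[0], 0, SEEK_SET);
    int unseekable = (r == (off_t) -1 && errno == ESPIPE);
    close(fds[0]);
    close(fds[1]);
    return unseekable;
}
```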
File descriptor system calls
You will use the following system calls in Assignment 4. You may read about them in detail by typing man on your appliance: for instance, man 2 open, man 2 read, man 2 lseek. The “2” means “tell me about the system call” (section 2 of the manual covers system calls). Or you can check the book.
ssize_t read(int fd, void *buf, size_t sz)
Read bytes from file descriptor fd into buffer buf. Read at most sz bytes.
Returns the number of bytes read. This is normally equal to sz. It might be less, however. For instance, there might be just sz - 2 bytes left in the file, or (if the file descriptor is connected to a pipe) there might be only sz - 10 bytes available to read at the moment.
Returns 0 at end of file, and -1 on error.
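Because read may return fewer bytes than requested, code that needs exactly sz bytes usually loops. The following is one possible sketch; read_fully is a hypothetical helper name, and retrying on EINTR (a read interrupted by a signal) is an extra detail not covered above.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keep calling read() until sz bytes have arrived,
// end of file is reached, or an error occurs.
// Returns the number of bytes actually read (less than sz only at end of
// file), or -1 on error.
ssize_t read_fully(int fd, void *buf, size_t sz) {
    char *p = buf;
    size_t total = 0;
    while (total < sz) {
        ssize_t n = read(fd, p + total, sz - total);
        if (n == 0) {
            break;                    // end of file
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;             // interrupted by a signal; try again
            }
            return -1;
        }
        total += (size_t) n;          // short read: loop for the rest
    }
    return (ssize_t) total;
}
```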
ssize_t write(int fd, const void *buf, size_t sz)
Write bytes to file descriptor fd from buffer buf. Write at most sz bytes.
Returns the number of bytes written. This is normally equal to sz, but it might be less: for instance, if the disk is nearly full and there is only room for sz - 2 bytes.
Returns -1 on error.
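Short writes are handled the same way as short reads: loop until everything has been written. This is a sketch under the same assumptions as before; write_fully is a hypothetical name, and the EINTR retry is an added detail.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keep calling write() until all sz bytes are written
// or an error occurs. Returns sz on success, -1 on error.
ssize_t write_fully(int fd, const void *buf, size_t sz) {
    const char *p = buf;
    size_t total = 0;
    while (total < sz) {
        ssize_t n = write(fd, p + total, sz - total);
        if (n < 0) {
            if (errno == EINTR) {
                continue;             // interrupted by a signal; try again
            }
            return -1;
        }
        total += (size_t) n;          // short write: loop for the rest
    }
    return (ssize_t) total;
}
```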
off_t lseek(int fd, off_t pos, int whence)
Change file descriptor fd’s position. Normally whence == SEEK_SET. Then the file’s position is set to pos; so pos == 0 sets the position to the beginning of the file, pos == 1 sets it one byte in, and so forth. You may also set whence == SEEK_CUR, which changes the position relative to the current position, or whence == SEEK_END, which sets the position relative to the file’s size (e.g., lseek(fd, -1, SEEK_END) sets the position to the file’s last byte).
Returns the new position, measured in bytes past the beginning of the file. Returns -1 on error, which can happen, for example, if the file is not seekable or the new file position would be negative.
int close(int fd)
Close the file descriptor.
Understanding errors
The Unix error convention is that system calls return -1 on error. A global variable, int errno, is then set so the program can tell what kind of error occurred. The <errno.h> header file defines symbolic names for specific error conditions. Each name starts with E. For example, the system calls above “return EBADF if fd is not an open file descriptor.” This actually means that the system call returns the value -1 (cast to the appropriate type), and the global errno variable is set to the constant EBADF.
The char *strerror(int errnum) library function returns a textual string describing an error constant. For instance, strerror(EINVAL) returns "Invalid argument". This might be useful for debugging.
A system call’s manual page will list the errors it might return.
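Here is a minimal sketch of the errno convention in practice. The function name try_read is made up for this example. Note that it copies errno into a local variable right away, since later library calls may overwrite it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: attempt a read on fd. On failure, print a
// human-readable message and return the errno value; on success return 0.
int try_read(int fd) {
    char buf[16];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n == -1) {
        int saved = errno;   // save it: later calls may change errno
        fprintf(stderr, "read(%d) failed: %s\n", fd, strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling try_read(-1) returns EBADF, because -1 is never an open file descriptor.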
Additional system calls
The following system calls might also be useful for Assignment 4, depending on your implementation strategy. Read their manual pages, consult CS:APP2e or our handout code, or ask on Piazza for more.
void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset)
Memory-map a portion of a file, returning the mapped address. Returns MAP_FAILED == (void *) -1 on error. Doesn’t work for all file types. (CS:APP2e §9.8.4, l18/memreader.c)
int munmap(void *addr, size_t len)
Unmap a previously-mapped memory region.
int madvise(void *addr, size_t len, int advice)
Provide prefetching advice for a portion of a memory-mapped region.
int posix_fadvise(int fd, off_t pos, off_t len, int advice)
Provide prefetching advice for a portion of a file descriptor.