Overview
This course is about systems software programming, and how and why systems software works.
What is a system?
Computer systems software is the software that acts as a foundation for other computer applications. Systems software includes low-level operating systems code, support libraries for programming language runtimes (like Python's), databases, and network servers.
Systems software runs in challenging circumstances. It often has stringent performance requirements: a modern Web server should be able to serve hundreds of thousands of Web pages a second. It often operates in hostile environments: an operating system must resist attack from malicious people or bots. And, unlike pure algorithms, it exists in the practical world, and the needs of the practical world are always changing. It runs on real hardware, which has interesting performance characteristics that change over time and that affect which algorithms work best. And users and developers always have new needs or provide new workloads.
These challenges make systems software an excellent context to learn computer programming. We think systems software programming is a critical skill for computer scientists. If you understand systems programming, you will be able to analyze and solve more software problems—you will have the tools to tame some of the most confusing bugs there are. Few computer scientists are full-time systems programmers, but every important program I’ve ever worked on has portions that demand a systems approach. And systems programming is really fun: it’s fun to figure out how software really works. You get there by building systems yourself.
Your work
(This slide and the ones that follow summarize several aspects of the course policies.)
- Six problem sets
- Two midterms and a final in person
- Section
  - Starting next week
  - Attendance checked, especially for simultaneously-enrolled students
Grading
- Rough breakdown: 50% assignments, 35% tests, 15% participation
- Course grading: A means mastery
- Grading with extra credit
  - Each problem set has extra credit opportunities
  - Final course grades are assigned by computing two scores, one without extra credit and one with extra credit
  - You get the maximum of the two grades
- No conversion to pass/fail after 5th Monday
Problem sets and collaboration
- Collaboration on problem sets is encouraged
- Discuss general strategies, code structure, specific bugs with your classmates
- But you must turn in your own code, and you must understand your code
- You should be able to replicate your solution, from scratch, without collaboration or AI help
- Cite all help (except staff)
- New this year
  - Three distinct commits: Each problem set must show evidence of having been worked on over time, in the form of three different commits in the history that pass different numbers of tests
  - We may ask students to answer oral questions about their code
AI
- The goal of this course is to teach you a valuable way of thinking
- You may code with an AI assistant, but:
  - You must turn in your own code, and you must understand your code
  - Cite any AI assistants you use
- No collaboration with humans or AIs on tests and exams
Our programming language
We use the C++ programming language in this class.
C++ is a boring, old, and unsafe programming language, but boring languages are underrated. C++ offers several important advantages for this class, including ubiquitous availability, the ability to demonstrate impactful errors, and a good standard library of data structures.
Pset 0 links to several C++ tutorials and references, and to a textbook.
Class outline
- Data representation
  - How do computers represent different kinds of information?
  - How does data representation impact performance and correctness?
- Assembly & machine programming
  - What language is understood by computer processors?
  - How is code you write translated to code a processor runs?
- Kernel programming
  - How do hardware and software defend against bugs and attacks?
  - How are operating systems interfaces implemented?
- Storage & caching
  - What kinds of computer data storage are available, and how do they perform?
  - How can we improve the performance of a system that stores data?
- Process management
  - How can programs running on the same computer cooperate and interact?
  - What kinds of operating systems interfaces are useful?
- Concurrency
  - How can a single program safely use multiple processors?
  - How can multiple computers safely interact over a network?
Add
Let’s investigate the representation of integers and code.
#include <cassert>
#include <cstdio>
#include <string>

int add(int a, int b) {
    return a + b;
}

int main(int argc, char* argv[]) {
    // we must have exactly 3 arguments (including the program name)
    assert(argc == 3);

    // convert the argument strings to integers
    int a = std::stoi(argv[1]);
    int b = std::stoi(argv[2]);

    // print their sum
    printf("%d + %d = %d\n", a, b, add(a, b));
}
Questions
- What are a and b?
- Where are a and b?
- What is even happening?
Primitive types, values, and objects
- A type defines a set of related values in a programming language.
- A primitive type is irreducible, meaning its values aren’t composed of smaller values.
  - Different programming languages have different primitive types.
  - In C++, the primitive types include integers, like 0 and 1; floating point numbers, like 0.5 and INFINITY; booleans true and false; and pointers.
- An object is a value stored in memory.
  - The standard says “a region of data storage in the execution environment, the contents of which can represent values”.
  - Objects, unlike values, can change.
  - Because, unlike values, they are present somewhere in the real physical world!
Where are a and b?
From https://www.ifixit.com/News/46884/m1-macbook-teardowns-something-old-something-new
Computer addresses
- Computer software in execution deals with zeroes and ones: bits
- Bits are grouped into groups of eight called bytes
  - A byte is a kind of integer with a value between 0 and 255
    - 0b00000000 (0x00) is the byte (with decimal value) 0
    - 0b00000001 (0x01) is the byte 1
    - 0b00001101 (0x0d) is the byte 13
- And the bytes are stored in memory
  - Memory is made up of billions and billions of transistors: “MOSFETs” (metal–oxide–semiconductor field-effect transistors; don’t ask)
- We need a way to refer to specific bytes of memory
  - So we can refer to an object, or change it
  - Which MOSFETs contain the bytes that make up a and b?
- Each byte of memory has an address
  - Which is stored as another integer!
Stored-program architecture
- Modern computers use a stored-program architecture
  - Instructions and data are both stored as bytes in the same underlying memory
- Instructions work on primitive machine values
  - Including one-byte, two-byte, four-byte, and eight-byte integers, and four-byte and eight-byte floating point numbers, as well as others
- Mapping between programming language types and machine values is the work of the compiler
  - The compiler takes a program and changes it into equivalent instructions
  - Can different programs have the same equivalent instructions?