Data representation 1: Introduction

Overview

This course is about systems software programming, and how and why systems software works.

Textbook readings

What is a system?

Computer systems software is the software that acts as a foundation for other computer applications. Systems software includes low-level operating systems code, runtime support libraries for programming languages (like Python), databases, and network servers.

Systems software runs in challenging circumstances. It often has stringent performance requirements: a modern Web server should be able to serve hundreds of thousands of Web pages a second. It often operates in hostile environments: an operating system must resist attack from malicious people or bots. And, unlike pure algorithms, it exists in the practical world, and the needs of the practical world are always changing. It runs on real hardware, which has interesting performance characteristics that change over time and that affect which algorithms work best. And users and developers always have new needs or provide new workloads.

These challenges make systems software an excellent context to learn computer programming. We think systems software programming is a critical skill for computer scientists. If you understand systems programming, you will be able to analyze and solve more software problems—you will have the tools to tame some of the most confusing bugs there are. Few computer scientists are full-time systems programmers, but every important program I’ve ever worked on has portions that demand a systems approach. And systems programming is really fun: it’s fun to figure out how software really works. You get there by building systems yourself.

Your work

(This slide and the next few summarize several aspects of the course policies.)

Grading

Problem sets and collaboration

AI

Our programming language

We use the C++ programming language in this class.

C++ is a boring, old, and unsafe programming language, but boring languages are underrated. C++ offers several important advantages for this class, including ubiquitous availability, the ability to demonstrate impactful errors, and a good standard library of data structures.

Pset 0 links to several C++ tutorials and references, and to a textbook.

Class outline

  1. Data representation
    • How do computers represent different kinds of information?
    • How does data representation impact performance and correctness?
  2. Assembly & machine programming
    • What language is understood by computer processors?
    • How is code you write translated to code a processor runs?
  3. Kernel programming
    • How do hardware and software defend against bugs and attacks?
    • How are operating systems interfaces implemented?
  4. Storage & caching
    • What kinds of computer data storage are available, and how do they perform?
    • How can we improve the performance of a system that stores data?
  5. Process management
    • How can programs running on the same computer cooperate and interact?
    • What kinds of operating systems interfaces are useful?
  6. Concurrency
    • How can a single program safely use multiple processors?
    • How can multiple computers safely interact over a network?

Add

Let’s investigate the representation of integers and code.

#include <cassert>
#include <cstdio>
#include <string>

int add(int a, int b) {
    return a + b;
}


int main(int argc, char* argv[]) {
    // we must have exactly 3 arguments (including the program name)
    assert(argc == 3);

    // convert the argument strings to integers
    int a = std::stoi(argv[1]);
    int b = std::stoi(argv[2]);

    // print their sum
    printf("%d + %d = %d\n", a, b, add(a, b));
}

Questions

Primitive types, values, and objects

Where are a and b?

M1 Mac board

From https://www.ifixit.com/News/46884/m1-macbook-teardowns-something-old-something-new

Computer addresses

Stored-program architecture