Git

Git is a version control system.

Introduction

So you’re working on a problem set. You start with the problem set handout.

You change it some.

And some more.

But wait a minute: somewhere in this latest version, you made a mistake! Oh no! But there are so many changes—which one is at fault?

If only you could find out how the file changed! If only you had a program that remembered every version of your file:

And could highlight differences:

This is what version control systems do. Git is a version control system.

Repositories

Git works in units called repositories. A git repository contains two main pieces: the version repository and the working copy.

The working copy consists of normal files arranged in a directory structure. The version repository stores previous versions of these files and directories. We’ll draw them like this:

Remember, though, that a working copy usually contains many files, not just one, and each version in the repository is a snapshot of many files’ states, not just one file’s state.

Each version in the repository is called a commit, and each commit has a unique name. The name is a 40-character hexadecimal string, derived from a SHA-1 cryptographic hash of the commit’s contents and its previous history. (There are 2¹⁶⁰ possible commit hashes.)

Luckily you can abbreviate these names to unique prefixes. Only 5 or 6 characters are usually required to uniquely identify a commit in the current repository, but you can type as many as you want.

Creating a repository

To create a repository in the current directory—for instance, when you’re starting a new project—just run git init. This will make a blank repository with no commits.

It’s more common, though, to make a copy of an existing repository, such as one on code.seas.harvard.edu. The command for that is git clone URL, where URL names the “upstream” repository you want to clone. For example:

% git clone git@code.seas.harvard.edu:~kohler/cs61/cs61-psets.git

will create a new repository on your machine, placing it in the cs61-psets directory. The repository is initialized with a copy of the commits in git@code.seas.harvard.edu:~kohler/cs61/cs61-psets.git.

The URL you pass to git clone is normally an SSH URL, such as that above. SSH URLs allow you to push your changes back to the upstream repository. But other forms are possible too. For example, URLs like git://code.seas.harvard.edu/cs61/cs61-psets.git can be used to clone public repositories. You can even create a local copy of a repository with git clone OLDDIRECTORY NEWDIRECTORY.

Committing

Say you make an edit to your working copy.

This edit is not part of your version repository yet. Git doesn’t automatically track every change.

The git commit -a command takes a snapshot of your current working copy and saves it in the version repository.

The -a means “commit all changes in the working copy.”

As part of the commit process, you’ll be required to enter a commit message. This describes the commit. It’s best to enter something that will be meaningful to you later, but don’t let that stop you from committing early and often.

Once you’ve entered a commit message, git will create the new commit’s hash

and report it to you.

% git commit -a
[master 0716b85] More
 1 file changed, 11 insertions(+), 11 deletions(-)

Logging

The git log command reports all of the commits in your current history. It lists them in reverse order, so the most recent commit comes first.

% git log
commit 1c1bbc3eb93e0ea44d138cb35c77f225b6ce7b54
Author: Eddie Kohler <ekohler@gmail.com>
Date:   Thu Sep 20 20:42:04 2012 -0400

    compare.pl: {Spaces}??? -- the ??? can be preceded by any number of spaces.

    Thanks to Kenneth Ho.

commit a20ef24d4824c499db2d2211f87281ce2065446b
Author: Eddie Kohler <ekohler@gmail.com>
Date:   Tue Sep 18 08:52:42 2012 -0400

    Add and update boundary write error tests.

commit 6240b8872becb52531d92ff373f40f4aea291365
Author: Eddie Kohler <ekohler@gmail.com>
Date:   Fri Sep 14 21:23:34 2012 -0400

    Initial commit of problem set 1.

My favorite way to log commits is git log -p. This shows, with each commit, the difference between that commit and the previous commit. The difference is called a patch (thus the -p).

% git log -p
commit 1c1bbc3eb93e0ea44d138cb35c77f225b6ce7b54
Author: Eddie Kohler <ekohler@gmail.com>
Date:   Thu Sep 20 20:42:04 2012 -0400

    compare.pl: {Spaces}??? -- the ??? can be preceded by any number of spaces.

    Thanks to Kenneth Ho.

diff --git a/pset1/compare.pl b/pset1/compare.pl
index b9faaed..6e5fbf8 100644
--- a/pset1/compare.pl
+++ b/pset1/compare.pl
@@ -18,6 +18,7 @@ while (defined($_ = <EXPECTED>)) {
                  "r" => "", "match" => []};
        foreach my $x (split(/(???|??\{.*?\}(?:=\w+)???)/)) {
            if ($x eq "???") {
+               $m->{r} =~ s{(\\ )+$}{\\s+};
                $m->{r} .= ".*";
            } elsif ($x =~ /\A??\{(.*)\}=(\w+)??\z/) {
                $m->{r} .= "(" . $1 . ")";

commit a20ef24d4824c499db2d2211f87281ce2065446b
Author: Eddie Kohler <ekohler@gmail.com>
Date:   Tue Sep 18 08:52:42 2012 -0400

    Add and update boundary write error tests.

diff --git a/pset1/test020.c b/pset1/test020.c
index 0f59130..42e0ca5 100644
--- a/pset1/test020.c
+++ b/pset1/test020.c
@@ -2,7 +2,7 @@
 #include <stdio.h>
 #include <assert.h>
 #include <string.h>
-// test020: check for wild writes off the end of the allocated block.
+// test020: check for boundary write errors off the end of an allocated block.

 int main() {
     int *ptr = (int *) malloc(sizeof(int) * 10);
diff --git a/pset1/test027.c b/pset1/test027.c
...

It is valuable to learn how to read patches. Compared to the previous commit, a commit deletes the lines indicated by - and adds new lines indicated by +. Lines that start with a space weren’t changed; they’re provided for context. (Learn more about diffs)

Branches

Git commits are arranged into lists called branches. The most important branch is master. This is the default branch; it’s created whenever you initialize a new git repository.

Remember what git printed when you committed?

% git commit -a
[master 0716b85] More
 1 file changed, 11 insertions(+), 11 deletions(-)

Now you know what master means. This commit changed the master branch.

Git can support many branches, not just master. Furthermore, a branch can involve actual branching points, rather than a linear sequence of commits.

A branch name, such as “master,” also names a commit, namely the most recent commit on that branch. This most recent commit changes over time as you add new commits. Commit hashes, in contrast, are stable: a commit hash always means the same version.

Diffs

Main article on diffs

Comparing different versions of your code is incredibly valuable, and one of the best reasons to use a version control system. Git presents differences in what’s called the unified diff format.

There are a couple main patterns for calling git diff.

Run git diff (with no arguments) to see the differences between your current commit (plus the staging area; see below) and your working copy.
Run git diff COMMIT to see the difference between a named commit and your working copy.
Run git diff COMMIT1 COMMIT2 to see the differences between the two named commits.

If git diff reports too much information, then try git diff FILENAME (or git diff COMMIT1 COMMIT2 FILENAME), which prints just the differences to FILENAME.

Undoing changes

To undo changes to a file’s working copy, use git checkout FILENAME. This permanently throws away your changes and rolls back to the most-recently-committed version of FILENAME. You can also roll back to an earlier version: try git checkout COMMIT FILENAME.

Adding files

git commit -a commits changes to all the files that git knows about. But you may sometimes want to add a new file to the repository—to tell git to start tracking another file. This requires the git add command:

% git add FILENAME
% git commit -a

The git add command doesn’t actually commit the change. Instead, git adds the file to an invisible staging area that holds changes until the next commit. That’s why we follow git add with git commit -a.

Removing and renaming files

To remove a file from the version repository, use git rm FILENAME. Like git add, this affects the working tree and staging area; a separate git commit is required to really change the version repository. To rename a file, use git mv OLDFILENAME NEWFILENAME (and then commit).

Status

The git status command reports on the current state of your working copy. Here's an example:

% git status
# On branch master
# Your branch is ahead of 'origin/master' by 6 commits.
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   pset1/README.txt
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       pset1/test.txt
no changes added to commit (use "git add" and/or "git commit -a")

git status is a good overview, and its parenthetical comments are helpful too. Its most valuable output is the “Untracked files” section. This tells you which files in your working copy have no equivalents in the version repository. Forgetting to git add a file is one of the most common git errors. Running git status once in a while and checking the untracked files section is good practice for everyone.

Partial commits

Sometimes you don’t want to commit your entire working copy. For instance, maybe your README.txt is ready to share with your partner, but your m61.c is broken and currently crashes all the tests.

Tell git commit the files you want to commit, instead of passing -a, and it will commit just the named files. For instance:

% git commit README.txt

Alternately, you can stage files for commit using git add. Just as when you add files, this tells git that the next commit, whenever it happens, should include the changes currently in README.txt:

% git add README.txt

git status will show that README.txt’s change has been staged. It also says how you could unstage the file.

% git status
# On branch master
# Your branch is ahead of 'origin/master' by 6 commits.
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   pset1/README.txt
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       pset1/test.txt

Once you’re ready, git commit will commit all staged changes.

Finally, the wicked cool git add -p interactively prompts you to select individual modifications to stage.

Remotes

A git repository can associate with other repositories called remotes. Remotes are stored elsewhere—for example, on github, or code.seas.harvard.edu.

For example, in CS 61, you make commits in the repository on your CS50 Appliance. But to turn in your code, and to protect your work against Appliance problems, you save your changes into a git repository hosted on code.seas.harvard.edu. Your Appliance repository sees your code.seas repository as a remote.

The git remote command lists a repository’s remotes. If you used git clone to create your repository, you will see at least one remote, called origin:

% git remote
origin

The origin remote keeps track of the original source of your repository. In CS 61, this will most likely be a repository hosted on code.seas.harvard.edu.

You can add a new remote whenever you like using git remote add. For instance, add a remote “handout” for CS 61 handout code:

% git remote add handout git://code.seas.harvard.edu/cs61/cs61-psets.git

Your git repository maintains a shadow copy of all the data in each remote.

The remote name, such as origin, refers to the shadow copy.

Since git stores shadow copies of remotes, it can also display their contents. Many of the commands we’ve seen work well on remotes. For example, say you have a handout remote. Then try:

git log handout/master

Print log of commits to the handout remote’s master branch.

git diff handout/master master

Compare the most recent handout code (handout/master) with your most recent commit (master).

Shadow copies get stale as remote repositories change.

The git fetch command updates your repository’s shadow copy from the actual remote repository. git fetch REMOTENAME updates your shadow copy of the REMOTENAME remote; git fetch --all updates all remotes.

git fetch does not change your version repository. It only changes shadow copies. Your working copy, and your repository’s version history, remain unchanged.

Merges

A merge joins development streams together. For instance:

You and a partner split up work on a problem set, and commit your work to different repositories. A merge can combine your changes.
The instructors make changes to a problem set, and you have already begun work. A merge will combine your changes with the instructors’ updates.

Let’s take an example: git merge origin/master. This merges the current branch with origin/master, which is the current value of the master branch in the shadow copy of the origin remote.

First, git checks where the two repositories diverged. Let’s say the commits look like this:

The common commits are b7dde5 and eb519b. After that, the histories diverge. Git thinks of these histories as a branched graph.

The git merge origin/master command combines the branches into a new commit that covers both histories.

If you look at git log after merging, you can see both histories:

commit b0ce5ed01c688921ed0eae69b6bd571619935fad
Merge: 60c4bc9 d35aa90
Author: Eddie Kohler <kohler@seas.harvard.edu>
Date:   Thu Oct 4 14:25:55 2012 -0400

    Merge branch 'master' of git@code.seas.harvard.edu:cs61/cs61-psets.git

git merge is a form of git commit; it will allow you to edit the merge message if you like.

Conflicts

In an ideal world, all merges would complete automatically. But in the real world, sometimes the merged branches edit exactly the same portion of the code. Git is not smart enough to figure out which edits to use, so it stops and reports a conflict. You must edit the files to resolve the conflict by hand.

Here’s an example failed merge:

% git merge origin/master
Auto-merging pset1/test004.c
CONFLICT (content): Merge conflict in pset1/test004.c
Auto-merging pset1/test003.c
CONFLICT (content): Merge conflict in pset1/test003.c
Automatic merge failed; fix conflicts and then commit the result.

ALL CAPS ARE SCARY, WHAT SHOULD WE DO? Well, maybe you don’t want to do a merge after all! Type git merge --abort and git will restore the pre-merge version. But probably you want to resolve the conflict. To do this, you need to know how git shows conflicts. Here’s the conflicted state of pset1/test003.c:

#include "m61.h"
#include <stdio.h>
// test003: active allocation counts.

int main() {
    void *ptrs[10];
    for (int i = 0; i < 10; ++i)
<<<<<<< HEAD
        ptrs[i] = malloc(i | 1);
=======
        ptrs[i] = malloc(i + 1);
>>>>>>> origin/master
    for (int i = 0; i < 5; ++i)
        free(ptrs[i]);
    m61_printstatistics();
}

//! malloc count: active          5   total         10   fail        ???
//! malloc size:  active        ???   total        ???   fail        ???

The <<<<<<<, ======= and >>>>>>> lines are conflict markers. Git adds conflict markers around the conflicting edits. The text between <<<<<<< and ======= represents your edits—the ones in HEAD (git’s name for the current branch). The text between ======= and >>>>>>> represents the other branch’s edits—here, the ones in origin/master.

To resolve a conflict, just choose which edit to keep and delete the other edit. For example, to keep your edit, remove the markers and the origin/master code, to produce:

... for (int i = 0; i < 10; ++i)
        ptrs[i] = malloc(i | 1);
    for (int i = 0; i < 5; ++i)...

You might also want to create a new version that combines the edits.

... for (int i = 0; i < 10; ++i)
        ptrs[i] = malloc((i | 1) + 1);
    for (int i = 0; i < 5; ++i)...

Once you’ve edited all the conflicts (use git status to check your work), commit the result with git commit -a. (Just like the merge suggested!) The commit message is pre-loaded with merge information.

Conflict resolution can get quite hairy. Luckily, there are lots of visual tools available for editing conflicts; try git mergetool. We also find that git rebase is better at handling certain types of conflicts, but a description of rebasing will have to wait.

Fast forwards

Merging has a best case: one of the branches is a subset of the other. For example, consider git merge origin/master with these commits:

The origin/master history is a subset of the master history:

There’s nothing for the merge to do, so it will report “Already up-to-date.”

These commits are also easy to merge:

Here, the master history is a subset of the origin/master history. A

git merge origin/master` command will *fast-forward* the `master

branch to the latest version on origin/master, without creating a new commit:

A fast-forward can never create a conflict.

Pulls

The convenience command git pull combines git fetch and git merge. The command git pull ORIGIN BRANCH means almost the same thing as git fetch ORIGIN; git merge ORIGIN/BRANCH.

Pushes

git push is the inverse of git fetch. Where git fetch updates a shadow repository from the remote, git push takes your local version and copies it to a remote repository. You use this to save your local changes in a more permanent way (for example, on code.seas.harvard.edu).

The command git push REMOTE BRANCH will take your local BRANCH and push a copy of it onto REMOTE’s BRANCH. For example, git push origin master pushes your local master branch to the origin remote’s master branch. For instance, from this state:

git push origin master would do this:

Now origin/master is the same commit as the local master.

Normally git push will reject every attempt to push that is not a fast-forward. If it complains, you need to merge first with the remote branch, for instance by git pull origin master. Push again after completing the merge.

Pushes don’t consider the working copy at all, they only copy committed versions.

This just scratches the surface of what git can do, but should offer you the conceptual tools and enough basic commands to get a lot done. For more on git, check out some of the resources available online:

The git home page
Git documentation—especially the book
Git tutorial videos!
SEAS’s git introduction and slides
Commands I’d like to describe in this document: git branch, git stash, git rebase, git rebase -i, git grep, git reset, git tag, git show, git commit --amend.
Other cool commands: git bisect.