Practical Version Control

The following chapter introduces all the essential techniques you’ll use in your daily work with Git. In addition to a more detailed description of the index and how to restore old versions, the focus is on working effectively with branches.

References: Branches and Tags

In the CVS/SVN world, “branch” and “merge” are often a closed book to newcomers and a regular source of headaches even for experts. In Git, branching and merging are commonplace, simple, transparent, and fast. It’s common for a developer to create several branches and perform several merges in a single day.

Gitk helps you keep an overview of several branches. Use gitk --all to show all branches. The tool visualizes the commit graph explained in the previous section, with each commit on its own line. Branches are displayed as green labels, tags as yellow pointers. For more information, see Gitk.

Figure 1. The sample repository from [ch.basics]. For illustration purposes, the second commit has been tagged v0.1.

Because branches in Git are “cheap” and merges are easy, you can afford to use branches liberally. Want to try something, prepare a small bug fix, or start an experimental feature? Create a new branch for each of these. Want to test whether one branch is compatible with another? Merge them, test everything, then discard the merge and continue developing. This is common practice among developers using Git.

First, let’s look at references in general. References are nothing more than symbolic names for the hard-to-remember SHA-1 sums of commits.

These references are stored in .git/refs/. The name of a reference is determined by the file name, and the target is determined by the contents of the file. For example, the master branch you have been working on all along looks like this:

$ cat .git/refs/heads/master
89062b72afccda5b9e8ed77bf82c38577e603251

Tip

If Git needs to manage a lot of references, they may not be stored as files under .git/refs/. Instead, Git creates a container that contains packed references (Packed Refs): One line per reference with name and SHA-1 sum. This makes sequential resolution of many references faster. Git commands search for branches and tags in the .git/packed-refs file if the corresponding .git/refs/<name> file does not exist.

Under .git/refs/ there are several directories that represent the “type” of reference. These references do not differ fundamentally, only in when and how they are used. The references you will use most often are branches. They are stored under .git/refs/heads/. Heads refers to what is sometimes called a “tip” in other systems: the latest commit on a development branch.{fn29} A branch advances whenever you make a commit on it, so it always stays at the top of the version history.

Figure 2. A branch always references the most recent commit

Branches in other developers' repositories (e.g. the master branch of the official repository), so-called remote tracking branches, are stored under .git/refs/remotes/ (see [sec.remote-tracking-branches]). Tags, static references, which are mostly used for versioning, are stored under .git/refs/tags/ (see Tags — Marking Important Versions).

HEAD and Other Symbolic References

One reference that you rarely use explicitly but constantly use implicitly is HEAD. It usually refers to the currently checked-out branch, in this case master:

$ cat .git/HEAD
ref: refs/heads/master

HEAD can also point directly to a commit if you type git checkout <commit-id>. However, you are then in so-called detached-head mode, in which commits may get lost, see also Detached HEAD.

HEAD determines which files are found in the working tree, which commit becomes the predecessor of a newly created one, which commit git show displays, and so on. When we speak of “the current branch”, the technically correct term is HEAD.

The simple commands log, show, and diff default to HEAD when called without further arguments. The output of git log is the same as that of git log HEAD, and so on. This applies to most commands that operate on a commit if you don’t specify one explicitly. HEAD is thus similar to the shell variable PWD, which specifies “where you are”.

When we talk about a commit, a command usually doesn’t care whether you specify the commit ID in full or in abbreviated form, or whether you access the commit by reference, such as a tag or branch. However, such a reference may not always be unique. What happens if there is a branch master and a tag with the same name? Git checks if the following references exist:

  • .git/<name> (mostly only useful for HEAD or similar)

  • .git/refs/<name>

  • .git/refs/tags/<name>

  • .git/refs/heads/<name>

  • .git/refs/remotes/<name>

  • .git/refs/remotes/<name>/HEAD

Git will take the first matching reference it finds. So you should always give tags a unique scheme so that they don’t get confused with branches. This way you can address branches directly by name instead of heads/<name>.
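The lookup order above can be tried out in a throwaway repository. The following is a minimal sketch, assuming a scratch directory from mktemp and a hypothetical name dup that is used for both a tag and a branch; git rev-parse prints a warning about the ambiguity and then resolves the bare name to the tag:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "first"
git tag dup                              # lightweight tag "dup" on the first commit
git commit -q --allow-empty -m "second"
git branch dup                           # branch "dup" on the second commit

# The bare name is ambiguous; refs/tags/ is consulted before refs/heads/.
by_name=$(git rev-parse dup 2>/dev/null)
by_tag=$(git rev-parse refs/tags/dup)
by_branch=$(git rev-parse heads/dup)     # the branch, addressed unambiguously

echo "dup       -> $by_name"
echo "tags/dup  -> $by_tag"
echo "heads/dup -> $by_branch"
```

The first two lines of output show the same commit ID, the third a different one. With distinct naming schemes for tags and branches, the longer form heads/dup is never needed.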

Especially important are the suffixes ^ and ~<n>. The syntax <ref>^ denotes the direct ancestor of <ref>. This is not always unique: if two or more branches were merged, the merge commit has several direct ancestors. <ref>^ or <ref>^1 then denotes the first direct ancestor, <ref>^2 the second, and so on.{fn30} The syntax HEAD^^ therefore means “the ancestor two levels back from the current commit”, that is, its grandparent along the first parent. Note that ^ may have a special meaning in your shell, so you may need to protect it with quotes or a backslash.

Figure 3. Relative References, ^ and ~<n>

The syntax <ref>~<n> is equivalent to repeating ^ n times: HEAD~10 thus denotes the tenth direct predecessor of the current commit. Note: this does not mean that only eleven commits lie between HEAD and HEAD~10. Since ^ follows only the first parent at any merge, the range between the two references contains those eleven commits plus all further commits that were integrated by merges. The syntax is documented in the git-rev-parse(1) man page in the “Specifying Revisions” section.
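To see the difference between counting first-parent steps and counting commits, here is a small sketch in a scratch repository. The commit subjects A, B, C, and M and the branch name topic are made up for the illustration:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
main=$(git rev-parse --abbrev-ref HEAD)  # whatever the initial branch is called
git checkout -q -b topic 'HEAD^'         # side branch starting at A
git commit -q --allow-empty -m "C"
git checkout -q "$main"
git merge -q --no-ff -m "M" topic        # M has two parents: B (first) and C (second)

first_parent=$(git log -1 --format=%s 'HEAD^1')
second_parent=$(git log -1 --format=%s 'HEAD^2')
grandparent=$(git log -1 --format=%s 'HEAD~2')   # same commit as HEAD^^
count=$(git rev-list --count 'HEAD~2..HEAD')     # commits after A, up to and including M

echo "HEAD^1=$first_parent HEAD^2=$second_parent HEAD~2=$grandparent count=$count"
```

HEAD~2 is only two first-parent steps away from HEAD, yet the range HEAD~2..HEAD contains three commits, because the merged commit C is counted as well.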

Managing Branches

A branch is created in Git in no time. All Git needs to do is identify the currently checked out commit and store the SHA-1 sum in the .git/refs/heads/<branch-name> file.

$ time git branch neuer-branch
git branch neuer-branch  0.00s user 0.00s system 100% cpu 0.008 total

The command is so fast because (unlike other systems) no files need to be copied and no additional metadata needs to be stored. Information about the structure of the version history can always be derived from the commit that a branch references and its ancestors.
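That a new branch is nothing more than a reference to an existing commit can be checked directly. A minimal sketch, assuming a scratch repository and the hypothetical branch name new-branch; the file under .git/refs/heads/ is only printed if the repository stores references as loose files:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial"

git branch new-branch                    # creates only a reference, copies nothing

head_id=$(git rev-parse HEAD)
branch_id=$(git rev-parse refs/heads/new-branch)
echo "HEAD:       $head_id"
echo "new-branch: $branch_id"

# With loose refs, the branch is a small text file containing the commit ID.
if [ -f .git/refs/heads/new-branch ]; then
    cat .git/refs/heads/new-branch
fi
```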

Here is an overview of the most important options:

git branch [-v]

Lists local branches. The currently checked-out branch is marked with an asterisk. You can also use -v to display the commit IDs to which the branches point and the first line of the description of the corresponding commits.

$ git branch -v
  maint  65f13f2 Start 1.7.5.1 maintenance track
* master 791a765 Update draft release notes to 1.7.6
  next   b503560 Merge branch 'master' into next
  pu     d7a491c Merge branch 'js/info-man-path' into pu

git branch <branch> [<ref>]

Creates a new branch <branch> pointing to commit <ref> (<ref> can be the SHA-1 sum of a commit, another branch, etc.). If you do not specify a reference, HEAD (the current branch) is used.

git branch -m <new-name>

git branch -m <old-name> <new-name>

In the first form the current branch is renamed to <new-name>. In the second form <old-name> is renamed to <new-name>. The command fails if this would overwrite another branch.

$ git branch -m master
fatal: A branch named 'master' already exists.

If you rename a branch, Git will not display a message. So you can check afterwards to make sure the renaming was successful:

$ git branch
* master
  test
$ git branch -m test pu/feature
$ git branch
* master
  pu/feature

git branch -M …

Like -m, except that a branch is also renamed if it overwrites another branch. Attention: Commits of the overwritten branch may be lost!

git branch -d <branch>

Deletes <branch>. You can specify several branches at once. Git refuses to delete a branch if it is not yet fully integrated into its upstream branch or, if no upstream branch exists, into HEAD, the current branch. (For more on upstream branches, see [sec.pull]).

git branch -D …​

Deletes a branch, even if it contains commits that have not yet been integrated into the upstream or current branch. Note: These commits may be lost unless they are referenced differently.

Changing Branches: Checkout

You can change branches with git checkout <branch>. If you create a branch and want to switch to it right away, use git checkout -b <branch>. The command is equivalent to git branch <branch> && git checkout <branch>.

What happens during a checkout? Each branch references a commit, which in turn references a tree, that is, the image of a directory structure. A git checkout <branch> now resolves the reference <branch> to a commit and replicates the commit’s tree to the index and to the working tree (i.e., the filesystem).

Since Git knows which version of files are currently in the index and working tree, only the files that differ on the current and new branches need to be checked out.

Git makes it hard for users to lose information. Therefore, a checkout will fail rather than overwrite unsaved changes in a file. This happens in the following two cases:

  • The checkout would overwrite a file in the working tree that contains changes. Git will display the following error message: error: Your local changes to the following files would be overwritten by checkout: file.

  • The checkout would overwrite an untracked file, i.e. a file that is not managed by Git. Git then aborts with the error message: error: The following untracked working tree files would be overwritten by checkout: file.

If, however, the working tree or index contains changes that are compatible with both branches, a checkout carries these changes over. This looks like this, for example:

$ git checkout master
A   new-file.txt
Switched to branch 'master'

This means that the file new-file.txt has been added but exists on neither branch. Since no information can be lost here, the file is simply carried over. The line A new-file.txt reminds you which files you still need to take care of. A stands for added, D for deleted, and M for modified.

If you’re sure you don’t need your changes anymore, you can use git checkout -f to ignore the error messages and run the checkout anyway.

If you want to keep the changes and change the branch (e.g., interrupt your work and fix a bug on another branch), git stash will help ([sec.stash]).
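Both behaviors, the refused checkout and the carried-over change, can be reproduced in a scratch repository. This is a sketch under assumed names (file.txt, new-file.txt, and the branch other are made up):

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "add file"
main=$(git symbolic-ref --short HEAD)
git checkout -q -b other
echo "v2" > file.txt
git commit -q -am "change file on other"
git checkout -q "$main"

# Case 1: an uncommitted edit to a file that differs between the branches.
echo "local edit" > file.txt
if git checkout other 2>/dev/null; then result=switched; else result=refused; fi
git checkout -q -- file.txt              # discard the edit again

# Case 2: a newly added file that exists on neither branch is carried over.
echo "scratch" > new-file.txt
git add new-file.txt
git checkout -q other
carried=$(git status --porcelain)

echo "first attempt: $result"
echo "status after switch: $carried"
```

The first attempt is refused and leaves the working tree untouched. After the second, successful checkout, git status --porcelain still lists new-file.txt as added.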

Branch Naming Conventions

In principle, you can name branches almost arbitrarily. Exceptions are spaces, some special characters with special meaning for Git (e.g. *, ^, :, ~), as well as two consecutive dots (..) or a dot at the beginning of the name.{fn31}

It makes sense to always write branch names entirely in lower case. Since Git manages branch names as files under .git/refs/heads/, case is significant, and on case-insensitive file systems two names that differ only in case can collide.

You can group branches into “namespaces” by using a / as a separator. Branches that are related to the translation of a software can then be named e.g. i18n/german, i18n/english etc. If several developers share a repository, you can also create “private” branches under <username>/<topic>. These namespaces are represented by a directory structure, so that a directory <username>/ with the branch file <topic> is created under .git/refs/heads/.

The main development branch of your project should always be called master. Bugfixes are often managed on a branch maint (short for “maintenance”). The next release is usually prepared on next. Features that are still in an experimental state should be developed in pu (for “proposed updates”) or in pu/<feature>. For a more detailed description of how to use branches to structure development and organize release cycles, see [ch.workflows] on Workflows.

Deleted Branches and “Lost” Commits

Commits each have one or more predecessors. Therefore, you can walk through the commit graph “directed”, that is, from newer to older commits, until you reach a root commit.

The reverse does not work: if a commit were to know its successor, that information would have to be stored in the commit. This would change the SHA-1 sum of the commit, so the successor would have to reference the new commit, which would give the successor a new SHA-1 sum, so the predecessor would have to be changed again, and so on. So Git can only walk through the commits from a named reference (such as a branch or HEAD) in the direction of earlier commits.

Therefore, if the “top” of a branch is deleted, the topmost commit is no longer referenced (in Git jargon: unreachable). As a result, the predecessor is no longer referenced, and so on, until the next commit comes along that is referenced in some way (either by a branch, or by having a successor that is itself referenced by a branch).

So when you delete a branch, the commits on that branch are not deleted, they are just “lost”. Git simply doesn’t find them anymore.

However, they will still be present in the object database for a while.{fn32} So you can easily restore a branch by explicitly specifying the previous (and supposedly deleted) commit as a reference:

$ git branch -D test
Deleted branch test (was e32bf29).
$ git branch test e32bf29

Another way to retrieve deleted commits is the reflog (see Reflog).
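The deletion and recovery described above can be walked through in a scratch repository. The sketch below records the commit ID before deleting, which is an assumption made for scripting; interactively you would copy it from the “Deleted branch” message:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)
git checkout -q -b test
git commit -q --allow-empty -m "work on test"
tip=$(git rev-parse test)                # remember where the branch points
git checkout -q "$main"

# -d refuses because the commit on test is not integrated into HEAD.
if git branch -d test 2>/dev/null; then safe_delete=ok; else safe_delete=refused; fi

git branch -D test >/dev/null            # force the deletion
git cat-file -e "$tip"                   # the commit object still exists
git branch test "$tip"                   # re-create the branch at the old commit
restored=$(git rev-parse test)

echo "git branch -d: $safe_delete, restored at $restored"
```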

Tags — Marking Important Versions

SHA-1 sums are a very elegant solution to describe versions decentrally, but they are semantically poor and unwieldy for humans. Unlike linear revision numbers, commit IDs alone tell us nothing about the order of versions.

During the development of software projects, different “important” versions need to be marked so that they can be easily found in the repository. The most important ones are usually those that are released, called releases. Release candidates are also often marked in this way, i.e. versions that form the basis for the next version and are checked for critical bugs in the course of quality assurance without adding new features. Depending on the project and development model, there are different conventions for marking releases and procedures for preparing and publishing them.

In the open source area, two versioning schemes have become established: the classic major/minor/micro versioning scheme and, more recently, date-based versioning. With major/minor/micro versioning, which is used e.g. with the Linux kernel and also Git, a version is identified by three (often four) numbers: 2.6.39 or 1.7.1. With date-based versioning, on the other hand, the designation is derived from the time of the release, e.g.: 2011.05 or 2011-05-19. This has the great advantage that the age of a version is easily identifiable.{fn33}

Git offers tags (“labels”) that can be used to mark any Git object, usually a commit, to highlight prominent states in the development history. Like branches, tags are implemented as references to objects. Unlike branches, however, tags are static: they are not moved when new commits are added, and they always point to the same object.

There are two types of tags: annotated and lightweight. Annotated tags carry metadata, such as author, description, or GPG signature. Lightweight tags, on the other hand, “simply” point to a specific Git object. For both types of tags, Git creates references under .git/refs/tags/ or in .git/packed-refs. The difference is that for each annotated tag, Git creates a special Git object, a tag object, in the object database to store the metadata and the SHA-1 sum of the selected object, while a lightweight tag points directly to the selected object. Figure 4, The Tag Object, shows the contents of a tag object; compare also the other Git objects, [fig.git-objects].

Figure 4. The Tag Object

The tag object shown has both a size (158 bytes) and a SHA-1 sum. It contains the name (0.1), the object type and the SHA-1 sum of the referenced object as well as the name and e-mail of the author, which is called tagger in Git jargon. In addition, the tag contains a tag message that describes the version, for example, and optionally a GPG signature. In the Git project, for example, a tag message consists of the current version designation and the signature of the maintainer.

In the following, let’s first look at how you manage tags locally. [sec.remote-tags] describes how you exchange tags between repositories.

Managing Tags

You can manage tags with the command git tag. Without arguments it shows all existing tags. Depending on the size of the project, it is worth limiting the output with the -l option and a corresponding pattern. With the following command you display all variants of version 1.7.1 of the git project, i.e. both the release candidates with the addition -rc* and the (four-digit) maintenance releases:

$ git tag -l 'v1.7.1*'
v1.7.1
v1.7.1-rc0
v1.7.1-rc1
v1.7.1-rc2
v1.7.1.1
v1.7.1.2
v1.7.1.3
v1.7.1.4

The content of a tag is provided by git show:

$ git show 0.1 | head
tag 0.1
Tagger: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Wed Mar 23 16:52:03 2011 +0100

Erste Veröffentlichung

commit e2c67ebb6d2db2aab831f477306baa44036af635
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Sat Jan 8 20:30:58 2011 +0100

Gitk presents tags as yellow, arrow-like boxes that are clearly distinguishable from the green, rectangular branches:

Figure 5. Tags in Gitk

Lightweight Tags

To add a lightweight tag to HEAD, pass the desired name to the command (in this example, to mark an important commit):

$ git tag api-aenderung
$ git tag
api-aenderung

To tag a commit other than HEAD, additionally pass its SHA-1 sum or another valid reference after the tag name:

$ git tag pre-regression HEAD~23
$ git tag
api-aenderung
pre-regression

Tags are unique — if you try to recreate a tag, Git will abort with an error message:

$ git tag pre-regression
fatal: tag 'pre-regression' already exists

Annotated Tags

Annotated tags are created with the -a option. As with git commit, an editor will open and allow you to write the tag message. Or you can pass the tag message with the option -m — in which case the option -a is redundant:

$ git tag -m "Zweite Veröffentlichung" 0.2
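The structural difference between the two kinds of tags shows up when you ask Git for the type of the object a tag name resolves to. A minimal sketch in a scratch repository, with the hypothetical tag names light and annotated:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "release work"

git tag light                            # lightweight: reference only
git tag -a -m "Second release" annotated # annotated: creates a tag object

light_type=$(git cat-file -t light)          # type of the object the ref points to
annotated_type=$(git cat-file -t annotated)
peeled=$(git rev-parse 'annotated^{commit}') # follow the tag object to its commit

echo "light     -> $light_type"
echo "annotated -> $annotated_type (commit $peeled)"
```

The lightweight tag resolves directly to a commit, while the annotated tag resolves to a tag object that in turn points to the commit.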

Signed Tags

To verify a signed tag, use the -v (verify) option:

$ git tag -v v1.7.1
object d599e0484f8ebac8cc50e9557a4c3d246826843d
type commit
tag v1.7.1
tagger Junio C Hamano <gitster@pobox.com> 1272072587 -0700

Git 1.7.1
gpg: Signature made Sat Apr 24 03:29:47 2010 CEST using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <junkio@cox.net>"
...

Of course, this assumes that you have GnuPG installed and have already imported the signer’s key.

In order to sign tags yourself, you must first set the preferred key:

$ git config --global user.signingkey <GPG-Key-ID>

Now you can create signed tags with the -s (sign) option:

$ git tag -s -m "Dritte Veröffentlichung" 3.0

Deleting and Overwriting Tags

Use the -d and -f options to delete or overwrite tags:

$ git tag -d 0.2
Deleted tag '0.2' (was 4773c73)

The options should be used with caution, especially if you use the tags not only locally, but also publish them. Under certain circumstances, tags with the same name may then point to different commits: version 1.0 in repository X points to a different commit than version 1.0 in repository Y. But see also [sec.remote-tags].

Lightweight vs. Annotated Tags

For public versioning of software, annotated tags are generally more useful. Unlike lightweight tags, they contain meta-information that shows who created a tag and when, so the contact person is unambiguous. Users of the software can also find out who has approved a particular version. For example, it’s clear that Junio C. Hamano has tagged Git version 1.7.1, so it has his “seal of approval”. The cryptographic signature, of course, confirms this statement as well.

Lightweight tags, on the other hand, are particularly suitable for applying local markers, for example to identify certain commits relevant to the current task. However, make sure not to upload such tags to a public repository (see [sec.remote-tags]), as they might spread. If you only use the tags locally, you can also delete them once they have served their purpose (see above).

Non-Commit Tags

With tags you can mark any Git object, not only commits, but also trees, blobs and even tag objects themselves! The classic example is to put the GPG public key used by the maintainer of a project to sign tags in a blob.

For example, the tag junio-gpg-pub in the Git repository of Git points to the key of Junio C. Hamano:

$ git show junio-gpg-pub | head -5
tag junio-gpg-pub
Tagger: Junio C Hamano <junkio@cox.net>
Date:   Tue Dec 13 16:33:29 2005 -0800

GPG key to sign git.git archive.

Because this blob object is not referenced by any tree, the file is virtually separate from the actual code, but still exists in the repository. In addition, a tag on a “lonely” blob is necessary so that it is not considered unreachable and is deleted during repository maintenance.{fn34}

To use the key, proceed as follows:

$ git cat-file blob junio-gpg-pub | gpg --import
gpg: key F3119B9A: public key "Junio C Hamano <junkio@cox.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1

You can then verify all tags in the Git repository of Git, as described above.

Describing Commits

Tags are very useful for describing any commit “better”. The git describe command gives a description consisting of the most recent tag and its relative position in the commit graph. Here’s an example from the git project: we describe a commit with the SHA-1 prefix 28ba96a, which is located in the commit graph seven commits after version 1.7.1:

Figure 6. The commit to be described highlighted in gray

$ git describe --tags
v1.7.1-7-g28ba96a

The output of git describe is formatted as follows:

<tag>-<position>-g<SHA-1>

The tag is v1.7.1; the position indicates that there are seven new commits between the tag and the described commit.{fn35} The g before the ID indicates that the description is derived from a Git repository, which is useful in environments with multiple version control systems. By default, git describe only searches for annotated tags, but the --tags option extends the search to include lightweight tags.
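The three parts of such a description can be taken apart with plain shell parameter expansion. The following sketch builds a scratch repository with a hypothetical annotated tag v0.1 and two later commits; splitting from the right is a choice made here so that tag names containing hyphens stay intact:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "release"
git tag -a -m "Version 0.1" v0.1
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

desc=$(git describe)                     # <tag>-<position>-g<SHA-1>

abbrev=${desc##*-g}                      # abbreviated commit ID after the "g"
rest=${desc%-g*}                         # <tag>-<position>
position=${rest##*-}                     # number of commits since the tag
tag=${rest%-*}                           # the tag name itself

echo "description: $desc"
echo "tag=$tag position=$position id=$abbrev"
```

Note that git describe prints only the bare tag name when the described commit is itself tagged, so this split applies only to descriptions with a position and ID suffix.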

The command is very useful because it translates a content-based identifier into something useful for humans: v1.7.1-7-g28ba96a is much closer to v1.7.1 than v1.7.1-213-g3183286. This allows you to compile the output directly into the software in a way that makes sense, just like in the Git project:

$ git describe
v1.7.5-rc2-8-g0e73bb4
$ make
GIT_VERSION = 1.7.5.rc2.8.g0e73bb
...
$ ./git --version
git version 1.7.5.rc2.8.g0e73bb

This way users know roughly which version they have and can trace which commit it was compiled from.

Restoring Versions

The goal of version control software is not just to examine changes between commits. Above all, it is also important to restore older versions of a file or entire directory trees, or to undo changes. In Git, the commands checkout, reset, and revert are particularly useful for this.

The Git command checkout can not only change branches, but also restore files from previous commits. The general syntax is:

git checkout [-f] <reference> -- <pattern>

checkout resolves the given reference (HEAD if none is given) to a commit and extracts all files matching <pattern> to the working tree. If <pattern> is a directory, it refers to all files and subdirectories in it. If you do not specify a pattern, all files are checked out; in that case, changes to a file are not simply overwritten unless you specify the -f option (see above), and HEAD is also set to the corresponding commit (or branch).

However, if you specify a pattern, checkout overwrites the matching files without prompting. So to discard all changes to <file>, enter git checkout -- <file>: Git then replaces <file> with the version from the index, which corresponds to the current branch unless you have staged changes. This way, you can also reconstruct an older state of a file:

$ git checkout ce66692 -- <file>

The double minus separates the patterns from the options or arguments. It is not strictly necessary, however: if there is no branch or other reference with that name, Git will try to find a file with that name. The separator just makes it unambiguous that you want to restore the file(s) in question.

To view the contents of a file from a particular commit without checking it out, use the following command:

$ git show ce66692:<file>

Tip

Use --patch or -p to call git checkout in interactive mode. The procedure is the same as for git add -p (see [sec.add-p]), but here you can reset hunks of a file step-by-step.
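The two ways of getting at an old version of a file, viewing it and restoring it, can be compared side by side. A minimal sketch in a scratch repository with a made-up file notes.txt:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "old" > notes.txt
git add notes.txt
git commit -q -m "old version"
old=$(git rev-parse HEAD)
echo "new" > notes.txt
git commit -q -am "new version"

shown=$(git show "$old:notes.txt")       # read the old content, change nothing
git checkout "$old" -- notes.txt         # restore it into index and working tree
restored=$(cat notes.txt)
branch_tip=$(git log -1 --format=%s)     # the branch itself has not moved

echo "shown=$shown restored=$restored tip=$branch_tip"
```

After the checkout, the old content is staged as a change on top of the unchanged branch tip, ready to be committed or discarded.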

Detached HEAD

If you check out a commit that is not referenced by a branch, you are in detached-HEAD mode:

$ git checkout 3329661
Note: checking out '3329661'.

You are in 'detached HEAD' state. You can look around, make
experimental changes and commit them, and you can discard any
commits you make in this state without impacting any branches
by performing another checkout.

If you want to create a new branch to retain commits you create,
you may do so (now or later) by using -b with the checkout command
again. Example:

  git checkout -b new_branch_name

HEAD is now at 3329661... Add LICENSE file

As the explanation already warns (you can hide it by setting the option advice.detachedHead to false), commits you make now may well be lost: since HEAD is then the only direct reference to them, they are no longer referenced by any branch once you check out something else (they are unreachable, see above).

So working in detached HEAD mode is especially useful if you want to try something quickly: Has the bug actually already appeared in commit 3329661? Was there actually a README file at the time of 3329661?

Tip

If you want to do more than just look around from the commit you checked out (for example, to see whether your software already had a particular bug at the time), you should create a branch:

$ git checkout -b <temp-branch>

Then you can make commits as usual without fear of losing them.
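Whether HEAD currently points to a branch or directly to a commit can be queried with git symbolic-ref, which fails in detached-HEAD mode. A short sketch in a scratch repository; the branch name temp-branch is only an example:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

before=$(git symbolic-ref --short HEAD)  # HEAD is a symbolic reference to a branch

git checkout -q --detach 'HEAD^'         # HEAD now holds a commit ID directly
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi

git checkout -q -b temp-branch           # give the current position a name again
after=$(git symbolic-ref --short HEAD)

echo "before=$before state=$state after=$after"
```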

Rolling Back Commits

If you want to undo all the changes a commit makes, the revert command helps. However, it does not delete a commit, but creates a new one whose changes are exactly the opposite of the other commit: Deleted lines become added lines, and vice versa.

Suppose you have a commit that creates a LICENSE file. The patch of the corresponding commit looks like this:

--- /dev/null
+++ b/LICENSE
@@ -0,0 +1 @@
+This software is released under the GNU GPL version 3 or newer.

Now you can undo the changes:

$ git revert 3329661
Finished one revert.
[master a68ad2d] Revert "Add LICENSE file"
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 LICENSE

Git creates a new commit on the current branch — unless you specify otherwise — with the description Revert "<Old commit message>". This commit looks like this:

$ git show
commit a68ad2d41e9219383449d703521573477ee7da48
Author: Julius Plenz <feh@mali>
Date:   Mon Mar 7 05:28:47 2011 +0100

    Revert "Add LICENSE file"

    This reverts commit 3329661775af3c52e6b2ad7e9e7e7d789ba62712.

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 3fd9c20..0000000
--- a/LICENSE
+++ /dev/null
@@ -1 +0,0 @@
-This software is released under the GNU GPL version 3 or newer.

Note that from now on, both the commit and the revert will appear in the version history of a project. You therefore only undo the changes, but do not delete any information from the version history.
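The effect on the history can be verified in a scratch repository. This sketch recreates the situation with made-up file contents and uses --no-edit so that no editor opens for the commit message:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "Add README"
echo "This software is released under the GNU GPL version 3 or newer." > LICENSE
git add LICENSE
git commit -q -m "Add LICENSE file"
bad=$(git rev-parse HEAD)

git revert --no-edit "$bad" >/dev/null   # creates a new commit with the inverse change

subject=$(git log -1 --format=%s)
count=$(git rev-list --count HEAD)       # original commit and revert are both kept

echo "latest commit: $subject"
echo "commits in history: $count"
```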

You should therefore only use revert if you need to undo a change that has already been published. However, if you are developing locally in a separate branch, it makes more sense to delete these commits completely (see the following section on reset and the topic Rebase, [sec.rebase]).

If you want to perform a revert, not for all the changes of a commit but only for those to a single file, you can use this procedure:

$ git show -R 3329661 -- LICENSE | git apply --index
$ git commit -m 'Revert change to LICENSE from 3329661'

The git show command prints the changes from commit 3329661 that apply to the LICENSE file. The -R option causes the unified-diff format to be displayed “the other way around” (reverse). The output is passed to git apply to make the changes to the file and index. The changes are then checked in.

Another way to undo a change is to check out a file from a previous commit, add it to the index, and check it in again:

$ git checkout 3329661 -- <file>
$ git add <file>
$ git commit -m 'Reverting <file> to resemble 3329661'

Reset and the Index

If you want to delete a commit completely rather than just undo it, use git reset. The reset command sets HEAD (and thus the current branch), and optionally the index and working tree, to a particular commit. The syntax is git reset [<option>] [<commit>].

The most important types of resets are the following:

--soft

Resets only the HEAD; index and working tree remain unaffected.

--mixed

Default setting if you do not specify an option. Sets HEAD and index to the specified commit, but the files in the working tree are not affected.

--hard

Synchronizes HEAD, index, and working tree and sets them to the same commit. Changes in the working tree may be lost!

If you call git reset without any options, this is equivalent to git reset --mixed HEAD. We’ve already seen this command: Git sets the current HEAD to HEAD (so it doesn’t change it) and the index to HEAD. In the process, the changes you previously added to the index are removed from it again; the files in the working tree keep them.
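How the three modes differ becomes visible in the short status output, where the first column describes the index and the second the working tree. A sketch in a scratch repository with a made-up file f.txt:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"          # scratch repository, safe to delete
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "one" > f.txt
git add f.txt
git commit -q -m "one"
echo "two" > f.txt
git commit -q -am "two"
tip=$(git rev-parse HEAD)

git reset -q --soft 'HEAD^'              # only HEAD moves
soft=$(git status --porcelain)           # change is still staged

git reset -q --soft "$tip"               # back to the starting point
git reset -q --mixed 'HEAD^'             # HEAD and index move
mixed=$(git status --porcelain)          # change is only in the working tree

git reset -q --hard "$tip"               # back to the starting point
git reset -q --hard 'HEAD^'              # HEAD, index, and working tree move
hard=$(git status --porcelain)           # nothing left to report
content=$(cat f.txt)

printf 'soft:  [%s]\nmixed: [%s]\nhard:  [%s] file contains "%s"\n' \
    "$soft" "$mixed" "$hard" "$content"
```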

The possible uses of this command are many and varied and will reappear in the various command sequences. Therefore it is important to understand the functionality, even if there are sometimes alternative commands that have the same effect.

Suppose you have made two commits to master that you actually want to move to a new branch to work on further. The following command sequence creates a new branch pointing to HEAD and then resets HEAD and the current branch master back by two commits. Finally, it checks out the new branch <new-feature>.

$ git branch <new-feature>
$ git reset --hard HEAD^^
$ git checkout <new-feature>

Alternatively, the following sequence has the same effect: you create a branch <new-feature> that points to the current commit. Then you delete master and re-create it so that it points to the second predecessor of the current commit.

$ git checkout -b <new-feature>
$ git branch -D master
$ git branch master HEAD^^

Using Reset

With reset you do not delete any commits, but only move references. As a result, the commits that are no longer referenced become unreachable, which in effect deletes them. So you can use reset to delete only the topmost commits on a branch, not arbitrary commits “somewhere in the middle,” as this would destroy the commit graph. (For the somewhat more complicated deletion of commits “in the middle,” see rebase, [sec.rebase]).

Before a reset, Git stores the original HEAD under ORIG_HEAD. So if you have performed a reset by mistake, use git reset --hard ORIG_HEAD to undo it (even if the commit was supposedly deleted). However, this does not bring back lost changes to the working tree that you had not yet checked in: they are deleted irrevocably.

The result from above (moving two commits to a new branch) can also be achieved this way:

$ git reset --hard HEAD^^
$ git checkout -b <new-feature> ORIG_HEAD
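How ORIG_HEAD allows you to undo a reset can be checked with a small experiment. This is a sketch under assumptions: the throwaway repository, the empty commits, and the variable name tip are illustrative only.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done
tip=$(git rev-parse HEAD)       # remember commit C

git reset -q --hard HEAD^^      # master now points to A
git rev-parse ORIG_HEAD         # prints the ID of C, the previous HEAD

git reset -q --hard ORIG_HEAD   # undo the reset
git log -1 --format=%s          # C
```

Note that the second reset overwrites ORIG_HEAD again, so the undo only works for the most recent operation.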

A common use of reset is to discard changes on a test basis. You want to try a patch? Add some debugging output? Change a few constants? If you don’t like the result, a git reset --hard deletes all changes to the working tree.

You can also use reset to “make your version history nice.” For example, if you have a few commits on a branch <feature> based on master, but they are not well structured (or much too large), you can create a branch <reorder-feature> and pack all changes into new commits:

$ git checkout -b <reorder-feature> <feature>
$ git reset master
$ git add -p
$ git commit
$ ...

The command git reset master sets index and HEAD to the state of master. However, your changes in the working tree are preserved, i.e. all changes that distinguish the branch <feature> from master are now only contained in the files in the working tree. Now you can add the changes incrementally using git add -p and package them into (several) handy commits.{fn36}
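The effect of git reset master can be observed in a scratch repository. The sketch below assumes a single hypothetical file file.txt and two throwaway commits on feature; the interactive git add -p step is left out because it requires user input.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

# Two poorly structured commits on feature
git checkout -q -b feature
echo one >> file.txt
git commit -q -am "wip 1"
echo two >> file.txt
git commit -q -am "wip 2"

git checkout -q -b reorder-feature feature
git reset -q master        # HEAD and index now match master

git diff --name-only       # file.txt: the changes survive in the working tree
```

From here on you would stage and commit the changes piece by piece.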

Suppose you are working on a change and want to check it in temporarily (to continue working on it later). You can then use the following commands:

$ git commit -m 'feature (not finished yet)'
(later)
$ git reset --soft HEAD^
(continue working)

The command git reset --soft HEAD^ moves HEAD back by one commit, but leaves the index and the working tree untouched. So all changes from your temporary commit are still in the index and working tree, but the actual commit is lost. You can now make further changes and create a new commit later. Similar functionality is provided by the --amend option for git commit, as well as the git stash command, which is explained in [sec.stash].
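That the index keeps the changes after a soft reset can be verified as follows. This is a sketch with an assumed file name feature.txt in a throwaway repository.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > feature.txt
git add feature.txt
git commit -q -m "base"

echo v2 > feature.txt
git commit -q -am "feature (not finished yet)"

git reset -q --soft HEAD^        # drop the temporary commit only

git log -1 --format=%s           # base
git diff --cached --name-only    # feature.txt is still staged
```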

Merging Branches

Combining branches is called merging in Git; the commit that joins two or more branches is called a merge commit.

Git provides the merge subcommand, which allows you to merge one branch into another. This means that all changes made on that branch are incorporated into the current one.

Note that the command integrates the specified branch into the currently checked-out branch (i.e., HEAD). The command therefore only needs one argument:

$ git merge <branch-name>

If you handle your branches carefully, there should be no problems with merging. If there are, then this section also presents strategies for resolving merge conflicts.

First, we will look at an object-level merge process.

Two-Branches Merge

The two branches, topic and master, that you want to merge, each reference the most recent commit in a chain of commits (F and D), and these two commits in turn reference a tree (corresponding to the top-level directory of your project).

First, Git calculates a so-called merge base, that is, a commit that both of the commits to be merged have as a common ancestor. Usually there are several such bases (in the diagram below, A and B), and then the most recent one (which has the other bases as ancestors) is used.{fn37} In simple terms, this is the commit where the branches diverged (i.e., B).
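You can ask Git for the merge base directly with git merge-base. The following sketch rebuilds a history resembling the one described here (empty commits named after the letters in the text; the variable name base is an assumption for illustration):

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
base=$(git rev-parse HEAD)           # B, where the branches will diverge

git checkout -q -b topic
git commit -q --allow-empty -m E
git commit -q --allow-empty -m F

git checkout -q master
git commit -q --allow-empty -m C
git commit -q --allow-empty -m D

git merge-base master topic          # prints the ID of B
```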

Now, if you want to merge two commits (D and F to M), then the trees referenced by the commits must be merged.

merge base commit
Figure 7. Merge base and merge commit

Git does this as follows:{fn38} If a tree entry (another tree or a blob) is the same in both commits, then that very tree entry will be taken over in the merge commit. This happens in two cases:

  1. A file has not been changed by either commit, or a subdirectory does not contain a changed file: In the first case, the SHA-1 sum of the blob for this file is the same in both commits. In the second case, the same tree object is referenced by both commits. The referenced blob or tree is therefore the same as the one referenced in the merge base.

  2. A file was changed on both sides and equivalently (same blobs). This happens, for example, if all changes to a file were copied from one branch using git cherry-pick (see Taking over Individual Commits: Cherry Picking). The referenced blob is then not the same as in the merge base.

If a tree entry disappears in one of the commits, but is still present in the other, and is the same as in the merge base, then it is not taken over. This is equivalent to deleting a file or directory if no changes have been made to the file on the other side. Similarly, if a commit brings a new tree entry, it is copied to the merge tree.

Now what happens if a file from the commits has different blobs, that is, the file has been changed at least on one side? In the event that one of the blobs is the same as in the merge base, only one side of the file has been changed, so Git can simply adopt those changes.

However, if both blobs are different from the merge base, you might run into problems. First, Git tries to apply the changes on both sides.

A 3-way merge algorithm is usually employed for this purpose. Unlike the classic 2-way merge algorithm, which is used when you have two different versions A and B of a file and want to merge them, the 3-way algorithm involves a third version C of the file, taken from the merge base described above. Because a common ancestor of the file is known, the algorithm can in many cases decide better (that is, not only based on line numbers or context) how to merge changes. In practice, many trivial merge conflicts are thus resolved automatically without user intervention.

However, there are conflicts that no merge algorithm, no matter how good, can resolve. This happens, for example, if the context in version A of the file was changed right next to a change in version B, or, worse still, if versions A, B, and C each contain a different variant of the same line.

Such a case is called a merge conflict. Git merges all the files as best it can, and then presents the conflicting changes to the user so they can manually merge them (and thus resolve the conflict) (see Resolving Merge Conflicts).

Although it is in principle possible to generate a syntactically correct resolution with an algorithm specially designed for the respective programming language, an algorithm cannot see beyond the syntax, i.e., it cannot grasp the meaning of the code. A resolution generated in this way would therefore usually not make sense.

Fast Forward Merges: Fast Forwarding One Branch

The git merge command does not always create a merge commit. A trivial case, but one that occurs frequently, is the so-called fast-forward merge, i.e. simply moving the branch forward.

A fast-forward merge occurs when a branch, for example topic, is a descendant of a second branch, master:

ff before
Figure 8. Before the fast forward merge

A simple git merge topic on branch master now causes master to simply be moved forward; no merge commit is created.

ff after
Figure 9. After the fast forward merge — no merge commit was created

Of course, such a behavior only works if the two branches have not diverged, i.e. if the merge base of both branches is one of the two branches itself, in this case master.

This behavior is often desirable:

  1. You want to integrate upstream changes, that is, changes from another Git repository. You typically use a command like git merge origin/master to do this. A git pull will also perform a merge. To learn how to merge changes between git repositories, see [ch.distributed-git].

  2. You want to integrate an experimental branch. Because it’s quick and easy to create branches in Git, it’s a good idea to start a new branch for each feature. If you’ve tried something experimental on a branch and want to integrate it without the point of integration being visible in the history, you can do so by fast-forwarding.

Tip

With the options --ff-only and --no-ff you can adjust the merge behavior. If you use the first option and the branches cannot be merged using fast-forward, Git will abort with an error message. The second option forces Git to create a merge commit even though fast forward would have been possible.
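The difference between the two options can be seen in a small experiment. The sketch below uses empty commits and the assumed branch names topic and topic2; it counts the parent lines of the resulting commit to show whether a merge commit was created.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m A
git checkout -q -b topic
git commit -q --allow-empty -m B
git commit -q --allow-empty -m C

# Fast-forward: master is simply moved to the tip of topic
git checkout -q master
git merge -q --ff-only topic
after_ff=$(git rev-parse master)

# Forced merge commit, although a fast-forward would be possible
git checkout -q -b topic2
git commit -q --allow-empty -m D
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic2'" topic2

git cat-file -p HEAD | grep -c '^parent '   # 2
```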

There are different opinions on whether changes should always be integrated via fast-forward or whether it is better to create a merge commit even though it is not strictly necessary. The results are the same in both cases: Changes from one branch are integrated into another.

However, when you create a merge commit, the integration of a feature becomes clearly visible. Consider the following two excerpts from the version history of a project:

ff no ff vergleich
Figure 10. Integration of a feature with and without fast forward

In the upper version, you cannot easily see which commits were originally developed on the branch sha1-caching, that is, which commits belong to one specific feature of the software.

In the lower version, however, you can see at first glance that there were exactly four commits on that branch, and that it was then integrated. Since nothing was developed in parallel, the merge commit would in principle be unnecessary, but it does make the integration of the feature clear.

Tip

So instead of relying on the magic of git merge, it makes sense to create two aliases (see [sec.git-alias]) that force or forbid fast forward merge:

nfm = merge --no-ff     # no-ff-merge
ffm = merge --ff-only   #    ff-merge

An explicit merge commit is also helpful because you can undo it with a single command. This is useful, for example, if you have integrated a branch but it has bugs: If the code is running in production, it is often desirable to back out the entire change until the bug is fixed. To do so, use:

git revert -m 1 <merge-commit>

Git then produces a new commit that reverses all changes brought in by the merge. The -m 1 option here specifies which parent of the merge should be considered the mainline, or stable line of development: its changes are preserved. In the above example, -m 1 would cause the changes made by the four commits from branch sha1-caching, the second parent of the merge, to be undone.
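Reverting a merge can be rehearsed in a scratch repository. In this sketch, the file names app.txt and cache.txt and the single commit on sha1-caching are illustrative assumptions.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo app > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b sha1-caching
echo cache > cache.txt
git add cache.txt
git commit -q -m "add cache"

git checkout -q master
git merge -q --no-ff -m "Merge branch 'sha1-caching'" sha1-caching

# Undo everything the merge brought in, keeping parent 1 (master) as mainline
git revert --no-edit -m 1 HEAD

ls    # app.txt only; cache.txt is gone again
```

The merge commit itself stays in the history; the revert is an additional commit on top of it.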

Merge Strategies

Git has five different merge strategies, some of which can be further adjusted by strategy options. You determine the strategy by -s, so a merge call is as follows:

git merge -s <strategy> <branch>

Some of these strategies can only merge two branches, others any number.

resolve

The resolve strategy can merge two branches using a 3-way merge technique. The newest (best) of all possible bases is used as the merge base. This strategy is fast and generally produces good results.

recursive

This is the standard strategy that Git uses to merge two branches. A 3-way merge algorithm is also used here. However, this strategy is more clever than resolve: If several merge bases exist, all of which have “equal rights,”{fn39} then Git first merges these bases together, and then uses the result as the merge base for the 3-way merge algorithm. In addition to the fact that merges with file renames can be processed more easily as a result, a test run on the version history of the Linux kernel has shown that this strategy results in fewer merge conflicts than the resolve strategy. The strategy can be adapted by various options (see below).

octopus

Standard strategy when three or more branches are merged. In contrast to the two strategies mentioned above, the octopus strategy can only perform merges if no error occurs, i.e. if no manual conflict resolution is necessary. The strategy is especially designed to integrate many topic branches that are known to be compatible with the mainline (main development strand).

ours

Can merge any number of branches, but does not use a merge algorithm. Instead, the blobs or trees of the current branch (that is, the branch from which you entered git merge) are always used. This strategy is mainly used when you want to overwrite old developments with the current state of affairs.

subtree

Works like recursive, but the strategy does not compare the trees “on equal footing,” but tries to find the tree of one side as a subtree of the other side and only then merge them. This strategy is useful, for example, if you manage the Documentation/ subdirectory of your project in a separate repository. Then you can merge the changes from that repository into the master repository by using git pull -s subtree <documentation-repo> to apply the subtree strategy, which recognizes the contents of <documentation-repo> as a subdirectory of the master repository and applies the merge process only to that subdirectory. This topic is discussed in more detail in [sec.subprojects].

Options for the Recursive Strategy

The default strategy recursive knows several options that adjust the behavior especially with regard to conflict resolution. You specify them with the option -X; the syntax is:

git merge -s recursive -X <option> <branch>

If you only merge two branches, you do not need to explicitly specify the recursive strategy by -s recursive.

Since the strategy can only merge two branches, it is possible to speak of our version and theirs: our version is the checked-out branch in the merge process, while their version references the branch you want to integrate.

ours

If a merge conflict occurs that would normally need to be resolved manually, our version is used instead. This strategy option differs from the ours strategy, however, which ignores all changes made by the other side(s). The ours option, on the other hand, takes the changes from both our side and the other side, and gives priority to our side only in the event of a conflict and only at the conflicting places.

theirs

Like ours, except that the opposite is true: in case of conflicts, their version is preferred.

ignore-space-change, ignore-all-space, ignore-space-at-eol

Since whitespace does not play a syntactic role in most languages, these options allow you to tell Git to try to resolve a merge conflict automatically if whitespace is not important. A common use case is when an editor or IDE has automatically reformatted source code.

The option ignore-space-at-eol ignores whitespace at the end of the line, which is especially helpful if both sides use different line-end conventions (LF/CRLF). If you specify ignore-space-change, whitespace is also treated as a pure separator: Thus, when comparing a line, it is irrelevant how many spaces or tabs are in one place — indented lines remain indented, and separated words remain separated. The option ignore-all-space ignores any whitespace.

The general rule is: If their version brings in only whitespace changes covered by the specified option, they are ignored and our version is used; if their version brings in further changes and our version has only whitespace changes, their version is used. However, if both sides contain more than whitespace changes, there is still a merge conflict.

In general, after a merge that you could only solve by using one of these options, it is recommended to normalize the corresponding files again, i.e. to make the line endings and indentations uniform.

subtree=<tree>

Similar to the subtree strategy, but an explicit path is specified here. Similar to the above example, you would use:

git pull -Xsubtree=Documentation <documentation-repo>
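The difference between the ours strategy option (-X ours) and the ours strategy (-s ours) described above can be made visible with a small experiment. This is only a sketch: the file names conflict.txt and other.txt, the branch theirs-branch, and the tag start are assumptions for illustration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > conflict.txt
git add conflict.txt
git commit -q -m "base"

# Their side: one conflicting change and one unrelated new file
git checkout -q -b theirs-branch
echo theirs > conflict.txt
echo extra > other.txt
git add .
git commit -q -m "their changes"

# Our side: a conflicting change to the same line
git checkout -q master
echo ours > conflict.txt
git commit -q -am "our changes"
git tag start

# Option ours: conflicts go to our side, other changes are still merged
git merge -q -X ours -m "merge with option ours" theirs-branch
opt_content=$(cat conflict.txt)
opt_other=$([ -e other.txt ] && echo yes || echo no)

# Strategy ours: everything from their side is ignored
git reset -q --hard start
git merge -q -s ours -m "merge with strategy ours" theirs-branch
strat_other=$([ -e other.txt ] && echo yes || echo no)

echo "$opt_content $opt_other $strat_other"   # ours yes no
```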

Resolving Merge Conflicts

As already described, some conflicts cannot be resolved by algorithms — in this case manual rework is necessary. Good team coordination and fast integration cycles can minimize major merge conflicts. But especially in early development, when possibly the internals of a software are changed instead of adding new features, conflicts can occur.

If you are working in a larger team, the developer who has done most of the work on the conflicted code is usually responsible for finding a solution. However, such a conflict resolution is usually not difficult if the developer has a good overview of the software in general and of his piece of code and its interaction with other parts in particular.

We will go through the solution of a merge conflict using a simple example in C. Take a look at the following output.c file:

int i;

for(i = 0; i < nr_of_lines(); i++)
    output_line(i);

print_stats();

The piece of code goes through all lines of an output and prints them one after the other. Finally, it prints a short statistic.

Now two developers change something in this code. The first one, Axel, writes a function that wraps the lines before they are output and replaces output_line in the above piece of code with his improved version output_wrapped_line:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);

print_stats();

The second developer, Beatrice, modifies the code so that her newly introduced configuration setting max_output_lines is honored and not too many lines are output:

int i;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}

print_stats();

So Beatrice uses the “obsolete” version output_line, and Axel does not yet have the construct that checks the configuration setting.

Now Beatrice tries to merge her changes on branch B into the branch master, where Axel has already integrated his changes:

$ git checkout master
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Automatic merge failed; fix conflicts and then commit the result.

In the output.c file, Git now places conflict markers to indicate where changes overlap. There are two sides: the first is HEAD, i.e. the branch into which Beatrice wants to merge her changes, in this case master. The other side is the branch to be integrated, B. The two sides are separated by a line of equal signs:

int i;
int tw = 72;

<<<<<<< HEAD
for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);
=======
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}
>>>>>>> B

print_stats();

It should be noted here that Git only complains about the changes that actually conflict. Axel’s definition of tw above is merged without any problems, although it does not yet exist in Beatrice’s version.

Beatrice must now resolve the conflict. This is done by first editing the file directly, modifying the code as it should be, and then removing the conflict markers. If Axel has documented in detail in his commit message{fn40} how his new function works, this should be done quickly:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_wrapped_line(i, tw);
}

print_stats();

Beatrice must then add the changes using git add. Adding the file, with no conflict markers left in it, signals to Git that the conflict has been resolved. Finally, the result has to be checked in:

$ git add output.c
$ git commit

The commit message should definitely state how this conflict was resolved. It should also mention possible side effects on other parts of the program.

Normally, merge commits are “empty”, i.e., there is no diff output in git show (because the changes were caused by other commits). This is different in the case of a merge commit that resolves a conflict:

$ git show
commit 6e6c55810c884356402c078f30e45a997047058e
Merge: f894659 256329f
Author: Beatrice <beatrice@gitbu.ch>
Date:   Mon Feb 28 05:59:36 2011 +0100

    Merge branch 'B'

    * B:
      honor max_output_lines config option

    Conflicts:
        output.c

diff --cc output.c
index a2bd8ed,f4c8bec..e39e39d
--- a/output.c
+++ b/output.c
@@@ -1,7 -1,9 +1,10 @@@
  int i;
 +int tw = 72;

- for(i = 0; i < nr_of_lines(); i++)
+ for(i = 0; i < nr_of_lines(); i++) {
+     if(i > config_get("max_output_lines"))
+         break;
 -    output_line(i);
 +    output_wrapped_line(i, tw);
+ }

  print_stats();

This combined diff output differs from the usual unified diff format: There is not just one column with the markers for added (+), removed (-), and context or unchanged lines (a space), but two. Git thus compares the result with both parents. The lines changed in the second column correspond exactly to Axel’s commit; the changes in the first column are Beatrice’s commit including the conflict resolution.

The standard procedure, as seen above, is the following:

  1. Open conflicting file

  2. Resolve conflict, remove markers

  3. Mark file as “resolved” via git add

  4. Repeat steps one to three for all files where conflicts occurred

  5. Check in conflict solutions via git commit
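The steps above can be sketched as a script. It reproduces a strongly simplified version of the example (a one-line output.c, an assumption made so that the conflict is guaranteed) and replaces the manual editing step with a plain file write. Listing the unresolved files via git diff --name-only --diff-filter=U is one possible way to find out which files still need attention.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo 'output_line(i);' > output.c
git add output.c
git commit -q -m "initial version"

git checkout -q -b B
echo 'if (i <= max) output_line(i);' > output.c
git commit -q -am "honor max_output_lines config option"

git checkout -q master
echo 'output_wrapped_line(i, tw);' > output.c
git commit -q -am "wrap output at 72 chars"

git merge B || true                                     # stops with a conflict
unresolved=$(git diff --name-only --diff-filter=U)      # output.c

# Steps 1 and 2: normally done in an editor
echo 'if (i <= max) output_wrapped_line(i, tw);' > output.c

git add output.c            # step 3: mark as resolved
git commit -q --no-edit     # step 5: conclude the merge
```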

If you don’t know how to resolve the conflict right away (for example, if you want to ask the original developer to produce a conflict-free version of the code), you can use git merge --abort to abort the merge process, that is, to restore your working tree to the state it was in before you initiated the merge. This command also aborts a merge that you have already partially resolved. Caution: All changes that have not been checked in will be lost.

Tip

To get an overview of which commits caused changes to your file relevant to the merge conflict, you can use the command

git log --merge -p -- <file>

Git then lists the diffs of commits that have made changes to <file> since the merge base.

If you are in a merge conflict, a file with conflicts is stored in three stages: Stage one contains the version of the file in the merge base (that is, the common original version of the file), stage two contains the version from HEAD (that is, the version from the branch into which you are merging). Finally, stage three contains the version from the branch you are merging in (this has the symbolic reference MERGE_HEAD). The working tree contains the combination of these three stages with conflict markers. You can display the individual versions with git show :<n>:<file>:

$ git show :1:output.c
$ git show :2:output.c
$ git show :3:output.c
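The three stages can be inspected in a reproducible way with the following sketch. As an assumption for illustration, output.c consists of a single line here, so that the conflict is certain to occur.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo 'output_line(i);' > output.c
git add output.c
git commit -q -m "initial version"

git checkout -q -b B
echo 'if (i <= max) output_line(i);' > output.c
git commit -q -am "honor max_output_lines config option"

git checkout -q master
echo 'output_wrapped_line(i, tw);' > output.c
git commit -q -am "wrap output at 72 chars"

git merge B || true        # conflict: the index now holds three stages

git show :1:output.c       # merge base:  output_line(i);
git show :2:output.c       # HEAD:        output_wrapped_line(i, tw);
git show :3:output.c       # MERGE_HEAD:  if (i <= max) output_line(i);
```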

With a program specially developed for 3-way merges, however, it is much easier for you to keep an overview. The program looks at the three stages of a file, visualizes them accordingly and offers you options to move changes back and forth.

Help with Merging: Mergetool

In the case of non-trivial merge conflicts, a merge tool is recommended that visualizes the three stages of a file accordingly, thereby facilitating the resolution of the conflict.

Common IDEs and editors such as Vim and Emacs offer such a mode. There are also external tools such as KDiff3{fn41} and Meld.{fn42} The latter visualizes particularly well how a file has changed between commits.

meld example
Figure 11. The example merge conflict, visualized in the merge tool “Meld”

You launch such a merge tool via git mergetool. Git will go through all the files that contain conflicts and display each one (when you press enter) in a merge tool. By default this is Vimdiff.{fn43}

Such a program will usually display the three versions of a file (our side, their side, and the file merged as far as possible, including conflict markers) in three columns side by side, the latter sensibly in the middle. It is always essential that you make the change (conflict resolution) in the middle file, i.e. in the working copy. The other files are temporary and are deleted again when the merge tool is finished.

In principle, you can use any other tool. The mergetool script simply stores the three stages of the file under corresponding file names and starts the merge tool on these three files. When the tool exits, Git checks whether there are any conflict markers left in the file. If not, Git assumes that the conflict was resolved successfully and automatically adds the file to the index using git add. Finally, when you have finished processing all the files, you only need to make one commit call to seal the conflict resolution.

The merge.tool option determines which tool Git starts on the file. The following commands are already preconfigured, meaning that Git already knows in which order the program expects the arguments and which additional options need to be specified:

araxis bc3 codecompare deltawalker diffmerge diffuse
ecmerge emerge gvimdiff gvimdiff2 gvimdiff3 kdiff3
meld opendiff p4merge tkdiff tortoisemerge
vimdiff vimdiff2 vimdiff3 xxdiff

To use your own merge tool, you must set merge.tool to a suitable name, for example mymerge, and then at least specify the mergetool.mymerge.cmd option. The shell evaluates the expression stored in it; the variables BASE, LOCAL, and REMOTE are set to the corresponding temporary files, and MERGED to the file containing the conflict markers. You can further configure the properties of your merge command, see the mergetool configuration section of the git-config(1) man page.
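A configuration for a custom tool might look like the following sketch. The program mymerge and its command-line flags are purely hypothetical; only the configuration keys and the four variables are Git’s own. The settings are written to a throwaway repository here so that your global configuration stays untouched.

```shell
set -eu
cd "$(mktemp -d)"
git init -q

# Select the (hypothetical) tool by name
git config merge.tool mymerge

# Command line for the hypothetical tool; Git substitutes the four variables
git config mergetool.mymerge.cmd \
    'mymerge --base "$BASE" --ours "$LOCAL" --theirs "$REMOTE" --output "$MERGED"'

# Trust the exit code of the tool to decide whether the merge succeeded
git config mergetool.mymerge.trustExitCode true

git config --get merge.tool    # mymerge
```

With trustExitCode set to true, Git relies on the exit status of the tool to decide whether the resolution succeeded.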

Tip

If you temporarily (not permanently) decide to use another merge program, specify it with the -t <tool> option. So to try Meld, during a merge conflict, simply type git mergetool -t meld — of course Meld must be installed for this to work.

Rerere: Reuse Recorded Resolution

Git has a relatively unknown (and poorly documented), but very helpful feature: Rerere, short for Reuse Recorded Resolution. You need to set the rerere.enabled option to true to have the command called automatically (note the d at the end of enabled).

The idea behind Rerere is simple but effective: Whenever a merge conflict occurs, Rerere automatically records a pre-image, an image of the conflict file including markers. In the case of the example above, it would look like this:

$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Recorded preimage for 'output.c'
Automatic merge failed; fix conflicts and then commit the result.

If the conflict is resolved as above and the solution is checked in, Rerere saves the conflict resolution:

$ vim output.c
$ git add output.c
$ git commit
Recorded resolution for 'output.c'.
[master 681acc2] Merge branch 'B'

So far Rerere has not really helped. But now we can delete the merge commit completely (and are back to the situation before the merge). Then we execute the merge again:

$ git reset --hard HEAD^
HEAD is now at f894659 wrap output at 72 chars
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Resolved 'output.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.

Rerere notices that the conflict is known and that a solution has already been found.{fn44} So Rerere calculates a 3-way-merge between the saved pre-image, the saved solution and the version of the file in the working tree. This way Rerere can resolve not only the same conflicts, but also similar ones (if in the meantime further lines outside the conflict area have been changed).

The result is not directly added to the index. The solution is simply written to the file in the working tree. You can then use git diff to check whether the solution looks sensible, run tests if necessary, etc. If everything looks good, you accept the automatic solution via git add as usual.
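The whole Rerere cycle described above can be replayed with a script. The following is a sketch under assumptions: output.c is reduced to a single line, and the manual resolution is replaced by a file write.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git config rerere.enabled true

echo 'output_line(i);' > output.c
git add output.c
git commit -q -m "initial version"

git checkout -q -b B
echo 'if (i <= max) output_line(i);' > output.c
git commit -q -am "honor max_output_lines config option"

git checkout -q master
echo 'output_wrapped_line(i, tw);' > output.c
git commit -q -am "wrap output at 72 chars"

git merge B || true           # conflict, Rerere records the pre-image
echo 'if (i <= max) output_wrapped_line(i, tw);' > output.c
git add output.c
git commit -q --no-edit       # Rerere records the resolution

git reset -q --hard HEAD^     # throw the merge commit away
git merge B || true           # same conflict, Rerere reuses the resolution

cat output.c                  # already contains the resolved line
git ls-files -u               # the index still lists the file as unmerged
```

The final git add and git commit are deliberately left to you, so that you can review the reused resolution first.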

Why Rerere Makes Sense

One might object: who would voluntarily take the risk of deleting an already resolved (and possibly laboriously resolved) merge conflict, only to repeat the merge at some later point?

However, the procedure is desirable. First of all, it doesn’t make sense to merge the mainline (i.e. the main development line, e.g. master) into the topic branch periodically and simply out of habit (we will come back to this later). But if you have a long-lived topic branch and want to test occasionally whether it is compatible with the mainline, you don’t want to resolve the conflicts by hand every time: once a conflict has been resolved, Rerere resolves it automatically from then on. This way you can develop your feature step by step, knowing that it conflicts with the mainline. At the time the feature is integrated, however, the conflicts can all be resolved automatically (because you have occasionally recorded conflict resolutions with Rerere).

In addition, Rerere is also called automatically in conflict cases that arise in a rebase process (see [sec.rebase]). Again, once conflicts have been resolved, they can be automatically resolved again. Once you have merged a branch into the mainline for test purposes and resolved a conflict, this solution is automatically applied when you rebuild this branch on the mainline via rebase.

Using Rerere

In order for the Rerere functionality to be used, you must set the rerere.enabled option to true, as mentioned above. Rerere will then be called automatically when a merge conflict occurs (to capture the pre-image, possibly to resolve the conflict) and when a conflict resolution is checked in (to save the resolution).

Rerere stores information such as pre-image and resolution in .git/rr-cache/, uniquely identified by a SHA-1 sum. You almost never need to call the git rerere subcommand, as it is already handled by merge and commit. You can also use git rerere gc to delete very old solutions.

What happens if a wrong conflict resolution was checked in? Then you should delete the conflict resolution, otherwise Rerere will reapply it when you repeat the conflicted merge. The command git rerere forget <file> does this: directly after Rerere has applied a wrong solution, you can use it to delete that solution and restore the original state of the file (i.e. with conflict markers). If you only want to do the latter, git checkout -m <file> will also help.

Avoiding Conflicts

Decentralized version control systems generally manage merges much better than central ones. This is mainly due to the fact that it is common practice in decentralized systems to check in many small changes locally first. This avoids “monster commits”, which offer much more potential for conflict. This finer-grained development history and the fact that merges are usually recorded as data in the version history (as opposed to simply copying the lines of code) mean that decentralized systems do not have to rely solely on the contents of files when merging.

Prevention is the best way to minimize merge conflicts. Make small commits! Combine your changes so that the resulting commit makes sense as a unit. Always build topic branches on the latest release. Merge from topic branches into “collection branches” or directly into master, not the other way around.{fn45} Using Rerere prevents conflicts that have already been resolved from constantly reoccurring.

Obviously, good communication among developers is also important for prevention: If several developers implement different and mutually influencing changes to the same function, this will certainly lead to conflicts sooner or later.

Another factor that unfortunately often leads to unnecessary(!) conflicts is autogenerated content. Suppose you write the documentation of a software in AsciiDoc{fn46} or work on a LaTeX project with several contributors: Never add the compiled man pages or the compiled DVI/PS/PDF to the repository! In the autogenerated formats, small changes to the plaintext (i.e. in the AsciiDoc or LaTeX source) can cause large (and unpredictable) changes to the compiled formats that Git will not resolve adequately. Instead, it makes sense to provide appropriate Makefile targets or scripts to generate the files, and possibly keep the compiled version on a separate branch.{fn47}

Taking over Individual Commits: Cherry Picking

It will happen that you don’t want to integrate an entire branch directly, but rather parts, i.e. individual commits, first. The cherry-pick (“pick the good cherries”) git command is responsible for this.

The command expects one or more commits to be copied to the current branch. For example:

$ git cherry-pick d0c915d
$ git cherry-pick topic~5 topic~1
$ git cherry-pick topic~5..topic~1

The middle command copies two explicitly specified commits; the last command, on the other hand, copies all commits belonging to the specified commit range.

Unlike a merge, however, only the changes are integrated, not the commit itself. To do this, it would have to reference its predecessor, so that the predecessor would also have to be integrated, and so on, which is equivalent to a merge. So when you take over commits with cherry-pick, new commits are created with a new commit ID. Git can’t know that these commits are actually the same.

So if you are merging two branches that you have cherry-picked changes between, conflicts can occur.{fn48} These are usually trivial to resolve, and the strategy options ours and theirs might be helpful (see Options for the Recursive Strategy). The rebase command, on the other hand, recognizes such commit duplications,{fn49} and omits the duplicated commits. This allows you to take some commits “from the middle” and then rebuild the branch the commits came from.
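The following is a minimal sketch of this behavior in a throwaway repository. The branch names main and topic, the file names and the identity settings are assumptions made purely for illustration. It shows that the picked commit receives a new ID, while git cherry, which compares patches rather than IDs, still recognizes the change as already present:

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # fix the branch name regardless of Git defaults
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b topic
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix on topic"

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "unrelated commit on main"

# Copy the change from topic; this creates a new commit on main
git cherry-pick topic >/dev/null
original=$(git rev-parse topic)
copy=$(git rev-parse HEAD)
echo "original: $original"
echo "copy:     $copy"

# A leading "-" marks a commit on topic whose patch already exists on main
cherry_output=$(git cherry main topic)
echo "$cherry_output"
```

The two IDs differ because the copy has a different parent, yet git cherry marks the commit on topic with a minus sign.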

The cherry-pick command also understands these merge strategy options itself: if you want to copy a commit to the current branch and make sure that the changes of the copied commit win in case of conflict, use:

$ git cherry-pick -Xtheirs <commit>
Tip

The -n or --no-commit option tells Git to apply the changes from a commit to the index, but not to make a commit yet. This allows you to “aggregate” several small commits in the index first, and then package them as one commit:

$ git cherry-pick -n 785aa39 512f3e9 4e4a063
Finished one cherry-pick.
Finished one cherry-pick.
Finished one cherry-pick.
$ git commit -m "Various small changes"
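To try the aggregation yourself, the following sketch may help. It builds a temporary repository with three small commits on a hypothetical branch topic and then combines them into one commit on main; all branch, file and user names are assumptions for illustration only:

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

# Three small commits on a topic branch
git checkout -q -b topic
for name in one two three; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "add $name.txt"
done

git checkout -q main
# Apply all three changes to the index without creating commits
git cherry-pick -n topic~2 topic~1 topic
# Package the collected changes as a single commit
git commit -q -m "Various small changes"

commit_count=$(git rev-list --count main)
file_count=$(git ls-files | wc -l | tr -d ' ')
git log --oneline main
```

Afterwards main contains the initial commit plus one commit that carries the changes of all three picked commits.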

Visualizing Repositories

Once you have created and merged a few branches, you will have noticed how easy it is to lose track.

The arrangement of commits and their relationships to each other is called the topology of a repository. In the following, we will introduce the graphical program gitk, among other things, to examine these topologies.

For small repositories, first call gitk --all, which displays the entire repository as a graph. Clicking on the individual commits displays the meta-information as well as the generated patch.

Revision Parameters

Since the listing of multiple commits is hard to keep track of, we examine a small sample repository with several branches merged together:

revision list commit graph gitk
Figure 12. The graph of commits as displayed in gitk

We recognize four branches (A-D) and one tag release. We can also display this tree on the console with the appropriate command line options using the log command (branch and tag names are printed in semi-bold for better distinction):

$ git log --decorate --pretty=oneline --abbrev-commit --graph --all
* c937566 (HEAD, D) commit on branch D
| *   b0b30ef (release, A) Merge branch 'C' into A
| |\
| | * 807db47 (C) commit on branch C
| | * 996a53b commit on branch C
| |/
|/|
| * 83f6bf3 commit on branch A
| *   5b2c291 Merge branch 'B' into A
| |\
| | * 2417cf7 (B) commit on branch B
| |/
|/|
| * 0bf1433 commit on branch A
|/
* 4783886 initial commit
Tip

The output of the log command is equivalent to the view in Gitk. However, git log is much faster than Gitk and does not require another program window.

So for a quick overview, it’s much more convenient to set up an alias that automatically adds the many long options. The authors use the alias tree for this, which you can define as follows:

$ git config --global alias.tree \
   'log --decorate --pretty=oneline --abbrev-commit --graph'

By using git tree --all you get an ASCII version of the graph of the git repository. In the following, we use this alias to represent the topology.

Now we change the above command: instead of the --all option, which puts all commits in the tree, we now specify B (the name of the branch):

$ git tree B
* 2417cf7 (B) commit on branch B
* 4783886 initial commit

We receive all commits that are reachable from B. A commit only knows its predecessor(s) (several if branches are merged). “All commits reachable from B” thus refers to the list of commits from B backwards, down to a commit that has no predecessor (called a root commit).

Instead of one, the command can also accept multiple references. So to get the same output as with the --all option, it is sufficient to specify the references A and D. B and C can be omitted because their commits are already “collected” on the way from A to the root commit.

Of course, you can also specify an SHA-1 sum directly instead of symbolic references:

$ git tree 5b2c291
*   5b2c291 Merge branch 'B' into A
|\
| * 2417cf7 (B) commit on branch B
* | 0bf1433 commit on branch A
|/
* 4783886 initial commit

If a reference is preceded by a caret (^), this negates the meaning.{fn50} So the notation ^A means: not the commits that are reachable from A. However, this switch only excludes these commits; it does not select any others. So the above log command with ^A as its only argument will not output anything, because Git only knows which commits should not be displayed. So again, we add --all to list all commits, minus those that are reachable from A:

$ git tree --all ^A
* c937566 (HEAD, D) commit on branch D

An alternative notation is available with --not: Instead of ^A you can also write --not A.

Such commands are especially useful for examining the difference between two branches: Which commits are in branch D that are not in A? The command returns the answer:

$ git tree D ^A
* c937566 (HEAD, D) commit on branch D

Because this question is often asked, there is another, more intuitive notation for it: A..D is equivalent to D ^A:

$ git tree A..D
* c937566 (HEAD, D) commit on branch D

Of course the order is important here: “D without A” is a different set of commits than “A without D”! (Compare also the complete graph.)

In our example there is a tag release. To check which commits from branch D (which could stand for “Development”) are not yet included in the current release, simply specify release..D.

Tip

The syntax A..B can be remembered as the idiom “from A to B”. However, this “difference” is not symmetrical, i.e. A..B is usually not the same set of commits as B..A.

Alternatively, Git provides the symmetric difference A...B (with three dots). It is equivalent to the argument A B --not $(git merge-base --all A B), so it includes all the commits that can be reached from A or B, but not from both.
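The equivalences between these notations can be checked with git rev-list, which prints the commit IDs a revision argument selects. The sketch below uses a deliberately tiny repository, not the sample repository from the figure: a root commit plus one commit each on the hypothetical branches A and D.

```shell
set -eu
# Throwaway repository; branch names A and D are chosen for illustration
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "initial commit"
git branch A
git branch D
git checkout -q A
git commit -q --allow-empty -m "commit on branch A"
git checkout -q D
git commit -q --allow-empty -m "commit on branch D"

# Three spellings of "reachable from D, but not from A"
two_dot=$(git rev-list A..D)
caret=$(git rev-list D ^A)
negated=$(git rev-list D --not A)

# Symmetric difference: reachable from A or D, but not from both
three_dot_count=$(git rev-list --count A...D)
git log --oneline --left-right A...D
```

With --left-right, git log marks each commit of the symmetric difference with < or > to show which side it belongs to.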

Reference vs. List of References

In the example, A always refers to all commits that are accessible from A. But actually a branch is just a reference to a single commit. So why does log always list all commits reachable from A, while the git command show with the argument A only shows this one commit?

The difference is what the commands expect as an argument: show expects an object, that is, a reference to a single object, which is then displayed.{fn51} Many other commands expect one (or more) commits instead, and these commands convert the arguments into a list of commits (traversing the history back to the root commit).
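A short sketch may make the difference tangible. It assumes a temporary repository with three empty commits on a hypothetical branch main and counts how many commit subjects each command prints for the same argument:

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "second commit"
git commit -q --allow-empty -m "third commit"

# show treats "main" as a single object (-s suppresses the diff)
shown=$(git show -s --format=%s main | wc -l | tr -d ' ')
# log treats "main" as the starting point of a list of commits
logged=$(git log --format=%s main | wc -l | tr -d ' ')
echo "show prints $shown commit(s), log prints $logged commit(s)"
```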

Gitk

Gitk is a graphical program implemented in Tcl, which is usually packaged by distributors along with the actual Git commands — so you can be sure to find it on almost any system.

It represents individual commits or the entire repository in a three-part view: at the top is the tree structure with two additional columns for author and date, below is a list of changes in unified diff format, and a list of files to restrict the changes displayed.

The graph view is intuitive: Different colors help to distinguish the different lines of development. Commits are always blue dots, with two exceptions: The HEAD is highlighted in yellow, and a commit that is not a root commit, but whose predecessor is not displayed, is shown in white.

An arrowhead on a branch line indicates that further commits have been made on that branch, but that Gitk hides this part of the line because the commits are far apart in time. A click on the arrowhead will take you to the continuation of the branch.

Branches appear as green labels, the currently checked out branch additionally bold. Tags are shown as yellow arrows.

You can delete or check out a branch with a right click on it. Right-clicking on commits opens a menu in which you can perform actions on the selected commit. The only thing that might be easier to do with Gitk than from the command line is cherry picking, i.e. transferring individual commits to another branch (see also Taking over Individual Commits: Cherry Picking).

gitk
Figure 13. Complex topology in Gitk

Gitk accepts essentially the same options as git log. Some examples:

$ gitk --since=yesterday -- doc/
$ gitk e13404a..48effd3
$ gitk --all -n 100

The first command shows all commits since yesterday that have made changes to a file under the doc/ directory. The second command limits the commits to a specific range, while the third command shows the 100 most recent commits from all branches.

Tip

Experience shows that beginners are often confused because gitk by default only shows the current branch. This is probably because gitk is often called to get an overview of all branches. Therefore the following shell alias is useful: alias gik='gitk --all'.

Many users leave gitk open during work. Then it’s important to update the display from time to time so that more recent commits appear. With F5 (Update) you load all new commits and refresh the display of the references. Sometimes, however, this is not enough, for example if you delete a branch. Although the branch is no longer displayed, there may still be unreachable commits in the GUI as artifacts. The key combination Ctrl+F5 (Reload) completely reloads the repository, which solves the problem.

As an alternative to gitk, you can use the GTK-based gitg or Qt-based qgit on UNIX systems; on an OS X system, for example, you can use GitX; for Windows, you can use GitExtensions. Some IDEs now also have corresponding visualizations (e.g. the Eclipse plugin EGit). Furthermore, you can use full-fledged Git clients like Atlassian SourceTree (OS X, Windows; free of charge), Tower (OS X; commercial) as well as SmartGit (Linux, OS X and Windows; free for non-commercial use).

Reflog

The reference log (reflog) consists of log files that Git creates for each branch and for HEAD. They record when a reference was moved, and from where to where. This happens in particular with the checkout, reset, merge and rebase commands.

These log files are stored under .git/logs/ and are named after the reference. The reflog for the master branch can be found under .git/logs/refs/heads/master. There is also the command git reflog show <reference> to list the reflog:

$ git reflog show master
48effd3 master@{0}: HEAD^: updating HEAD
ef51665 master@{1}: rebase -i (finish): refs/heads/master onto 69b9e27
231d0a3 master@{2}: merge @{u}: Fast-forward
...

The reflog command is rarely used directly and is essentially an alias for git log -g --oneline. The -g option causes the command not to follow the predecessors in the commit graph, but to walk through the commits in the order in which they are recorded in the reflog.

You can easily try this: Create a test commit, then delete it again with git reset --hard HEAD^. The command git log -g will now first show the HEAD, then the deleted commit, and then the HEAD again.
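The experiment might look like the following sketch, run in a temporary repository so that nothing of value is reset. The branch name main and the commit messages are assumptions for illustration:

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "initial commit"
base=$(git rev-parse HEAD)

git commit -q --allow-empty -m "test commit"
test_commit=$(git rev-parse HEAD)

# Throw the test commit away again
git reset -q --hard HEAD^

# Walk HEAD's reflog: newest entry first
git log -g --oneline HEAD

# The previous position of HEAD is still the discarded commit
previous=$(git rev-parse 'HEAD@{1}')
```

Although no branch references the test commit any more, HEAD@{1} still resolves to it.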

The reflog thus also references commits that are otherwise no longer referenced, i.e. are “lost” (see Managing Branches). The reflog might help you if you have deleted a branch that you would have needed after all. Admittedly, git branch -D also deletes the branch’s own reflog. However, you had to check out the branch to commit to it, so use git log -g HEAD to find the last time you had the branch you are looking for checked out. Then create a branch that points to this (seemingly lost) commit ID, and your lost commits should be back.{fn52}
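A recovery along these lines could be sketched as follows. The branch names feature and recovered are hypothetical, and the entry HEAD@{1} is only correct for this exact sequence of commands; in a real repository you would first read the output of git log -g HEAD to find the right entry.

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature
echo work > work.txt
git add work.txt
git commit -q -m "work on feature"
lost=$(git rev-parse HEAD)

# Leave the branch and delete it, including its own reflog
git checkout -q main
git branch -D feature >/dev/null

# HEAD's reflog still records the commit made while feature was checked out
git log -g --format='%h %gd %gs' HEAD

# In this sequence, HEAD@{0} is the checkout of main and HEAD@{1} the lost commit
git branch recovered 'HEAD@{1}'
recovered=$(git rev-parse recovered)
```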

Commands that expect one or more references can also implicitly use the reflog. In addition to the syntax already found in the output of git log -g (e.g. HEAD@{1} for the previous position of HEAD), Git also understands <ref>@{<when>}. Git interprets <when> as an absolute or relative date and then consults the reflog of the corresponding reference to find out which commit the reference pointed to at that time. This commit is then referenced.

Two examples:

$ git log 'master@{two weeks ago}..'
$ git show '@{1st of April, 2011}'

The first command lists all commits between HEAD and the commit the master branch pointed to two weeks ago (note the suffix .. which means a commit range up to HEAD). This doesn’t necessarily have to be a commit that is two weeks old: if, as a test, you moved the branch to the very first commit in the repository two weeks ago using git reset --hard <initial-commit>, then that very commit will be referenced.{fn53}

The second command shows the commit to which the currently checked out branch (because there is no explicit reference before the @) pointed on April 1, 2011. In both commands, the argument with the reflog suffix must be enclosed in quotation marks so that the shell does not split it at the spaces or interpret the braces, and Git receives the argument as a whole.

Note that the reflog is only available locally and therefore does not belong to the repository. If you send a commit ID or tag name to another developer, it references the same commit, but a master@{yesterday} can reference different commits depending on the developer.

Tip

If you don’t specify a branch and time, Git will assume HEAD. This allows you to use @ as the short form for HEAD in commands. Furthermore, many commands understand the argument - as @{-1}, i.e. the branch that was checked out before the current one:

$ git checkout feature   # previously on "master"
$ git commit ...         # make changes and commits
$ git checkout -         # back to "master"
$ git merge -            # merge "feature"
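If you want to replay this sequence without touching a real project, a self-contained variant might look like this. It uses a temporary repository in which the starting branch is called main instead of master; all names are assumptions for illustration:

```shell
set -eu
# Throwaway repository; all names below are hypothetical
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature          # previously on "main"
git commit -q --allow-empty -m "commit on feature"

git checkout -q -                   # back to "main"
current=$(git symbolic-ref --short HEAD)

# @{-1} now names the branch that was checked out before "main"
previous=$(git rev-parse --abbrev-ref '@{-1}')
echo "current: $current, previous: $previous"

git merge -q -                      # merge "feature" (a fast-forward here)
```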