Docs: Add commit-graph tech docs to Makefile #22

Closed
wants to merge 2 commits into from
2 changes: 2 additions & 0 deletions Documentation/Makefile
@@ -69,6 +69,8 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
SP_ARTICLES += $(API_DOCS)

TECH_DOCS += SubmittingPatches
TECH_DOCS += technical/commit-graph
TECH_DOCS += technical/commit-graph-format
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/http-protocol
TECH_DOCS += technical/index-format
59 changes: 31 additions & 28 deletions Documentation/technical/commit-graph.txt
@@ -40,32 +40,32 @@ Values 1-4 satisfy the requirements of parse_commit_gently().

Define the "generation number" of a commit recursively as follows:

* A commit with no parents (a root commit) has generation number one.

* A commit with at least one parent has generation number one more than
the largest generation number among its parents.
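The recursive definition translates directly into a memoized traversal. The sketch below is illustrative only: `struct toy_commit` and `toy_compute_generation()` are hypothetical names, not git's `struct commit` or its commit-graph code, and in this sketch a stored value of 0 simply means "not computed yet".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory commit for illustration; not git's struct commit. */
struct toy_commit {
	struct toy_commit **parents; /* array of parent pointers */
	size_t nr_parents;
	unsigned generation;         /* 0 means "not computed yet" in this sketch */
};

/*
 * Apply the recursive definition: a root commit gets 1, any other
 * commit gets one more than the largest generation among its parents.
 * Results are memoized in c->generation.
 */
static unsigned toy_compute_generation(struct toy_commit *c)
{
	unsigned max_parent = 0;
	size_t i;

	if (c->generation)
		return c->generation;

	for (i = 0; i < c->nr_parents; i++) {
		unsigned g = toy_compute_generation(c->parents[i]);
		if (g > max_parent)
			max_parent = g;
	}

	/* For a root commit max_parent stays 0, so the result is 1. */
	c->generation = max_parent + 1;
	assert(c->generation > max_parent); /* catch unsigned wrap-around */
	return c->generation;
}
```

For a history R <- A <- B plus a merge M with parents B and R, this assigns 1, 2, 3 and 4. M's value is one more than the length of the longest path M -> B -> A -> R, matching the equivalent definition. A production implementation would likely avoid deep recursion on long histories, for example by using an explicit stack.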

Equivalently, the generation number of a commit A is one more than the
length of a longest path from A to a root commit. The recursive definition
is easier to use for computation and for observing the following property:

If A and B are distinct commits with generation numbers N and M,
respectively, and N <= M, then A cannot reach B. That is, we know without
searching that B is not an ancestor of A, because every ancestor of A
other than A itself has a generation number strictly less than N.
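As a minimal sketch (the helper name is hypothetical, not a git function), the property gives a constant-time negative answer before any walk starts, assuming both generation numbers are fully computed and A and B are distinct commits:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper. gen_a and gen_b are fully computed generation
 * numbers (>= 1) of two distinct commits A and B. Returns true when
 * the numbers alone prove that A cannot reach B; false means
 * "undecided, a walk is needed".
 */
static bool toy_gen_rules_out_reach(unsigned gen_a, unsigned gen_b)
{
	assert(gen_a >= 1 && gen_b >= 1);
	/*
	 * Every proper ancestor of A has a generation number strictly
	 * below gen_a, so B cannot be one when gen_a <= gen_b.
	 */
	return gen_a <= gen_b;
}
```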

Conversely, when checking whether A is an ancestor of B, we only need
to walk commits until all commits on the walk boundary have generation
number at most N. If we walk commits using a priority queue ordered by
generation number, then we always expand the boundary commit with the
highest generation number and can easily detect the stopping condition.
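The sketch below illustrates that walk under stated assumptions: `struct toy_commit`, the fixed-capacity heap and `toy_is_ancestor()` are hypothetical (git has its own commit structures and priority queue), every commit carries a fully computed generation number, and the caller clears the `seen` flags between queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy commit; generation is assumed fully computed. */
struct toy_commit {
	struct toy_commit **parents;
	size_t nr_parents;
	unsigned generation;
	bool seen;                  /* set when the commit enters the queue */
};

#define TOY_QUEUE_MAX 64            /* illustrative fixed capacity */

/* Binary max-heap ordered by generation number. */
struct toy_queue {
	struct toy_commit *items[TOY_QUEUE_MAX];
	size_t nr;
};

static void toy_queue_put(struct toy_queue *q, struct toy_commit *c)
{
	size_t i = q->nr++;

	assert(q->nr <= TOY_QUEUE_MAX);
	while (i > 0) {             /* sift up */
		size_t parent = (i - 1) / 2;
		if (q->items[parent]->generation >= c->generation)
			break;
		q->items[i] = q->items[parent];
		i = parent;
	}
	q->items[i] = c;
}

/* Remove and return the commit with the highest generation (q->nr > 0). */
static struct toy_commit *toy_queue_get(struct toy_queue *q)
{
	struct toy_commit *top = q->items[0];
	struct toy_commit *last = q->items[--q->nr];
	size_t i = 0;

	for (;;) {                  /* sift down */
		size_t child = 2 * i + 1;
		if (child >= q->nr)
			break;
		if (child + 1 < q->nr &&
		    q->items[child + 1]->generation > q->items[child]->generation)
			child++;
		if (last->generation >= q->items[child]->generation)
			break;
		q->items[i] = q->items[child];
		i = child;
	}
	q->items[i] = last;
	return top;
}

/* Returns true if a is an ancestor of b (or a == b). */
static bool toy_is_ancestor(struct toy_commit *a, struct toy_commit *b)
{
	struct toy_queue q = { .nr = 0 };
	unsigned n = a->generation;

	b->seen = true;
	toy_queue_put(&q, b);

	while (q.nr) {
		struct toy_commit *c = toy_queue_get(&q);
		size_t i;

		if (c == a)
			return true;
		/* Highest remaining generation is below N: A cannot appear. */
		if (c->generation < n)
			return false;
		/* Generation exactly N but not A: c cannot reach A. */
		if (c->generation == n)
			continue;
		for (i = 0; i < c->nr_parents; i++) {
			struct toy_commit *p = c->parents[i];
			if (!p->seen) {
				p->seen = true;
				toy_queue_put(&q, p);
			}
		}
	}
	return false;
}
```

Because the heap always yields the highest remaining generation number, the walk stops as soon as that value drops below N, and only commits with generation greater than N are ever expanded.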

This property can be used to significantly reduce the time it takes to
walk commits and determine topological relationships. Without generation
numbers, the general heuristic is the following:

If A and B are commits with commit time X and Y, respectively, and
X < Y, then A _probably_ cannot reach B.

This heuristic is currently used whenever the computation is allowed to
violate topological relationships due to clock skew (such as "git log"
@@ -85,8 +85,11 @@ have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
Since the commit-graph file is closed under reachability, we can guarantee
the following weaker condition on all commits:

[quote]
_____________________________________________________________________
If A and B are commits with generation numbers N and M, respectively,
and N < M, then A cannot reach B.
_____________________________________________________________________
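A sketch of the weaker, strict-inequality test, assuming two sentinel values: 0 for commits stored without generation numbers (the GENERATION_NUMBER_ZERO case) and a maximal value for commits missing from the commit-graph file (git uses GENERATION_NUMBER_INFINITY for this; the `TOY_` macros and helper below are hypothetical stand-ins). Strict inequality makes two commits that share a sentinel compare as "undecided", while a commit in the file correctly rules out reaching one outside it, because the file is closed under reachability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sentinel generation values. */
#define TOY_GEN_INFINITY UINT32_MAX /* commit not in the commit-graph file */
#define TOY_GEN_ZERO 0u             /* file written without generation numbers */

/*
 * Safe for all commits, including sentinel values: returns true only
 * when A is guaranteed not to reach B; false means "undecided, walk".
 */
static bool toy_cannot_reach(uint32_t gen_a, uint32_t gen_b)
{
	return gen_a < gen_b;
}
```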

Note how the strict inequality differs from the inequality when we have
fully-computed generation numbers. Using strict inequality may result in
@@ -121,11 +124,8 @@ Future Work
- After computing and storing generation numbers, we must make graph
walks aware of generation numbers to gain the performance benefits they
enable. This will mostly be accomplished by replacing a commit-date-ordered
priority queue with one ordered by generation number. Commands that could
improve include 'git log --topo-order' and 'git tag --merged'.

- A server could provide a commit graph file as part of the network protocol
to avoid extra calculations by clients. This feature is only of benefit if
@@ -148,13 +148,16 @@ Related Links
More discussion about generation numbers and not storing them inside
commit objects. A valuable quote:

[quote, Jeff "Peff" King]
____________________________________________________________________
I think we should be moving more in the direction of keeping
repo-local caches for optimizations. Reachability bitmaps have been
a big performance win. I think we should be doing the same with our
properties of commits. Not just generation numbers, but making it
cheap to access the graph structure without zlib-inflating whole
commit objects (i.e., packv4 or something like the "metapacks" I
proposed a few years ago).
____________________________________________________________________

[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
A patch to remove the ahead-behind calculation from 'status'.