This repository has been archived by the owner on Sep 11, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 540
Implement APIs for Git serialized commit graphs #965
Labels
Comments
Example of walking the commit graph with the above APIs: func getLastCommitForPaths(r *git.Repository, c *object.Commit, paths []string) ([]*object.Commit, error) {
commitIndex := r.CommitNodeIndex()
cIter := object.NewCommitNodeIterCTime(c, commitIndex, nil, nil)
result := make([]*object.Commit, len(paths))
remainingResults := len(paths)
cTree, err := c.Tree()
if err != nil {
return nil, err
}
currentEntryHashes := make([]plumbing.Hash, len(paths))
for i, path := range paths {
cEntry, err := cTree.FindEntry(path)
if err != nil {
return nil, err
}
currentEntryHashes[i] = cEntry.Hash
}
cIter.ForEach(func(current object.CommitNode) error {
newEntryHashes := make([]plumbing.Hash, len(paths))
err := commitIndex.ParentNodes(current).ForEach(func(parent object.CommitNode) error {
parentTree, err := parent.Tree()
if err != nil {
return err
}
for i, path := range paths {
// skip path if we already found it
if currentEntryHashes[i] != plumbing.ZeroHash {
// find parents that contain the path
if parentEntry, err := parentTree.FindEntry(path); err == nil {
// if the hash for the path differs in the parent then the current commit changed it
if parentEntry.Hash == currentEntryHashes[i] {
newEntryHashes[i] = currentEntryHashes[i]
} else {
// mark for saving the result below
newEntryHashes[i] = plumbing.ZeroHash
// stop any further processing for this file
currentEntryHashes[i] = plumbing.ZeroHash
}
}
}
}
return nil
})
if err != nil {
return err
}
// if a file didn't exist in any parent commit then it must have been created in the
// current one. also we mark changed files in the loop above as not present in the
// parent to simplify processing
var currentCommit *object.Commit
for i, newEntryHash := range newEntryHashes {
if newEntryHash == plumbing.ZeroHash && result[i] == nil {
if currentCommit == nil {
var err error
currentCommit, err = commitIndex.Commit(current)
if err != nil {
return err
}
}
result[i] = currentCommit
remainingResults--
}
}
if remainingResults == 0 {
return storer.ErrStop
}
currentEntryHashes = newEntryHashes
return nil
})
return result, nil
} |
I'm sharing my proof of concept implementation in hope that at some point a proper API could be implemented. It could be useful as a base for code reading/writing the |
I added few more things to my branch:
|
This was referenced Apr 21, 2019
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The Git serialized commit graph should offer speed-up for certain types of history traversal. It was added as experimental feature in Git 2.18 and is greatly detailed in the blog series by Derrick Stolee. It would be nice if
go-git
could offer access to the feature, both at low-level (verifying, building and querying the commit graph file) and high-level (speeding up log traversals with it, if applicable).I've started a proof-of-concept code at https://github.com/filipnavara/go-git/compare/perf-read...filipnavara:commitgraph?expand=1. However, the API is far from nice, the code is not tested (besides running it on my own repository) and it is generally messy.
Since
go-git
currently definesCommit
objects asstruct
I was left with no other choice but to introduce aCommitNode
interface, which is watered down version ofCommit
as present in the serialized commit graph. In ideal world there would be only oneCommit
interface and the commit graph implementation of it would lazy-load the realCommit
objects if necessary.Here's a rough approch of my implementation:
At the lowest level there is
commitgraph
package, which provides theNode
andIndex
interfaces representing the data at the file level (https://github.com/git/git/blob/2d3b1c576c85b7f5db1f418907af00ab88e0c303/Documentation/technical/commit-graph-format.txt). There is implementation of the interface using random-access files / memory-mapped files (FileIndex
), in-memory implementation (MemoryIndex
) and anEncoder
, which can write down the memory index into new file.Building a memory index from all commits reachable from an existing commit could be accomplished by a code like this:
The access to the index is exposed through
CommitGraphStorer
interface in thestorer
package that is implemented by the memory storage and filesystem storage. The file system storage implements it using memory-mapped files for reading and using the encoder for writing. [Note to myself: The memory mapped files are currently not closed with the repository!]In the
object
packageCommitNode
andCommitNodeIndex
interfaces are added. TheCommitNode
interface is implemented by existingCommit
structure and also a new lightweightgraphCommitNode
. TheCommitNodeIndex
interface provides methods for looking up commit parents and getting a fullCommit
object fromCommitNode
. Two implementations ofCommitNodeIndex
exist. The first one isobjectCommitNodeIndex
, which only usesCommit
objects and implements the interfaces to behave exactly as if no serialized commit graph existed. Second one,graphCommitNodeIndex
, takes the additionalcommitgraph.Index
object and implements the lookup methods by trying the commit graph first and falling back to loading fullCommit
objects if the commit is not present in the commit graph file.I added
NewCommitNodeIterCTime
iterator as a counterpart toNewCommitIterCTime
, which operates on top of theCommitNode
andCommitNodeIndex
interfaces. Similar thing could be done for otherNewCommit*Iter*
methods. In fact, it is easily possible to reimplementNewCommit*Iter*
on top ofNewCommitNode*Iter*
and switch between the lookup implementations (graphCommitNodeIndex
/objectCommitNodeIndex
) based on the paricular workload at hand. When full commit information, such as Message or Author is needed, then it's more useful to load the objects directly. When only summary information is needed (eg. counting distance between two commits) then the commit graph implementation can be used.Lastly, I added the method
CommitNodeIndex
togit.Repository
, which returns eithergraphCommitNodeIndex
orobjectCommitNodeIndex
depending on the presence of the serialized commit graph. [Note to myself: It should probably consult thegc.commitGraph
config option.]The text was updated successfully, but these errors were encountered: