Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git/odb/pack: introduce '*Index' type #2420

Merged
merged 3 commits into from
Jul 24, 2017
Merged

git/odb/pack: introduce '*Index' type #2420

merged 3 commits into from
Jul 24, 2017

Conversation

ttaylorr
Copy link
Contributor

This pull request introduces the *Index type, and creates a new package git/odb/pack to hold it.

The goal of this new package is twofold:

  1. Parse index files to find packfile offsets.
  2. Parse and assemble objects within packfiles.

This pull request is the first in a series of four aimed at accomplishing the first goal. Contained in this pull request is the scaffolding necessary to get us going for parsing actual index files. That means:

  • An *Index type, which will eventually get a Entry([]byte) uint64, error) method to resolve object names to their offsets in a packfile.
  • An IndexVersion type, which will learn how to parse index entries in later pull requests in this series.
  • A DecodeIndex() function, which will learn how to use the above IndexVersion instances to parse the index header, and eventually find objects in the index.

To address a point raised in #2415:

@peff wisely points out that I can use mmap(2) in order to avoid the seek(1) cost of bouncing around a file on disk. To me, this means that the 2nd option of lazily loading each entry makes more sense, since:

This is the implementation that I pursued in this series. The idea is that, in order to save memory (and to avoid parsing objects that wont be searched for) we delay parsing index entires until they are asked for.

The next two pull requests in the series will introduce implementations of IndexVersion.Search() which have the following signature:

func (i IndexVersion) Search(idx *Index, name []byte, at uint64) (*IndexEntry, int, error) {
        // ...
}

Or, in other words: given an index, an object to search for, and a location to search in, return either an entry (match), a comparison value (no match), or an error otherwise.

We can use this comparison value (see: bytes.Compare()) in order to direct the next window of our binary search through the index file in order to produce an entry in O(log(n)) time.


/cc @git-lfs/core
/cc #2415

@ttaylorr
Copy link
Contributor Author

A future point of consideration for the IndexVersion interface: perhaps it could be expanded to something like:

package git/odb/pack

type IndexVersion interface {
        Name(r io.ReaderAt, at uint64) ([]byte, error)
        Entry(r io.ReaderAt, at uint64) (*IndexEntry, error)
}

which would remove the comparison int return value from the existing Search function. Instead, the *Index type would be responsible for calling bytes.Compare().

I think that might be a more logical separation of concerns, but I propose we hold off on implementing this change until after this series is reviewed.

@ttaylorr ttaylorr merged commit 83f4cb1 into master Jul 24, 2017
@ttaylorr ttaylorr deleted the git-odb-pack-idx branch July 24, 2017 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants