git/odb/pack: introduce '*Index' type #2420

ttaylorr · 2017-07-15T22:55:49Z

This pull request introduces the *Index type, and creates a new package git/odb/pack to hold it.

The goal of this new package is twofold:

Parse index files to find packfile offsets.
Parse and assemble objects within packfiles.

This pull request is the first in a series of four aimed at accomplishing the first goal. Contained in this pull request is the scaffolding necessary to get us going for parsing actual index files. That means:

An *Index type, which will eventually get a Entry([]byte) uint64, error) method to resolve object names to their offsets in a packfile.
An IndexVersion type, which will learn how to parse index entries in later pull requests in this series.
A DecodeIndex() function, which will learn how to use the above IndexVersion instances to parse the index header, and eventually find objects in the index.

To address a point raised in #2415:

@peff wisely points out that I can use mmap(2) in order to avoid the seek(1) cost of bouncing around a file on disk. To me, this means that the 2nd option of lazily loading each entry makes more sense, since:

This is the implementation that I pursued in this series. The idea is that, in order to save memory (and to avoid parsing objects that wont be searched for) we delay parsing index entires until they are asked for.

The next two pull requests in the series will introduce implementations of IndexVersion.Search() which have the following signature:

func (i IndexVersion) Search(idx *Index, name []byte, at uint64) (*IndexEntry, int, error) {
        // ...
}

Or, in other words: given an index, an object to search for, and a location to search in, return either an entry (match), a comparison value (no match), or an error otherwise.

We can use this comparison value (see: bytes.Compare()) in order to direct the next window of our binary search through the index file in order to produce an entry in O(log(n)) time.

/cc @git-lfs/core
/cc #2415

ttaylorr · 2017-07-15T23:10:48Z

A future point of consideration for the IndexVersion interface: perhaps it could be expanded to something like:

package git/odb/pack

type IndexVersion interface {
        Name(r io.ReaderAt, at uint64) ([]byte, error)
        Entry(r io.ReaderAt, at uint64) (*IndexEntry, error)
}

which would remove the comparison int return value from the existing Search function. Instead, the *Index type would be responsible for calling bytes.Compare().

I think that might be a more logical separation of concerns, but I propose we hold off on implementing this change until after this series is reviewed.

ttaylorr added 3 commits July 15, 2017 13:15

git/odb/pack: introduce '*Index' type

ffdf07c

git/odb/pack: teach 'UnsupportedVersionErr'

5ba26e5

git/odb/pack: implement 'DecodeIndex()' to decode index files

91d694b

ttaylorr added the review label Jul 15, 2017

ttaylorr added this to the v2.3.0 milestone Jul 15, 2017

ttaylorr requested a review from technoweenie July 15, 2017 22:55

ttaylorr mentioned this pull request Jul 15, 2017

git/odb/pack: implement v1 object searching #2421

Merged

This was referenced Jul 15, 2017

git/odb/pack: implement v2 object searching #2422

Merged

git-lfs-migrate(1): packed object reads #2415

Closed

technoweenie approved these changes Jul 24, 2017

View reviewed changes

ttaylorr merged commit 83f4cb1 into master Jul 24, 2017

ttaylorr deleted the git-odb-pack-idx branch July 24, 2017 19:14

ttaylorr mentioned this pull request Jul 31, 2017

git/odb/pack: promote IndexVersion to interface #2461

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

git/odb/pack: introduce '*Index' type #2420

git/odb/pack: introduce '*Index' type #2420

ttaylorr commented Jul 15, 2017

ttaylorr commented Jul 15, 2017

git/odb/pack: introduce '*Index' type #2420

git/odb/pack: introduce '*Index' type #2420

Conversation

ttaylorr commented Jul 15, 2017

ttaylorr commented Jul 15, 2017