Consider moving extra metadata from git notes to other storage #7

Closed
martinvonz opened this issue Apr 7, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@martinvonz
Member

The git backend gets really slow after many commits have been created. Profiling has shown that the problem is git notes. One problem is that libgit2 doesn't do sharding. Manually editing a note from the command line using git notes helps, but it's still very slow. We should consider moving the extra metadata to some other storage, perhaps a custom format.

Another option might be for the git backend to simply cache the notes tree. I don't know how much that would help.

One advantage of the current storage in git notes is that it lets us exchange the data using regular git commands.

@martinvonz martinvonz added the enhancement New feature or request label Apr 7, 2021
martinvonz added a commit that referenced this issue Oct 14, 2021
I think this is just cleaner, and it gives us room to put other
store-related data in the `.jj/store/` directory. I may want to use
that place for writing the metadata we currently write in Git notes
(#7).
@martinvonz
Member Author

Fun fact: I replaced the Git notes storage with a simple custom format and `jj log` got a few times faster, even though it naively re-read the whole file for every commit it printed. So the results are promising; I just need to clean that code up and make it not re-read the file all the time.

@arxanas
Contributor

arxanas commented Oct 20, 2021

Do you have a solution for associating metadata with nodes/edges in the commit graph? If so, is it something that can be shared between repositories?

@martinvonz
Member Author

I currently use Git notes, which can technically be shared, but it's not obvious how to do that for a regular user. I could of course add some command for making it easier, but there's not much in the extra metadata that is important for sharing, and now that evolution is pretty much gone, there's even less. The only pieces of information are the change id and the open/closed flag. I hadn't planned to exchange it until there's exchange between native jj repos (i.e. years from now :)).

Allowing exchange of the information is thus not something I aimed for with the format I'm replacing Git notes with. The format can be thought of as naive Git packfiles without delta encoding or compression.

Does git-branchless also need to associate metadata with nodes and/or edges? What metadata?

martinvonz added a commit that referenced this issue Oct 20, 2021
I'm trying to replace the Git backend's use of Git notes for storing
metadata (#7). This patch adds a file format that I hope can be used
for that. It's a simple generic format for storing fixed-size keys and
associated variable-size values. The keys are stored in sorted
order. Each key is followed by an offset to the value. The offset is
relative to the first value. All values are concatenated after each
other. I suppose it's a bit like Git's pack files but lacking both
delta-encoding and compression.
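
The layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the 4-byte key size, the little-endian `u32` entry count and offsets, and the names `serialize` and `lookup` are all hypothetical. Because keys are fixed-size and sorted, a lookup can binary-search the index and then slice the value out of the concatenated value area.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// Assumed sizes for this sketch; real keys would be longer (e.g. commit ids).
const KEY_SIZE: usize = 4;
const OFFSET_SIZE: usize = 4;
const HEADER_SIZE: usize = 4; // assumed header: entry count as a little-endian u32

/// Writes the entry count, then (key, offset) pairs in sorted key order,
/// then all values concatenated. Offsets are relative to the first value.
pub fn serialize(entries: &BTreeMap<[u8; KEY_SIZE], Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    let mut offset: u32 = 0;
    // BTreeMap iterates in sorted key order, which gives the sorted index.
    for (key, value) in entries {
        out.extend_from_slice(key);
        out.extend_from_slice(&offset.to_le_bytes());
        offset += value.len() as u32;
    }
    for value in entries.values() {
        out.extend_from_slice(value);
    }
    out
}

fn read_u32(data: &[u8], pos: usize) -> Option<usize> {
    let b = data.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

/// Binary-searches the sorted index and returns the value for `key`, if any.
pub fn lookup<'a>(data: &'a [u8], key: &[u8; KEY_SIZE]) -> Option<&'a [u8]> {
    let num_entries = read_u32(data, 0)?;
    let entry_size = KEY_SIZE + OFFSET_SIZE;
    let values_start = HEADER_SIZE + num_entries * entry_size;
    if data.len() < values_start {
        return None; // truncated file
    }
    let (mut lo, mut hi) = (0, num_entries);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let entry_pos = HEADER_SIZE + mid * entry_size;
        let entry_key = &data[entry_pos..entry_pos + KEY_SIZE];
        match entry_key.cmp(&key[..]) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => {
                let start = values_start + read_u32(data, entry_pos + KEY_SIZE)?;
                // A value ends where the next one starts, or at end of file.
                let end = if mid + 1 < num_entries {
                    values_start + read_u32(data, entry_pos + entry_size + KEY_SIZE)?
                } else {
                    data.len()
                };
                return data.get(start..end);
            }
        }
    }
    None
}
```

In this sketch a value's length is never stored; it is implied by the next entry's offset, which is one plausible reading of "each key is followed by an offset to the value".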

Each file can also have a parent pointer (just like the index files
have), so we don't have to rewrite the whole file each time. As with
the index files, the new format squashes a file into its parent if it
contains more than half the number of entries of the parent. The code
is also based on `index.rs`.
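
As a rough in-memory illustration of the parent-pointer and squashing idea (not the actual implementation): the `Table` type, `add_segment`, and the `segment` helper below are hypothetical names, and the exact comparison (`2 * new > parent`, counting only the parent's own entries) is an assumption about how "more than half" is applied.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// One table file: its own entries plus an optional pointer to a parent file.
pub struct Table {
    parent: Option<Arc<Table>>,
    entries: BTreeMap<String, String>,
}

impl Table {
    /// Looks in this table first, then walks up the parent chain.
    pub fn get(&self, key: &str) -> Option<&str> {
        let mut table = Some(self);
        while let Some(t) = table {
            if let Some(value) = t.entries.get(key) {
                return Some(value.as_str());
            }
            table = t.parent.as_deref();
        }
        None
    }

    /// Number of files in the chain, including this one.
    pub fn depth(&self) -> usize {
        1 + self.parent.as_ref().map_or(0, |p| p.depth())
    }
}

/// Adds new entries on top of `parent`. While the new table holds more than
/// half as many entries as its parent, the parent is squashed into it.
pub fn add_segment(
    parent: Option<Arc<Table>>,
    mut entries: BTreeMap<String, String>,
) -> Arc<Table> {
    let mut parent = parent;
    while let Some(p) = parent.clone() {
        if entries.len() * 2 <= p.entries.len() {
            break; // small enough: keep it as a separate child file
        }
        // Squash: copy the parent's entries; newer entries take precedence.
        for (key, value) in &p.entries {
            entries.entry(key.clone()).or_insert_with(|| value.clone());
        }
        parent = p.parent.clone();
    }
    Arc::new(Table { parent, entries })
}

/// Convenience helper for building a set of entries.
pub fn segment(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With this rule a small write only creates a small file, while the chain length stays logarithmic in the total number of entries, since each file is at most half the size of its parent.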

Perhaps we can also replace the default operation storage with this
format. Maybe also the native local backend's storage. We'll need
delta-encoding and compression soon then.
martinvonz added a commit that referenced this issue Oct 20, 2021
The new store works the same way as the `OpHeadsStore`. It keeps track
of the current head file(s) by recording their names in a
directory. When a write happens, it adds the new head and then removes
the old head. There will generally be a single head at a time. The
only exception is when there's been concurrent operations (locally, or
remotely, in the case of a distributed file system). When there are
multiple head files, they are automatically merged. No guarantee is
given about which value wins if the key exists in several heads; the
store is meant to be used for data that's immutable once written. As
long as different keys are written, this is a CRDT. That makes it fit
for solving both #3 and #7.
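
The head-tracking and merge behavior described above might look something like this in miniature. It is a sketch under assumptions: `HeadNames` stands in for the directory of head-file names, `merge_heads` stands in for the automatic merge, and both names are hypothetical. The merge is a plain union in which the first value seen for a key wins, which matches "no guarantee about which value wins".

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type Table = BTreeMap<String, String>;

/// Stand-in for the directory that records the names of the current heads.
#[derive(Default)]
pub struct HeadNames {
    names: BTreeSet<String>,
}

impl HeadNames {
    /// Adds the new head first and only then removes the old ones, so there
    /// is never a moment with no head recorded.
    pub fn advance(&mut self, old: &[&str], new: &str) {
        self.names.insert(new.to_string());
        for name in old {
            self.names.remove(*name);
        }
    }

    pub fn current(&self) -> Vec<&str> {
        self.names.iter().map(String::as_str).collect()
    }
}

/// Merges several heads by taking the union of their entries. If a key is
/// present in more than one head, an arbitrary one (here: the first) wins.
pub fn merge_heads(heads: &[Table]) -> Table {
    let mut merged = Table::new();
    for head in heads {
        for (key, value) in head {
            merged.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }
    merged
}
```

When two writers start from the same head and each call `advance`, both new names end up recorded, and a later reader can merge them and advance to a single head again. As long as the writers used different keys, the union gives the same result in either order, which is the CRDT property the commit message refers to.
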
martinvonz added a commit that referenced this issue Oct 20, 2021
We don't want to re-read the whole table(s) every time we read extra
metadata for a commit (which is the immediate use-case I'm aiming for
in #7).
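
One simple way to avoid the repeated reads, shown here only as a hypothetical sketch, is to load the table lazily on first access and keep it in memory. `CachedTable` and its `loads` counter are illustrative names, and the loader closure stands in for whatever actually reads the table files.

```rust
use std::collections::HashMap;

/// Wraps a loader and keeps the loaded table in memory after the first read.
pub struct CachedTable<F: Fn() -> HashMap<String, String>> {
    load: F,
    cached: Option<HashMap<String, String>>,
    /// How many times the loader has run (only for illustration).
    pub loads: usize,
}

impl<F: Fn() -> HashMap<String, String>> CachedTable<F> {
    pub fn new(load: F) -> Self {
        CachedTable { load, cached: None, loads: 0 }
    }

    /// Returns the value for `key`, loading the table on first use only.
    pub fn get(&mut self, key: &str) -> Option<&str> {
        if self.cached.is_none() {
            self.loads += 1;
            self.cached = Some((self.load)());
        }
        self.cached.as_ref()?.get(key).map(String::as_str)
    }
}
```

This relies on the data being immutable once written; a cache like this would still need to be dropped or refreshed when a new head appears.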