Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace git2 with gix #129

Merged
merged 23 commits into from
Jul 29, 2023
Merged

Conversation

Byron
Copy link
Collaborator

@Byron Byron commented Jul 18, 2023

This PR replaces git2 with gix. It does so trying to make no breaking changes to the public API.
Please note that this is an ongoing effort, this PR will change a lot.

Why now?

With the sparse index slowly becoming the new default, users of this crate will probably be less succeptible
to changes in the git implementation which might come with compatibility issues on windows when ssh is used.
Furthermore, gix now has a robust implementation of the negotiation protocol which allows it to work for
crates.io proxies as well, which is important for users in China.
Making this changes has high value as it can solve a list of issues which have been open for a long time.

Tasks 📝

  • repository access
  • creation and initial fetch (and assure parallel clones block, have a test for that)
  • index updates
  • search for git2_ and Git2 to find leftover references in docs to old names and fix them
  • make improvements to gix for smoother API and refer to new release here
  • fix conflicts with master
  • translate the new changes feature.
  • a new gix release with latest changes to adapt API
  • Make FETCH_HEAD handling work (the only way to see what was actually fetched) (noticed while dog-fooding tooling built with this version)

Breaking Changes ❗️

  • The vendored-openssl feature toggle was removed as there is no equivalent, additive feature in gitoxide. It's replaced with documentation on how to get rustls instead.

Highlights 🔦

  • The test-suite now runs in 5min 30s on CI, down from ~9:30 minutes.
  • The mem usage tests seems to be a little bit faster with ~5s for gix and ~6s for git2. (it's definitely not as fast as I would have hoped though, ODB is the bottleneck)
  • Getting the most recent 500 changes() takes 1.2s on master but only 335ms with gix.

Differences to git2 implementation ≠

  • proxy-options are taken automatically and directly from git.
  • credentials are handled as configured in git. This works particularly well with private registries.
  • ssh options are handled by the ssh program, just like git. ssh-related git configuration is respected as well.

I am leaving it with the defaults as I think they improve compatibility and make it easier to connect to a variety of remotes.

Potential Shortcomings ⚠️

  • Windows might have benefited from a native ssh implementation. gitoxide doesn't have that yet, and uses the ssh program, just like git.
  • gitoxide can't read the FETCH_HEAD, nor does it write it. If that's an issue, this could be implemented. - resolved by making an even more flexible implementation of obtaining the latest commit.

Questions 🙋‍♂️

  • How to provide a pure-Rust HTTPS implementation while maintaining additive feature toggles?
    • With the https feature, one has to make a choice. Using curl provides the best capabilities, but one would have to chose the reqwest based backend to solve the OpenSSL related issues. Ideally, this could be a choice, but that doesn't work with gix feature toggles as these backend features are mutually exclusive. gix accepts that --all-feature builds won't compile, but this crate might not.
    • Answer: document how to deal with that in the crate-root.
  • Use fslock instead of gix::lock?
    • Doing so doesn't have the issue of needing signal handler to properly deal with signals/interrupts, but may have other failure modes.
  • How does this run on the lib.rs machine? @kornelski
    • The mem_usage tests also shows that both peak and total memory usage are up, but I have a feeling it counts virtual memory differently maybe. It's best to just try it and also compare performance on smaller machines.

Related Issues 🐞

@Byron Byron force-pushed the replace-git2-with-gitoxide branch 9 times, most recently from 2fbbbde to b75ccf7 Compare July 19, 2023 06:04
@Byron Byron marked this pull request as ready for review July 19, 2023 07:26
@Byron Byron force-pushed the replace-git2-with-gitoxide branch from 9c7c910 to a054b0d Compare July 20, 2023 05:44
@Byron
Copy link
Collaborator Author

Byron commented Jul 20, 2023

For the review, I recommend using gh checkout 129, as it sets up the local branches such that you can simply push changes into this PR. That tends to be much faster than making suggestions or talking about it, and saves time for everyone.

If I can be of any help otherwise, please let me know :).

@Byron Byron force-pushed the replace-git2-with-gitoxide branch 3 times, most recently from 45453c5 to 52526a1 Compare July 20, 2023 06:44
@Byron Byron marked this pull request as draft July 22, 2023 09:00
Byron added 12 commits July 22, 2023 11:30
The idea is to have it side-by-side until all `git2` usages are replaced.
Note that we also switch to the new `dep:` style with the `gix` dependency.
* Let most tests share the same state, and avoid altering the users cargo registry.
  Make sure that shared registry is cloned only once and teach initialization to wait for it on first clone.
  On CI, we use the freshly cloned crates index that was needed to build the binary in the first place, or put it there
  in case the sparse index was used for additional testing.

* make `mem` check operational again, and adjust expectation so it passes on MacOS (maybe unrelated, it didnt' run for a while probably)
* make `bare_iterator()` test much faster
* reduce startup time of `test_can_parse_all()` by using a shared index
Note that this commit also adds the infrastructure to support `gix` side-by-side.
This is done by renaming the soon-to-be-replaced types and fields with `git2_<name>`.

Additionally, CI is adjusted to be able to deliver best possible performance for some
of the long-running tests.
When cloning the index at the same time, as is done by tests,
it's possible to discover a newly initialized repository that
isn't quite ready yet, causing failures lateron.

To prevent that, we have to be sure there is no opening happening
while we are cloning.
@Byron Byron force-pushed the replace-git2-with-gitoxide branch from 9f09f47 to 11a7522 Compare July 22, 2023 09:35
@Byron
Copy link
Collaborator Author

Byron commented Jul 22, 2023

After noticing this awesome new Changes feature on master I went back and ported that too. Performance seems to be about the same.
I also noticed that FETCH_HEAD is truly required to correctly learn about the HEAD state of typical git2 crates-index checkouts, and I will address that tomorrow probably.

@Byron Byron force-pushed the replace-git2-with-gitoxide branch from 412a442 to f50308f Compare July 22, 2023 16:59
Byron added 4 commits July 23, 2023 08:13
…g the index.

This example can also help to do some real-world testing registries that are less
than optimal. This is particularly important now that the unit tests are sandboxed.
Previously they could get stuck in an outdated position if fast-forwarding
them wasn't possible. This is a problem in case index squashes happen inbetween.

Now the refs will be forcefully overwritten, so we retain the most recent state
that we just fetched, which is vital for consistency.
We really want to be sure we find the most recent commit, and put
extra safe-guards in place to assure this happens.
This way it's easier to add new errors in future, and less boilerplate.
It's already part of the dependency tree now as `gix` uses it, so it's
basically free.
@Byron Byron marked this pull request as ready for review July 23, 2023 07:21
@Byron
Copy link
Collaborator Author

Byron commented Jul 23, 2023

And it's ready for review again, with improvements for finding the most recent commit to be more resilient towards the various kinds of indices it may encounter.

I want to note that I ran into performance problems with version 1.0 due to a particular case where my index was out-of-date, and on each update() it would download the whole index essentially whose resolution took a long time, resulting in high-CPU and delays. What's worse is that the refspecs don't allow forced updates, which were needed as the remote index was squashed, which was the reason it had to download a huge pack. As cargo-smart-release would call update before publishing each crate, there was a long delay each time. Even though FETCH_HEAD was updated, this didn't help to prevent huge fetches next time as git2 performs simplified pack negotiation which remote tracking refs to be up-to-date. gix implements negotiation like git, which doesn't rely on remote tracking branches but the mere presence of a commit in the ODB.

All this is fixed in this branch as well, along with I assume significant performance improvements.

CC @frewsxcv @kornelski

Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Jul 23, 2023
This means `git2` is now fully removed from the tools used by `gitoxide`.

Note that this also means that the `vendored-ssl` feature has been removed as there is no equivalent.
It might be worth to add feature toggles that change to another backend though.
Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Jul 23, 2023
This means `git2` is now fully removed from the tools used by `gitoxide`.

Note that this also means that the `vendored-ssl` feature has been removed as there is no equivalent.
It might be worth to add feature toggles that change to another backend though.
@ToBinio
Copy link
Contributor

ToBinio commented Jul 23, 2023

just in case so you know #132

btw:
once again thanks for all the work with this PR and gitoxide in general ❤️

@Byron
Copy link
Collaborator Author

Byron commented Jul 23, 2023

Oh, I wasn't aware, and was sorry to read this :/!

btw:
once again thanks for all the work with this PR and gitoxide in general ❤️

Thank you ☺️.

Even though I probably shouldn't review my own PR, I am happy to become a maintainer of crates-index 🎉.

Due to my work on gitoxide and cargo smart-release (which relies on crates-index) I have stakes in this crate and would love to make it better.
If you consider my application, please feel free to 'tune' me on how to do maintenance here, or point me to a place where I can read up on how it should work. Thanks a lot!

Byron added a commit to GitoxideLabs/gitoxide that referenced this pull request Jul 23, 2023
This means `git2` is now fully removed from the tools used by `gitoxide`.

Note that this also means that the `vendored-ssl` feature has been removed as there is no equivalent.
It might be worth to add feature toggles that change to another backend though.
@ToBinio
Copy link
Contributor

ToBinio commented Jul 24, 2023

@frewsxcv sry for the ping...

but i wanted to make sure that you have seen Byron's application:

Even though I probably shouldn't review my own PR, I am happy to become a maintainer of crates-index 🎉.

Due to my work on gitoxide and cargo smart-release (which relies on crates-index) I have stakes in this crate and would love to make it better.
If you consider my application, please feel free to 'tune' me on how to do maintenance here, or point me to a place where I can read up on how it should work. Thanks a lot!

Jake-Shadle added a commit to EmbarkStudios/tame-index that referenced this pull request Jul 26, 2023
Apparently cargo, when cloning via git2, never sets a remote, so the
logic for fetching updates basically never worked. I will add a test
for this later...

Shamelessly stolen from
frewsxcv/rust-crates-index#129
@ToBinio ToBinio mentioned this pull request Jul 28, 2023
Jake-Shadle added a commit to EmbarkStudios/tame-index that referenced this pull request Jul 28, 2023
This PR makes a big change to how the RemoteGitIndex performs fetches
(and clones) and subsequently reads blobs, to resolve issues discovered
while doing obi1kenobi/cargo-semver-checks#506

Previously when performing a fetch, we also updated references so that
HEAD pointed at the same commit as the remote HEAD. This works fine in
local operation and in tests, but had a drawback, namely

```
Error: Failed to update references to their new position to match their remote locations

Caused by:
    0: The reflog could not be created or updated
    1: reflog messages need a committer which isn't set
```

can occur in CI environments. My attempts to fix this...failed
(configuring a committer name + email before editing references).

Then I noticed frewsxcv/rust-crates-index#129
which....is just way better than that previous approach since we don't
need to edit references any longer just to be able to use
`repo.head_commit` for reading blobs, but rather just store the remote
HEAD when opening/fetching and retrieving the tree to lookup blobs from
that, sidestepping the whole issue with updating references.

There is a big difference between that PR though, in that this PR also
adds a function (`crate::utils::git::write_fetch_head`) to actually
write a FETCH_HEAD _similarly_ to how cargo, via git or libgit2 writes
FETCH_HEAD when performing a fetch on an index.

This was done for 2 reasons:

1. Performing a fetch of an index via this crate will now update the
index repository the same as cargo would, which makes this play nicer
with cargo and thus the wider ecosystem
2. I had already done (a slightly worse) version of writing a FETCH_HEAD
for
[cargo-deny](https://github.com/EmbarkStudios/cargo-deny/blob/main/src/advisories/helpers/db.rs#L451-L457),
because, AFAIK, retrieving the timestamp of the FETCH_HEAD file is by
far the [most/only
reliable](https://stackoverflow.com/questions/2993902/how-do-i-check-the-date-and-time-of-the-latest-git-pull-that-was-executed)
way to determine when a fetch was last performed on a repository. The
reason this was important for cargo deny is that it fetches advisory
databases, but can decide _not_ to fetch them if the user doesn't want
to perform network calls, however if they _never_ fetch from the remote
advisory db they run the risk of mistakenly thinking everything is fine
even if they have crate dependencies that have active advisories that
were added/modified since the last time they fetched. So until `gix`
adds support for writing FETCH_HEAD (which I think is planned, but
couldn't find it), making this function public allows cargo-deny or
others to also have a (simple, definitely not git/libgit2 compliant) way
to write a FETCH_HEAD with gix.
@frewsxcv
Copy link
Owner

Oh, I wasn't aware, and was sorry to read this :/!

btw:
once again thanks for all the work with this PR and gitoxide in general ❤️

Thank you ☺️.

Even though I probably shouldn't review my own PR, I am happy to become a maintainer of crates-index 🎉.

Due to my work on gitoxide and cargo smart-release (which relies on crates-index) I have stakes in this crate and would love to make it better. If you consider my application, please feel free to 'tune' me on how to do maintenance here, or point me to a place where I can read up on how it should work. Thanks a lot!

Thanks for offering! Just sent you an invite to join the repo and crates.io crate

@Byron
Copy link
Collaborator Author

Byron commented Jul 29, 2023

Thank you :), I feel honoured to be allowed to maintain this crate, as I have been using it on and off for many years now!

I will be working on a 2.0 release that will incorporate this PR, and be sure to communicate the one breaking change properly.

@Byron Byron merged commit 2c5d33a into frewsxcv:master Jul 29, 2023
@Byron Byron deleted the replace-git2-with-gitoxide branch July 29, 2023 07:43
Byron added a commit to Byron/cargo-smart-release that referenced this pull request Aug 22, 2023
This means `git2` is now fully removed from the tools used by `gitoxide`.

Note that this also means that the `vendored-ssl` feature has been removed as there is no equivalent.
It might be worth to add feature toggles that change to another backend though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants