Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace crates-index with tame-index #506

Merged
merged 9 commits into from
Jul 28, 2023
Merged

Conversation

Jake-Shadle
Copy link
Contributor

This PR replaced crates-index with tame-index which gives more friendly access to the crates.io index, whether it is git, sparse, or source replaced with another git or sparse registry.

Resolves: #487

@Jake-Shadle
Copy link
Contributor Author

Oh, didn't realize this crate tested in several old versions, I'll relax it to 1.67.0.

@Jake-Shadle
Copy link
Contributor Author

This was fixed in EmbarkStudios@a966a4e, not sure why the commit isn't being picked up in this PR...

@obi1kenobi
Copy link
Owner

This was fixed in EmbarkStudios@a966a4e, not sure why the commit isn't being picked up in this PR...

Another contributor in another repo reported the same problem yesterday. I have to assume this is a GitHub bug.

A retry like git commit --amend --no-edit followed by a force-push of the branch seemed to resolve it that time.

Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very nice! I like many of the changes you've decided to make relative to crates-index and I'm excited to merge this.

I flagged one concern that I believe should be possible to resolve without too much additional trouble.

Also, one more question: would this PR also enable our users to use mirrors of crates.io via source replacement? That's been a commonly requested feature (#160) and I was wondering if this PR might also happen to resolve that as well.

src/rustdoc_gen.rs Show resolved Hide resolved
@Jake-Shadle
Copy link
Contributor Author

Also, one more question: would this PR also enable our users to use mirrors of crates.io via source replacement? That's been a commonly requested feature (#160) and I was wondering if this PR might also happen to resolve that as well.

For source replacement of crates.io, yes, this PR resolves that (IndexUrl::crates_io), however #160 could mean both source replacement of crates.io or custom registries in addition to or instead of crates.io, which this PR alone wouldn't resolve, as each unique index would need to be opened based on the set of unique source indices, similar to what this code is doing in cargo-deny.

@obi1kenobi
Copy link
Owner

The code looks good to me! If you could try to amend your latest commit and force-push it to work around this GitHub glitch we seem to be having, I'm happy to merge as soon as CI passes.

Thanks again for letting me know about your fork of crates-index and for making this PR. cargo-semver-checks is much better off for your expertise and hard work here 🚀

If you might be open to adding support for custom registries at any point, I'd love to have your help on that as well!

@Jake-Shadle
Copy link
Contributor Author

Sorry should have been more clear, the 1.67.0 problem was because of rust-version is fixed, however there is a legitimate bug in the current code that my testing didn't catch because of some peculiarities with cargo, fixing that now before rerunning the tests here.

@Jake-Shadle
Copy link
Contributor Author

Ok, so the error in https://github.com/obi1kenobi/cargo-semver-checks/actions/runs/5671264307/job/15368054854?pr=506 is resolved at the head of the branch I patched to since I could repro the CI environment locally in a container, the issue seems to be that it is using a cached binary where the cache key isn't accounting for Cargo.lock changes, so not sure how to fix that, but I'll publish the fixed version and change this PR to use that published version.

@Jake-Shadle
Copy link
Contributor Author

Oh, and this PR broke the 1.64.0 check, I thought this crate only cared about 1.65.0 so set the rust-version to that, I'll update it to 1.64.0 as otherwise that job will succeed, but by failing on an error that bypasses the actual failure that job is intended to test.

@Jake-Shadle
Copy link
Contributor Author

Oh wait, furthest tame-index can go back is 1.65.0, which is the lowest version supported by gix https://github.com/search?q=repo%3AByron%2Fgitoxide%20rust-version&type=code

@Jake-Shadle
Copy link
Contributor Author

Ahh, so nvm looking how the caching works the cache key is correct and it is using the correct version, I'll continue taking a look tomorrow.

@obi1kenobi
Copy link
Owner

Oh, and this PR broke the 1.64.0 check, I thought this crate only cared about 1.65.0 so set the rust-version to that, I'll update it to 1.64.0 as otherwise that job will succeed, but by failing on an error that bypasses the actual failure that job is intended to test.

That check was actually buggy in itself, fixed in #507 — thanks for getting me to take a look 😅 The actual min supported version is 1.67 and we don't need support for anything older than that.

cargo-semver-checks has two Rust versions it cares about: the version it's compiled with, and the version found at runtime which will generate the rustdoc JSON. That check ensures that if the runtime rustc version is too old, we fail with a better error than "failed to deserialize rustdoc JSON." It tests a binary compiled with a newer rustc (e.g. like a prebuilt binary) running on a system with an older rustc (previously 1.64, now 1.66 in #507).

So supporting 1.67 in tame-index should be totally fine.

Ahh, so nvm looking how the caching works the cache key is correct and it is using the correct version, I'll continue taking a look tomorrow.

Yeah, as you probably already saw, we just build the binary once in the workflow (per-run) and reuse it in multiple jobs to skip needing to compile from scratch every time. The failure is worth looking into, though no particular rush -- tomorrow is fine. Thanks for digging into this!

Jake-Shadle added a commit to EmbarkStudios/tame-index that referenced this pull request Jul 28, 2023
This PR makes a big change to how the RemoteGitIndex performs fetches
(and clones) and subsequently reads blobs, to resolve issues discovered
while doing obi1kenobi/cargo-semver-checks#506

Previously when performing a fetch, we also updated references so that
HEAD pointed at the same commit as the remote HEAD. This works fine in
local operation and in tests, but had a drawback, namely

```
Error: Failed to update references to their new position to match their remote locations

Caused by:
    0: The reflog could not be created or updated
    1: reflog messages need a committer which isn't set
```

can occur in CI environments. My attempts to fix this...failed
(configuring a committer name + email before editing references).

Then I noticed frewsxcv/rust-crates-index#129
which....is just way better than that previous approach since we don't
need to edit references any longer just to be able to use
`repo.head_commit` for reading blobs, but rather just store the remote
HEAD when opening/fetching and retrieving the tree to lookup blobs from
that, sidestepping the whole issue with updating references.

There is a big difference between that PR though, in that this PR also
adds a function (`crate::utils::git::write_fetch_head`) to actually
write a FETCH_HEAD _similarly_ to how cargo, via git or libgit2 writes
FETCH_HEAD when performing a fetch on an index.

This was done for 2 reasons:

1. Performing a fetch of an index via this crate will now update the
index repository the same as cargo would, which makes this play nicer
with cargo and thus the wider ecosystem
2. I had already done (a slightly worse) version of writing a FETCH_HEAD
for
[cargo-deny](https://github.com/EmbarkStudios/cargo-deny/blob/main/src/advisories/helpers/db.rs#L451-L457),
because, AFAIK, retrieving the timestamp of the FETCH_HEAD file is by
far the [most/only
reliable](https://stackoverflow.com/questions/2993902/how-do-i-check-the-date-and-time-of-the-latest-git-pull-that-was-executed)
way to determine when a fetch was last performed on a repository. The
reason this was important for cargo deny is that it fetches advisory
databases, but can decide _not_ to fetch them if the user doesn't want
to perform network calls, however if they _never_ fetch from the remote
advisory db they run the risk of mistakenly thinking everything is fine
even if they have crate dependencies that have active advisories that
were added/modified since the last time they fetched. So until `gix`
adds support for writing FETCH_HEAD (which I think is planned, but
couldn't find it), making this function public allows cargo-deny or
others to also have a (simple, definitely not git/libgit2 compliant) way
to write a FETCH_HEAD with gix.
@Jake-Shadle
Copy link
Contributor Author

Ok, I think this is ready to go now, doing this PR forced me to fix several bugs and add improvements to tame-index, eg EmbarkStudios/tame-index#7, so it's now in a much better state.

@Jake-Shadle
Copy link
Contributor Author

If you'd like, I can also do a follow up PR to replace git2 with gix to reduce compile times/binary size.

@obi1kenobi
Copy link
Owner

This looks great, thanks so much for digging into the issues and fixing them! I'm very excited to merge and release this 🚀

Is my understanding correct that this PR switches our index behavior from "always git index" to "same as cargo" i.e. default-sparse unless configured otherwise?

If so, then I think all our CI tests are now over a sparse index and we have a gap in our test suite. Would you mind adding a test job that makes sure semver-checking with a git index still works as expected? For example, perhaps duplicate the run-on-ref-slice-fork and run-on-ref-slice-fork-windows jobs and run variants of them that first configure a git index?

@Jake-Shadle
Copy link
Contributor Author

The cross-version-caching still uses the git index as it is using 1.67 which hadn't yet stabilized sparse indices, much less made them the default, as seen by the failure before I resolved the issue in tame-index.

@obi1kenobi
Copy link
Owner

Unfortunately, that is only temporary coverage — this next release is the last one that will support 1.67 — so that job will get updated to use two newer Rust versions that happen to emit different rustdoc JSON formats.

I'd really love an explicit test job for testing the git index, because bugs related to index configuration are always such a pain to triage and debug.

How can I make adding that test as easy as possible for you?

I've never added or modified index configs before, so I don't know how to assert that cargo / cargo-semver-checks are using a git index. But if you'd prefer, I can add two new jobs (Linux / Windows) that catch semver issues and have a fill-in-the-blank step that should add the configuration and assert it's being applied correctly.

@Jake-Shadle
Copy link
Contributor Author

Oh it's no problem for me to add them I just thought I would point out there is still at least one test using the git index, I'll add variations for the above tests you recommended.

@obi1kenobi
Copy link
Owner

Thanks, I really appreciate it!

Also, switching to gix in a separate PR would be great as well!

@obi1kenobi
Copy link
Owner

Very interesting, TIL about that env var!

Is tame-index guaranteed to read it? Is there a way to assert that it has?

I'm a bit worried about "test passes for the wrong reasons" situations, like accidentally ignoring the env var and falling back to the default index config.

@Jake-Shadle
Copy link
Contributor Author

Yes, the purpose of the IndexUrl::crates_io is to open the same index that cargo would given the same environment, including the env var. I understand your concern though, but I think in this particular case it's not actually very important, the index information that is used is identical whether it used the git or sparse index for crates.io.

Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the awesome work you did. cargo-semver-checks and all its users are much better off for it 🚀

GitHub is showing a merge conflict at the moment, but as soon as that's resolved, I'm happy to merge this.

@obi1kenobi obi1kenobi merged commit 4df362a into obi1kenobi:main Jul 28, 2023
@obi1kenobi
Copy link
Owner

Thanks again!

If you're still interested in porting cargo-semver-checks to gix, I'd happily merge a PR for that. But obviously, no pressure at all.

@Jake-Shadle
Copy link
Contributor Author

Next week ;)

@obi1kenobi
Copy link
Owner

Enjoy your weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't use sparse registry mirror
2 participants