-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify SourceID Hash #14800
base: master
Are you sure you want to change the base?
simplify SourceID Hash #14800
Conversation
If this turns out not to be a valid transformation I am happy to update the PR to be tests to prove why it's not okay. |
_ => self.inner.url.as_str().hash(into), | ||
} | ||
self.inner.canonical_url.hash(into); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most CanonicalUrl differences are related to git somehow. I'm assuming that means we can't hit them in any other way. There is the stripping of a trailing /
. However, we probably should have been respecting that in the hash in the first place I assume. The question is if this will change anything significant enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps if you have a index hosted on git, then GITHUB.com/me/my-index.git/
could change a lot. I think the biggest concern here is having to recheck out dependencies. So even if this does affect people we might just eat that cost. On the other hand there's a lot of complexity here and maybe there are bigger problems I forgot about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't yet looked closer. I wonder if this would affect hacks like this one: #5478 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would be affected by this change. But it is something we have to watch for as I "clean up" this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, since [patch.<url>]
uses URLs directly, not SourceId.
This change may affect anywhere we let people define URL and parse it as SourceId
, like the [registries]
table? We could also deem this as a bug and fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hope that we rely on Eq
for those cases and not Hash
. So that the problem you identified is not a problem with this PR, rather one to investigate on the next PR that changes Eq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Have checked other usages of it, none of them seems user-facing like the path to index files. It should be fine to move over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the paths to crates.io Git and spare index on users disks are not affected by this change, you have my vote.
To be clear, the goal here is to simplify the code a little bit? Are there any other intended improvements? And the downside is that if someone has a custom registry that canonicalizes differently (like a trailing slash, or upper-case characters in a GitHub URL), it will hash differently and they will end up with new directories in If that's the tradeoff, I'm a little on the fence and could go either way. If I'm understanding it correctly, would it be possible to update the |
From this PR on its own, no other improvements. On its own, this is a small simplification that does not pull its weight. I have spiked out a larger re-factor that passes tests and that IMO does make things clearer and faster. It relies on several small changes like this that really need to be discussed thoroughly. So I wanted to pull out each of them in turn to make sure we found all of the unintended consequences.
That is the only concern I'm aware of. But it is possible that I missed something.
Would it be helpful for me to post the
We can definitely add a unit test for this changing behavior. I should probably start by making sure it is in fact user facing with an end-to-end test. If it is, then yes a unit test is good idea. I will try and do that. |
While trying to update the tests as requested I thought of another concern. If the user has two custom registries that canonical eyes to the same thing, before this PR they would end up in separate directories in |
Theoretically yes. However I doubt somebody relies on this behavior to distinguish custom registries. It either needs to have trailing |
☔ The latest upstream changes (presumably a8f7db2) made this pull request unmergeable. Please resolve the merge conflicts. |
What does this PR try to resolve?
Despite being interned
SourceId::Eq
is not aptr::eq
. Which in turn is becauseSourceId
s concept of identity is a complete mess. The code relies on having to IDs that areEq
but do not have the same values for their fields.As one measure of this
SourceId
has animpl Hash
which does something different fromfn full_hash
andfn stable_hash
. SeparatelySourceIdInner
has a different implementation. Similar levels of complexity exist forEq
. Every one of theseimpl
s was added due to a real bug/issue we've had that needs to stay fixed. Not all of witch are reproducible enough to have made it into our test suite.I have some ideas for how to reorganize the types so that this is easier to reason about and faster. But given the history and the complexity I want to move extremely carefully.
How should we test and review this PR?
The test pass, and it's a one line change, but this still needs careful review.
Additional information
r? @ehuss I remember you and Alex working very hard to track down most of these bugs.