-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: version parsing #240
Conversation
@@ -3106,7 +3106,7 @@ expression: "conda_lock.packages_for_platform(Platform::Linux64).collect::<Vec<_ | |||
manager: conda | |||
platform: linux-64 | |||
dependencies: | |||
certifi: ">=2020.06.20" | |||
certifi: ">=2020.6.20" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened here? o.O
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what I explained in:
As you review this PR you will notice that some version string representations have changed. This is because the string representation is now constructed from the version itself. However, some cases are very hard to reproduce, for instance, the number 06 is represented as 6 in our code. While reviewing, make sure you check these changes and decide if they make sense or not.
This is awesome, of course! I am wondering wether we need (or need not) be able to restore versions identically. There are some places where it might be "required", or confusing if it doesn't happen:
To the last point: the current way of signing packages in conda works by signing the metadata of a package (including the dependencies). To sign the metadata, the entire record from the I am not sure we want to implement the way conda package signing works (it's a pretty custom implementation, and sigstore.dev might be a better fit) but it is at least one place where restoring versions exactly is necessary. We could add some sort of flag to the parser, so that if our representation doesn't match up, we store either a fancy diff or the original string. Or we could wrap the type with an ExactReprVersion type that implements this functionality. Or - and this is controversial - we only store the version as string and reparse it every time we want to compare it / bump it / ... :) |
Yeah indeed. In that case it would make more sense to store a string alonside the Version seperately. Dont think it should be a concern of the Version struct indeed. |
@wolfv Should I also introduce a type that retains the original version string and use that in the RepoData? |
5234e3b adds |
This PR is a follow-up to #227. It refactors the parsing of versions to use
nom
instead of the code that was ported from conda.Using
nom
allows removing some obscure regexes and should make it easier to integrate the same parsing logic in other places like the MatchSpec and VersionSpec parsing logic which are up next for a refactor.The parser is about 3x faster in the common case and 2x faster in more complex cases.
With the new parser, I was also able to extract more information from the string which makes it possible to reproduce the exact string representation it was created from (with some obscure exceptions). Because of that, I was able to remove the "normalized" string representation from the struct further reducing the size from 128 to 112 bytes.
As you review this PR you will notice that some version string representations have changed. This is because the string representation is now constructed from the version itself. However, some cases are very hard to reproduce, for instance, the number
06
is represented as6
in our code. While reviewing, make sure you check these changes and decide if they make sense or not.This code again makes use of some bit twiddling to achieve all this but I wrapped all this behavior into structs to make it simple to reason about (see
Segment
andFlags
).Things that currently fail to parse but did previously parse correctly are:
1__
parsed as1._
1--
parsed as1._
I will look into those!