Improve Requirement/Marker parser with context-sensitive tokenisation #624
Conversation
Force-pushed from 7f6ff35 to 5cbd07f
Ahaha, I covered an error case that the existing tests don't exercise, triggering coverage-related CI errors.
This makes it a fully-fleshed-out class for holding data.
This also pulls out the error message formatting logic into the error itself.
This helps pyright better understand what's happening.
These provide consistent call signatures into the parser. This also decouples the tokenizer from the `Marker` class.
This makes it easier to read through the function, with a clearer name.
This draws a clear distinction between this and the user-visible `Requirement` object.
This reduces how many regex patterns would be matched against the input while also enabling the parser to resolve ambiguity in-place.
This allows for nicer error messages, which show the entire requirement string and highlight the marker in particular.
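For illustration, a minimal sketch of what such highlighting could look like. The helper name and the exact layout below are assumptions, not the library's actual output:

```python
# Hypothetical formatting helper: show the whole requirement string and
# underline the span the parser was examining when it failed.
def format_syntax_error(message: str, source: str, start: int, end: int) -> str:
    underline = " " * start + "~" * max(end - start, 1)
    return f"{message}\n    {source}\n    {underline}"

print(format_syntax_error(
    "Expected a closing quote",
    "name; os_name == 'posix",
    start=17,
    end=23,
))
```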
This eliminates a point of duplication and ensures that the error messaging is consistent.
This makes it easier to identify what position the parser was checking and to present the relevant context to the reader.
This is more permissive and better handles tabs used as whitespace.
This follows what PEP 508's grammar says is a valid identifier.
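Roughly, PEP 508's name rule corresponds to the pattern sketched below; the exact regex used by the parser may differ:

```python
import re

# A name starts and ends with an ASCII letter or digit, with ".", "-" and
# "_" permitted only in between.
NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")

assert NAME.fullmatch("requests")
assert NAME.fullmatch("zope.interface")
assert NAME.fullmatch("-pkg") is None  # may not start with a separator
```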
This makes it possible for the arbitrary matches to be used within requirement specifiers without special constraints.
This makes it clearer in the docstring grammars that a name without any specifier is valid.
This makes the control flow slightly easier to understand.
This now exercises more edge cases and validates that the error messages are well-formed.
Force-pushed from 5cbd07f to 3a7cdb6
"How do you relax after dealing with a tricky Python packaging thing at work?" This PR is now ready for review, with rewritten tests for the requirements parser. I like them because they test a lot of test strings (the earlier test suite only checked the parser against ~45 strings in the entire run), many of these strings helped identify logical mistakes in the parser's reworking. Notably, the parser on I reckon the easiest way to review the changes here would be to look at the test suite in its final state; and then to go commit-by-commit to see how things evolve. There's still a giant commit with the main reworking -- couldn't figure out how to break that up in the time I had at hand. :) |
This makes it easier to understand what the state of the parser is and what is expected at that point.
I went through all the commits and nothing else caught my eye. Looks good to me. Thank you for the improvements :)
This ensures that these error tracebacks correctly describe the causality between the two errors.
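In Python terms this is the `raise ... from ...` form; a self-contained sketch, with the exception names below standing in for the real ones:

```python
class ParserSyntaxError(Exception):
    """Stand-in for the low-level parser error (name assumed for illustration)."""

class InvalidRequirement(ValueError):
    """Stand-in for the user-facing error."""

def parse(value: str) -> None:
    try:
        raise ParserSyntaxError(f"Expected end of input in {value!r}")
    except ParserSyntaxError as exc:
        # Chaining with "from" records the low-level error as the *cause*, so
        # the traceback reads "The above exception was the direct cause of the
        # following exception" rather than "During handling of the above ...".
        raise InvalidRequirement(str(exc)) from exc
```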
This ensures that a marker with whitespace around it is parsed correctly.
This is better aligned with the naming from PEP 508.
I've gone through the issue tracker to check which other issues this PR fixes, and... it's quite a few. Given that, I've stretched this PR a little bit to cover #432 as well.
This ensures that these are only parsed when they're independent words.
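The usual way to express this with regexes is a word-boundary anchor; a small sketch (the pattern names are assumed):

```python
import re

# "\b" ensures a keyword matches only as a standalone word, so "in" does
# not match inside an identifier such as "infra".
IN_OP = re.compile(r"\bnot\s+in\b|\bin\b")
BOOL_OP = re.compile(r"\b(and|or)\b")

assert IN_OP.match("in win32")
assert IN_OP.match("infra") is None
assert BOOL_OP.match("android") is None
```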
The listed operators were incorrect.
This is more consistent with the rest of the format, which is largely whitespace-agnostic.
Quick perusal didn't turn anything up, but I'm not a parsing expert. 😅
Thanks @brettcannon and @hrnciar for the reviews!
Apparently, my comment wasn't posted and was hanging in the pending state. I'll file a PR to remove it.
PR: #630
Closes #336
Closes #432
Closes #529
Closes #592
Closes #593
Closes #618
Closes #619
Closes #621
Closes #622
Closes #623
This builds upon #484.
The main change here is to couple the tokenisation with the parsing, with the regex matches being done based on what the parser is expecting. This should make the tokenisation parts much faster. This now explicitly handles whitespace as a token (explicitly "consuming" it during parsing), and parses markers as part of parsing a requirement string (instead of delegating to `Marker`) to provide better syntax error messages. I also took the opportunity to improve the error messages presented by this, to more clearly indicate what went wrong and to provide contextual information about the syntax error to the caller (in an exception).
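A minimal sketch of the idea, assuming a rule table and a cursor-based tokenizer; the names `RULES`, `Tokenizer`, and `try_read` are illustrative, not the library's API:

```python
import re
from typing import Optional

# Only the rules that are legal in the parser's current state are ever
# tried against the input, instead of pre-tokenising the whole string.
RULES = {
    "IDENTIFIER": re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*"),
    "WS": re.compile(r"[ \t]+"),  # whitespace is an explicit token
    "SEMICOLON": re.compile(r";"),
    "LBRACKET": re.compile(r"\["),
    "RBRACKET": re.compile(r"\]"),
}

class Tokenizer:
    """Match only the pattern the parser expects at the current position."""

    def __init__(self, source: str) -> None:
        self.source = source
        self.position = 0

    def try_read(self, name: str) -> Optional[str]:
        match = RULES[name].match(self.source, self.position)
        if match is None:
            return None
        self.position = match.end()
        return match.group(0)

def parse_requirement(tokenizer: Tokenizer) -> None:
    # The parser drives tokenisation and consumes whitespace explicitly.
    name = tokenizer.try_read("IDENTIFIER")
    if name is None:
        raise SyntaxError(f"Expected a package name at position {tokenizer.position}")
    tokenizer.try_read("WS")
    if tokenizer.try_read("SEMICOLON"):
        ...  # the marker, if any, is parsed here rather than handed to Marker
```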
Demo of the error messages:
/cc @hrnciar for inputs and thoughts!