-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scope parsing to Scope object #642
Conversation
866c0bb
to
b0655dc
Compare
The core of this change is that `MutableScope` has been renamed to `Scope` and now has a pair of parsing functions as classmethods: `Scope.parse` and `Scope.deserialize`. The first parses any valid scope string to request, meaning it may be a space delimited list of scopes, and the second parses any valid scope string with a single top-level scope. As an inverse of `deserialize()`, the stringifier has been renamed to `serialize()`, and `__str__()` now just calls `serialize()`. The parsing itself requires several new internal components. A tokenizer handles converting strings into lists of tokens, e.g. converting `foo[bar *baz]` into `["foo", "[", "bar", "*", "baz", "]"]`. The tokens are small objects pairing strings with a type, which makes it simpler for the parser to handle various cases. The parser is minimally stateful, has no recursion or backtracking, operates with limited lookahead, and is generally meant to be the smallest and simplest thing which can do the job that it does. The parse result, a `Scope`, can now represent the true tree-like structure of scopes. `Scope.dependencies` is a list of child nodes in the tree, a list of `Scope`s. This has implicitations for `Scope.add_dependency` in that a `Scope` is now a valid input. As a result of this introduction, the use of `add_dependency("foo", optional=True)` is considered deprecated and now emits a DeprecationWarning. Building dependencies in the current version should be done either with (newly parseable) strings or with `Scope` objects. i.e. The aforementioned usage can be replaced with either `add_dependency("*foo")` or `add_dependency(Scope("foo", optional=True))`. Some related changes included here: - `globus_sdk.scopes` is now a subpackage (it was getting a bit long) - the internal type alias for ScopeCollectionType is now kept in `_types` for simpler universal access within the SDK - many more unit tests are added for the scope parsing and serialization functionality - documentation is updated to refer to these as "Scope objects" rather than "MutableScopes" - various doc fixes, including missing scope data in the scopes docs and a bug in the listknownscopes custom directive
This avoids having to consider parsing or otherwise handling `Scope("*foo")`. Although this would ideally be an alias for `Scope.deserialize("*foo")`, the interface gets messy if we allow this.
`Scope(...) in Scope(...)` is meant to indicate "permissions containment", meaning that the permissions requested by the first scope are covered by the permissions requested by the second scope. That notion of containment leads to the following defined behaviors: - the top-level scope strings must match - either both have matching optional=t/false OR only the "contained" scope is optional - each dependency needs to be contained in the "containing" scope's dependencies - it doesn't make sense to ask if a non-Scope value is in a Scope This definition feeds directly into an implementation and a large suite of tests to check correctness.
fcee094
to
26d94ba
Compare
This allows better scope constructor behavior from `ScopeBuilder.make_mutable`, as an optional scope can be constructed in a single step.
- Ensure `__contains__` is included in docs - Rebrand the `__contains__` check as "permission coverage" throughout relevant doc - Update the structure of the check to use simpler control-flow (a "reminder" comment should help make the for-else readable to those not familiar with the construct)
Several cases of scopes rejected by Globus Auth but previously accepted by this parser have been added. In particular: `foo[bar]baz` is not allowed, nor is `foo [bar]`. Both of these strings were previously allowed by the parser (but would re-serialize correctly for Globus Auth consumption). In order to support this, the lexer now also implements a "peek" value for lookahead and checks it against various possible values. This enables simpler checks than would otherwise be possible when iterating over the input string.
# space before brackets | ||
"foo [bar]", | ||
# missing space before next scope string after ']' | ||
"foo[bar]baz", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the following possibilities (which might not all be invalid):
"foo * "
(space after '*')"foo[ bar]"
(space after '[')" foo"
(leading space)"foo]"
(unmatched closing bracket 1)"foo bar]"
(unmatched closing bracket 2)"foo[bar] baz]"
(unmatched closing bracket 3)
The reason there are multiple unmatched closing brackets examples is because each might exercise different types of "reset" failures.
Note that if plus-separated string parsing is added after reviewing a comment above, these tests will need to also be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked, and there are two of these which are valid: " foo"
(equivalent to "foo"
) and "foo[ bar]"
. The others are all definitely invalid, and I'm more than happy to add them as test cases; thanks for enumerating them!
I'll also add those two valid cases to our tests, to document/enforce their validity.
At this point, I'm setup with a copy of Auth's scope parser to play with, so I can check these against the service to ensure we agree on things to accept or reject. So if you have other questionable cases that you think up, we can evaluate them and add as positive or negative tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized there's one more: *foo
. I don't think this variant is accounted for yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed 6a97871, which adds a few test cases for "can parse" as opposed to these for "can't parse".
It includes *foo
and gives us a place for any future interesting cases which are valid.
3d969a3
to
45e36af
Compare
We retain all of the tests, but the method is renamed to `_contains`, removed from public docs, and commented as a "non-authoritative convenience function". In particular, after discussion we do not believe that it is valuable to try to build fully accurate client-side scope comparison when this cannot be authoratative with respect to a user's current consents in Auth. For now, the method will exist and it will be available for our own use-cases (e.g. in globus-cli), but not for general use. Also, improve documentation of `add_dependency` to avoid (confusing) chaining usage.
45e36af
to
6e493ee
Compare
This work is something I'm considering ready to merge in its current state. We discussed the differences between the parse tree and the true processing and representation of consents in Globus Auth. The end result of that discussion was that we agreed that parsing is still a very valuable utility to provide, but that one of the pieces here is something which we aren't ready to expose --
Also, per a discussion around the |
These tests do not check anything *about* the parsed scopes. Only that the strings which are being fed to `parse()` are valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good
The only additional piece I'd ask for before merging would be to include data in the unit tests that performs deserializing/serializing on real (or real-formatted) url & urn scope strings.
I doubt it would cause issues by reading the parser myself, but I wouldn't want us to unintentionally break in a way that we could successfully parse the scope string "foo"
but not "urn:globus:auth:scope:auth.globus.org:foo"
or "https://auth.globus.org/scopes/auth.globus.org/foo"
I'm not sure that it's all perfect, but I wrote a parser and I can't seem to find a valid scope which breaks it, so that seems like a good sign.
I did strongly consider using
pyparsing
for this. I've had good experiences usingpyparsing
and it seemed like it might fit this use-case. There were a couple of reasons I decided not to use it here.pyparsing
can be adjusted on a global basis by certain method calls, and I know of at least one other bit of global state (enable_packrat
). I didn't like the idea that our parsing behavior could change depending on what a user application might be doing with pyparsing independently.pyparsing
defines all whitespace as non-semantic by default. This is fine when it fits your language, but Globus Auth requires that an optional marker be adjacent to the scope string (*foo
, but not* foo
). It seemed like this might lead to us usingCombine()
in ways that don't naturally fit our desired grammar.The core of this change is that
MutableScope
has been renamed toScope
and now has a pair of parsing functions as classmethods:Scope.parse
andScope.deserialize
. The first parses any valid scope string to request, meaning it may be a space delimited list of scopes, and the second parses any valid scope string with a single top-level scope.As an inverse of
deserialize()
, the stringifier has been renamed toserialize()
, and__str__()
now just callsserialize()
.The parsing itself requires several new internal components. A tokenizer handles converting strings into lists of tokens, e.g. converting
foo[bar *baz]
into["foo", "[", "bar", "*", "baz", "]"]
.The tokens are small objects pairing strings with a type, which makes it simpler for the parser to handle various cases. The parser is minimally stateful, has no recursion or backtracking, operates with limited lookahead, and is generally meant to be the smallest and simplest thing which can do the job that it does.
The parse result, a
Scope
, can now represent the true tree-like structure of scopes.Scope.dependencies
is a list of child nodes in the tree, a list ofScope
s. This has implicitations forScope.add_dependency
in that aScope
is now a valid input. As a result of this introduction, the use ofadd_dependency("foo", optional=True)
is considered deprecated and now emits a DeprecationWarning. Building dependencies in the current version should be done either with (newly parseable) strings or withScope
objects. i.e. The aforementioned usage can be replaced with eitheradd_dependency("*foo")
oradd_dependency(Scope("foo", optional=True))
.Some related changes included here:
globus_sdk.scopes
is now a subpackage (it was getting a bit long)_types
for simpler universal access within the SDK