Add scope parsing to Scope object #642

sirosen · 2022-11-04T05:46:14Z

I'm not sure that it's all perfect, but I wrote a parser and I can't seem to find a valid scope which breaks it, so that seems like a good sign.

I did strongly consider using pyparsing for this. I've had good experiences using pyparsing and it seemed like it might fit this use-case. There were a couple of reasons I decided not to use it here.

Overkill. If it turns out that we can write a correct lexer trivially, and a correct parser easily in pure python without defining it as a recursive descent parser.
Dependency handling. We'd either add a dependency (possibility of conflicts), or add it as an optional dependency (more complicated codepaths).
Globally scoped ParserElement changes. The whitespace sensitivity of pyparsing can be adjusted on a global basis by certain method calls, and I know of at least one other bit of global state (enable_packrat). I didn't like the idea that our parsing behavior could change depending on what a user application might be doing with pyparsing independently.
Whitespace-sensitivity is messy. pyparsing defines all whitespace as non-semantic by default. This is fine when it fits your language, but Globus Auth requires that an optional marker be adjacent to the scope string (*foo, but not * foo). It seemed like this might lead to us using Combine() in ways that don't naturally fit our desired grammar.

The core of this change is that MutableScope has been renamed to Scope and now has a pair of parsing functions as classmethods: Scope.parse and Scope.deserialize. The first parses any valid scope string to request, meaning it may be a space delimited list of scopes, and the second parses any valid scope string with a single top-level scope.

As an inverse of deserialize(), the stringifier has been renamed to serialize(), and __str__() now just calls serialize().

The parsing itself requires several new internal components. A tokenizer handles converting strings into lists of tokens, e.g. converting foo[bar *baz] into
["foo", "[", "bar", "*", "baz", "]"].
The tokens are small objects pairing strings with a type, which makes it simpler for the parser to handle various cases. The parser is minimally stateful, has no recursion or backtracking, operates with limited lookahead, and is generally meant to be the smallest and simplest thing which can do the job that it does.

The parse result, a Scope, can now represent the true tree-like structure of scopes. Scope.dependencies is a list of child nodes in the tree, a list of Scopes. This has implicitations for Scope.add_dependency in that a Scope is now a valid input. As a result of this introduction, the use of
add_dependency("foo", optional=True) is considered deprecated and now emits a DeprecationWarning. Building dependencies in the current version should be done either with (newly parseable) strings or with Scope objects. i.e. The aforementioned usage can be replaced with either add_dependency("*foo") or
add_dependency(Scope("foo", optional=True)).

Some related changes included here:

globus_sdk.scopes is now a subpackage (it was getting a bit long)
the internal type alias for ScopeCollectionType is now kept in _types for simpler universal access within the SDK
many more unit tests are added for the scope parsing and serialization functionality
documentation is updated to refer to these as "Scope objects" rather than "MutableScopes"
various doc fixes, including missing whitespace, missing scope data in the scopes docs, and a bug in the listknownscopes custom directive

changelog.d/20221104_145601_sirosen_add_scope_parsing.rst

src/globus_sdk/scopes/scope_definition.py

The core of this change is that `MutableScope` has been renamed to `Scope` and now has a pair of parsing functions as classmethods: `Scope.parse` and `Scope.deserialize`. The first parses any valid scope string to request, meaning it may be a space delimited list of scopes, and the second parses any valid scope string with a single top-level scope. As an inverse of `deserialize()`, the stringifier has been renamed to `serialize()`, and `__str__()` now just calls `serialize()`. The parsing itself requires several new internal components. A tokenizer handles converting strings into lists of tokens, e.g. converting `foo[bar *baz]` into `["foo", "[", "bar", "*", "baz", "]"]`. The tokens are small objects pairing strings with a type, which makes it simpler for the parser to handle various cases. The parser is minimally stateful, has no recursion or backtracking, operates with limited lookahead, and is generally meant to be the smallest and simplest thing which can do the job that it does. The parse result, a `Scope`, can now represent the true tree-like structure of scopes. `Scope.dependencies` is a list of child nodes in the tree, a list of `Scope`s. This has implicitations for `Scope.add_dependency` in that a `Scope` is now a valid input. As a result of this introduction, the use of `add_dependency("foo", optional=True)` is considered deprecated and now emits a DeprecationWarning. Building dependencies in the current version should be done either with (newly parseable) strings or with `Scope` objects. i.e. The aforementioned usage can be replaced with either `add_dependency("*foo")` or `add_dependency(Scope("foo", optional=True))`. Some related changes included here: - `globus_sdk.scopes` is now a subpackage (it was getting a bit long) - the internal type alias for ScopeCollectionType is now kept in `_types` for simpler universal access within the SDK - many more unit tests are added for the scope parsing and serialization functionality - documentation is updated to refer to these as "Scope objects" rather than "MutableScopes" - various doc fixes, including missing scope data in the scopes docs and a bug in the listknownscopes custom directive

This avoids having to consider parsing or otherwise handling `Scope("*foo")`. Although this would ideally be an alias for `Scope.deserialize("*foo")`, the interface gets messy if we allow this.

`Scope(...) in Scope(...)` is meant to indicate "permissions containment", meaning that the permissions requested by the first scope are covered by the permissions requested by the second scope. That notion of containment leads to the following defined behaviors: - the top-level scope strings must match - either both have matching optional=t/false OR only the "contained" scope is optional - each dependency needs to be contained in the "containing" scope's dependencies - it doesn't make sense to ask if a non-Scope value is in a Scope This definition feeds directly into an implementation and a large suite of tests to check correctness.

This allows better scope constructor behavior from `ScopeBuilder.make_mutable`, as an optional scope can be constructed in a single step.

- Ensure `__contains__` is included in docs - Rebrand the `__contains__` check as "permission coverage" throughout relevant doc - Update the structure of the check to use simpler control-flow (a "reminder" comment should help make the for-else readable to those not familiar with the construct)

Several cases of scopes rejected by Globus Auth but previously accepted by this parser have been added. In particular: `foo[bar]baz` is not allowed, nor is `foo [bar]`. Both of these strings were previously allowed by the parser (but would re-serialize correctly for Globus Auth consumption). In order to support this, the lexer now also implements a "peek" value for lookahead and checks it against various possible values. This enables simpler checks than would otherwise be possible when iterating over the input string.

src/globus_sdk/scopes/builder.py

src/globus_sdk/scopes/scope_definition.py

kurtmckee · 2022-11-08T15:42:18Z

tests/unit/test_scopes.py

+        # space before brackets
+        "foo [bar]",
+        # missing space before next scope string after ']'
+        "foo[bar]baz",


I don't see the following possibilities (which might not all be invalid):

"foo * " (space after '*')

"foo[ bar]" (space after '[')

" foo" (leading space)

"foo]" (unmatched closing bracket 1)

"foo bar]" (unmatched closing bracket 2)

"foo[bar] baz]" (unmatched closing bracket 3)

The reason there are multiple unmatched closing brackets examples is because each might exercise different types of "reset" failures.

Note that if plus-separated string parsing is added after reviewing a comment above, these tests will need to also be updated.

I just checked, and there are two of these which are valid: " foo" (equivalent to "foo") and "foo[ bar]". The others are all definitely invalid, and I'm more than happy to add them as test cases; thanks for enumerating them!
I'll also add those two valid cases to our tests, to document/enforce their validity.

At this point, I'm setup with a copy of Auth's scope parser to play with, so I can check these against the service to ensure we agree on things to accept or reject. So if you have other questionable cases that you think up, we can evaluate them and add as positive or negative tests.

I just realized there's one more: *foo. I don't think this variant is accounted for yet.

I just pushed 6a97871, which adds a few test cases for "can parse" as opposed to these for "can't parse".
It includes *foo and gives us a place for any future interesting cases which are valid.

We retain all of the tests, but the method is renamed to `_contains`, removed from public docs, and commented as a "non-authoritative convenience function". In particular, after discussion we do not believe that it is valuable to try to build fully accurate client-side scope comparison when this cannot be authoratative with respect to a user's current consents in Auth. For now, the method will exist and it will be available for our own use-cases (e.g. in globus-cli), but not for general use. Also, improve documentation of `add_dependency` to avoid (confusing) chaining usage.

sirosen · 2022-11-11T19:14:24Z

This work is something I'm considering ready to merge in its current state.

We discussed the differences between the parse tree and the true processing and representation of consents in Globus Auth. The end result of that discussion was that we agreed that parsing is still a very valuable utility to provide, but that one of the pieces here is something which we aren't ready to expose -- __contains__. For the immediate, short term, we are keeping this functionality, but under the internal name _contains. And as an extra measure, the docstring for that internal method declares it "non-authoritative", since it could always be misleading to look at a saved scope string rather than at consents in Auth.

scope1 in scope2 has been removed from public docs, including the changelog.

Also, per a discussion around the add_dependency interface, we are not changing its chaining behavior, but that usage no longer appears in the docs. Removing the chaining behavior is slated for SDK v4.0, to ensure it won't be forgotten when we are next liberated to change interfaces.

These tests do not check anything *about* the parsed scopes. Only that the strings which are being fed to `parse()` are valid.

derek-globus

Overall looks good

The only additional piece I'd ask for before merging would be to include data in the unit tests that performs deserializing/serializing on real (or real-formatted) url & urn scope strings.

I doubt it would cause issues by reading the parser myself, but I wouldn't want us to unintentionally break in a way that we could successfully parse the scope string "foo" but not "urn:globus:auth:scope:auth.globus.org:foo" or "https://auth.globus.org/scopes/auth.globus.org/foo"

src/globus_sdk/scopes/builder.py

src/globus_sdk/scopes/scope_definition.py

sirosen requested review from kurtmckee, aaschaer, ada-globus and derek-globus November 4, 2022 05:46

sirosen mentioned this pull request Nov 4, 2022

Fix doc warnings which error locally #643

Merged

sirosen force-pushed the add-scope-parsing branch from 866c0bb to b0655dc Compare November 4, 2022 19:16

kurtmckee reviewed Nov 4, 2022

View reviewed changes

changelog.d/20221104_145601_sirosen_add_scope_parsing.rst Outdated Show resolved Hide resolved

src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved

src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved

sirosen added 3 commits November 7, 2022 22:22

Forbid special characters in Scope.__init__

bc1fc5c

This avoids having to consider parsing or otherwise handling `Scope("*foo")`. Although this would ideally be an alias for `Scope.deserialize("*foo")`, the interface gets messy if we allow this.

sirosen force-pushed the add-scope-parsing branch from fcee094 to 26d94ba Compare November 7, 2022 22:25

sirosen added 3 commits November 7, 2022 22:28

Allow optional=t/f on make_mutable

d9a4190

This allows better scope constructor behavior from `ScopeBuilder.make_mutable`, as an optional scope can be constructed in a single step.

kurtmckee reviewed Nov 8, 2022

View reviewed changes

Add more test cases for scope parsing

c2dceba

sirosen marked this pull request as ready for review November 11, 2022 19:08

sirosen force-pushed the add-scope-parsing branch from 3d969a3 to 45e36af Compare November 11, 2022 19:09

sirosen force-pushed the add-scope-parsing branch from 45e36af to 6e493ee Compare November 11, 2022 19:11

Add scope parsing tests which only assert validity

6a97871

These tests do not check anything *about* the parsed scopes. Only that the strings which are being fed to `parse()` are valid.

kurtmckee approved these changes Nov 14, 2022

View reviewed changes

derek-globus requested changes Nov 14, 2022

View reviewed changes

src/globus_sdk/scopes/builder.py Show resolved Hide resolved

src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved

Test scope parsing on service formatted scopes

e6f6dd6

derek-globus approved these changes Nov 15, 2022

View reviewed changes

sirosen merged commit 013978c into globus:main Nov 15, 2022

sirosen deleted the add-scope-parsing branch November 15, 2022 17:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add scope parsing to Scope object #642

Add scope parsing to Scope object #642

sirosen commented Nov 4, 2022

kurtmckee Nov 8, 2022

sirosen Nov 8, 2022

kurtmckee Nov 11, 2022

sirosen Nov 14, 2022

sirosen commented Nov 11, 2022

derek-globus left a comment •

edited

Loading

Add scope parsing to Scope object #642

Add scope parsing to Scope object #642

Conversation

sirosen commented Nov 4, 2022

kurtmckee Nov 8, 2022

Choose a reason for hiding this comment

sirosen Nov 8, 2022

Choose a reason for hiding this comment

kurtmckee Nov 11, 2022

Choose a reason for hiding this comment

sirosen Nov 14, 2022

Choose a reason for hiding this comment

sirosen commented Nov 11, 2022

derek-globus left a comment • edited Loading

Choose a reason for hiding this comment

derek-globus left a comment •

edited

Loading