Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scope parsing to Scope object #642

Merged
merged 10 commits into from
Nov 15, 2022
Merged

Conversation

sirosen
Copy link
Member

@sirosen sirosen commented Nov 4, 2022

I'm not sure that it's all perfect, but I wrote a parser and I can't seem to find a valid scope which breaks it, so that seems like a good sign.

I did strongly consider using pyparsing for this. I've had good experiences using pyparsing and it seemed like it might fit this use-case. There were a couple of reasons I decided not to use it here.

  1. Overkill. If it turns out that we can write a correct lexer trivially, and a correct parser easily in pure python without defining it as a recursive descent parser.
  2. Dependency handling. We'd either add a dependency (possibility of conflicts), or add it as an optional dependency (more complicated codepaths).
  3. Globally scoped ParserElement changes. The whitespace sensitivity of pyparsing can be adjusted on a global basis by certain method calls, and I know of at least one other bit of global state (enable_packrat). I didn't like the idea that our parsing behavior could change depending on what a user application might be doing with pyparsing independently.
  4. Whitespace-sensitivity is messy. pyparsing defines all whitespace as non-semantic by default. This is fine when it fits your language, but Globus Auth requires that an optional marker be adjacent to the scope string (*foo, but not * foo). It seemed like this might lead to us using Combine() in ways that don't naturally fit our desired grammar.

The core of this change is that MutableScope has been renamed to Scope and now has a pair of parsing functions as classmethods: Scope.parse and Scope.deserialize. The first parses any valid scope string to request, meaning it may be a space delimited list of scopes, and the second parses any valid scope string with a single top-level scope.

As an inverse of deserialize(), the stringifier has been renamed to serialize(), and __str__() now just calls serialize().

The parsing itself requires several new internal components. A tokenizer handles converting strings into lists of tokens, e.g. converting foo[bar *baz] into
["foo", "[", "bar", "*", "baz", "]"].
The tokens are small objects pairing strings with a type, which makes it simpler for the parser to handle various cases. The parser is minimally stateful, has no recursion or backtracking, operates with limited lookahead, and is generally meant to be the smallest and simplest thing which can do the job that it does.

The parse result, a Scope, can now represent the true tree-like structure of scopes. Scope.dependencies is a list of child nodes in the tree, a list of Scopes. This has implicitations for Scope.add_dependency in that a Scope is now a valid input. As a result of this introduction, the use of
add_dependency("foo", optional=True) is considered deprecated and now emits a DeprecationWarning. Building dependencies in the current version should be done either with (newly parseable) strings or with Scope objects. i.e. The aforementioned usage can be replaced with either add_dependency("*foo") or
add_dependency(Scope("foo", optional=True)).

Some related changes included here:

  • globus_sdk.scopes is now a subpackage (it was getting a bit long)
  • the internal type alias for ScopeCollectionType is now kept in _types for simpler universal access within the SDK
  • many more unit tests are added for the scope parsing and serialization functionality
  • documentation is updated to refer to these as "Scope objects" rather than "MutableScopes"
  • various doc fixes, including missing whitespace, missing scope data in the scopes docs, and a bug in the listknownscopes custom directive

changelog.d/20221104_145601_sirosen_add_scope_parsing.rst Outdated Show resolved Hide resolved
src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved
src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved
The core of this change is that `MutableScope` has been renamed to
`Scope` and now has a pair of parsing functions as classmethods:
`Scope.parse` and `Scope.deserialize`. The first parses any valid
scope string to request, meaning it may be a space delimited list of
scopes, and the second parses any valid scope string with a single
top-level scope.

As an inverse of `deserialize()`, the stringifier has been renamed to
`serialize()`, and `__str__()` now just calls `serialize()`.

The parsing itself requires several new internal components. A
tokenizer handles converting strings into lists of tokens, e.g.
converting `foo[bar *baz]` into
`["foo", "[", "bar", "*", "baz", "]"]`.
The tokens are small objects pairing strings with a type, which makes
it simpler for the parser to handle various cases. The parser is
minimally stateful, has no recursion or backtracking, operates with
limited lookahead, and is generally meant to be the smallest and
simplest thing which can do the job that it does.

The parse result, a `Scope`, can now represent the true tree-like
structure of scopes. `Scope.dependencies` is a list of child nodes in
the tree, a list of  `Scope`s. This has implicitations for
`Scope.add_dependency` in that a `Scope` is now a valid input. As a
result of this introduction, the use of
`add_dependency("foo", optional=True)` is considered deprecated and
now emits a DeprecationWarning. Building dependencies in the current
version should be done either with (newly parseable) strings or with
`Scope` objects. i.e. The aforementioned usage can be replaced with
either `add_dependency("*foo")` or
`add_dependency(Scope("foo", optional=True))`.

Some related changes included here:
- `globus_sdk.scopes` is now a subpackage (it was getting a bit long)
- the internal type alias for ScopeCollectionType is now kept in
  `_types` for simpler universal access within the SDK
- many more unit tests are added for the scope parsing and
  serialization functionality
- documentation is updated to refer to these as "Scope objects" rather
  than "MutableScopes"
- various doc fixes, including missing scope data in the scopes docs
  and a bug in the listknownscopes custom directive
This avoids having to consider parsing or otherwise handling
`Scope("*foo")`. Although this would ideally be an alias for
`Scope.deserialize("*foo")`, the interface gets messy if we allow
this.
`Scope(...) in Scope(...)` is meant to indicate "permissions
containment", meaning that the permissions requested by the first
scope are covered by the permissions requested by the second scope.

That notion of containment leads to the following defined behaviors:
- the top-level scope strings must match
- either both have matching optional=t/false OR only the
  "contained" scope is optional
- each dependency needs to be contained in the "containing" scope's
  dependencies
- it doesn't make sense to ask if a non-Scope value is in a Scope

This definition feeds directly into an implementation and a large
suite of tests to check correctness.
This allows better scope constructor behavior from
`ScopeBuilder.make_mutable`, as an optional scope can be constructed
in a single step.
- Ensure `__contains__` is included in docs
- Rebrand the `__contains__` check as "permission coverage" throughout
  relevant doc
- Update the structure of the check to use simpler control-flow
  (a "reminder" comment should help make the for-else readable to
  those not familiar with the construct)
Several cases of scopes rejected by Globus Auth but previously
accepted by this parser have been added. In particular:
`foo[bar]baz` is not allowed, nor is `foo [bar]`. Both of these
strings were previously allowed by the parser (but would re-serialize
correctly for Globus Auth consumption).

In order to support this, the lexer now also implements a "peek" value
for lookahead and checks it against various possible values. This
enables simpler checks than would otherwise be possible when iterating
over the input string.
src/globus_sdk/scopes/builder.py Show resolved Hide resolved
src/globus_sdk/scopes/scope_definition.py Show resolved Hide resolved
# space before brackets
"foo [bar]",
# missing space before next scope string after ']'
"foo[bar]baz",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the following possibilities (which might not all be invalid):

  • "foo * " (space after '*')
  • "foo[ bar]" (space after '[')
  • " foo" (leading space)
  • "foo]" (unmatched closing bracket 1)
  • "foo bar]" (unmatched closing bracket 2)
  • "foo[bar] baz]" (unmatched closing bracket 3)

The reason there are multiple unmatched closing brackets examples is because each might exercise different types of "reset" failures.

Note that if plus-separated string parsing is added after reviewing a comment above, these tests will need to also be updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked, and there are two of these which are valid: " foo" (equivalent to "foo") and "foo[ bar]". The others are all definitely invalid, and I'm more than happy to add them as test cases; thanks for enumerating them!
I'll also add those two valid cases to our tests, to document/enforce their validity.

At this point, I'm setup with a copy of Auth's scope parser to play with, so I can check these against the service to ensure we agree on things to accept or reject. So if you have other questionable cases that you think up, we can evaluate them and add as positive or negative tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized there's one more: *foo. I don't think this variant is accounted for yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed 6a97871, which adds a few test cases for "can parse" as opposed to these for "can't parse".
It includes *foo and gives us a place for any future interesting cases which are valid.

@sirosen sirosen marked this pull request as ready for review November 11, 2022 19:08
We retain all of the tests, but the method is renamed to `_contains`,
removed from public docs, and commented as a "non-authoritative
convenience function". In particular, after discussion we do not
believe that it is valuable to try to build fully accurate
client-side scope comparison when this cannot be authoratative with
respect to a user's current consents in Auth.

For now, the method will exist and it will be available for our own
use-cases (e.g. in globus-cli), but not for general use.

Also, improve documentation of `add_dependency` to avoid (confusing)
chaining usage.
@sirosen
Copy link
Member Author

sirosen commented Nov 11, 2022

This work is something I'm considering ready to merge in its current state.

We discussed the differences between the parse tree and the true processing and representation of consents in Globus Auth. The end result of that discussion was that we agreed that parsing is still a very valuable utility to provide, but that one of the pieces here is something which we aren't ready to expose -- __contains__. For the immediate, short term, we are keeping this functionality, but under the internal name _contains. And as an extra measure, the docstring for that internal method declares it "non-authoritative", since it could always be misleading to look at a saved scope string rather than at consents in Auth.

scope1 in scope2 has been removed from public docs, including the changelog.

Also, per a discussion around the add_dependency interface, we are not changing its chaining behavior, but that usage no longer appears in the docs. Removing the chaining behavior is slated for SDK v4.0, to ensure it won't be forgotten when we are next liberated to change interfaces.

These tests do not check anything *about* the parsed scopes. Only that
the strings which are being fed to `parse()` are valid.
Copy link
Contributor

@derek-globus derek-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good

The only additional piece I'd ask for before merging would be to include data in the unit tests that performs deserializing/serializing on real (or real-formatted) url & urn scope strings.

I doubt it would cause issues by reading the parser myself, but I wouldn't want us to unintentionally break in a way that we could successfully parse the scope string "foo" but not "urn:globus:auth:scope:auth.globus.org:foo" or "https://auth.globus.org/scopes/auth.globus.org/foo"

src/globus_sdk/scopes/builder.py Show resolved Hide resolved
src/globus_sdk/scopes/scope_definition.py Outdated Show resolved Hide resolved
@sirosen sirosen merged commit 013978c into globus:main Nov 15, 2022
@sirosen sirosen deleted the add-scope-parsing branch November 15, 2022 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants