Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor error parsing to better support JSON:API #725

Merged
merged 10 commits into from
May 31, 2023

Conversation

sirosen
Copy link
Member

@sirosen sirosen commented May 2, 2023

Core error parsing has been rewritten to be a better match for the multiple-errors paradigm of JSON:API and similar APIs.

Parsing is now broken down internally into the following phases:

  • _parse_response: the main parsing method, which aborts or calls the following methods, listed in order
  • _parse_errors_array: extract one or more sub-errors from the document
  • _parse_code: extract a singular "code" field from the document
  • _parse_messages: extract message strings from the document

GlobusAPIError and its subclasses now feature an errors attribute, which is a list of sub-errors, and a messages attribute, which is a list of message strings. message is a property which is defined as "; ".join(messages), and has a deprecated setter.

message will be None when there is no parsed message, as opposed to the previous behavior which falls back to the text of the response.

code parsing can consequently be improved to have clearer semantics with respect to multiple sub-errors with different codes -- in this case, it will no longer assume that the first code from the sub-errors is accurate. This means that we won't show misleading results if an errors array comes back like

{
  "errors": [
    {"code": "foo"},
    {"code": "bar"}
  ]
}

Previously, this would parse the code as foo, but now it will leave it with the default of Error.


I originally wanted to "only" add support for the Flows data format, but when I started changing things, I took a really hard look at what we had for parsing and... 🤷

I'm missing a test-case for the other existing Flows data format, which I'll pull together now, but this is already ready for 90%+ of the meaningful review.

Note that this changes some things which users would see in error repr()s for malformed error data. Testsuite changes show, in several places, a change from error text to the HTTP Reason -- this is a consequence of the change to make message null when it can't parse.


📚 Documentation preview 📚: https://globus-sdk-python--725.org.readthedocs.build/en/725/

Copy link
Contributor

@ada-globus ada-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(n.b, we discussed changing the errors array handling, and having subclasses deal with the errror scenarios.)

src/globus_sdk/services/search/errors.py Outdated Show resolved Hide resolved
src/globus_sdk/services/transfer/errors.py Outdated Show resolved Hide resolved
@sirosen
Copy link
Member Author

sirosen commented May 8, 2023

I've updated the uses of raw_json to _dict_data. However, after making the change to try putting the main error into the errors array, I almost immediately found the results awkward and confusing. Ultimately, even though that change sounded interesting while we discussed it, applying it does not have the clarity that we would want, so I decided against it.

@sirosen sirosen marked this pull request as draft May 8, 2023 21:03
@sirosen
Copy link
Member Author

sirosen commented May 11, 2023

To leave a record of why this is marked as a draft:

We're discussing making the parsing behaviors more standardized within our team, so that everyone is in full agreement on what the expected parse of a given error is.

Effectively, if we introduce this changeset without that discussion, we introduce a de-facto standard which is not JSON:API but something closely related (e.g. handling for singular "error" as a supported extension, but not a default).

@sirosen sirosen marked this pull request as ready for review May 23, 2023 19:10
sirosen and others added 5 commits May 23, 2023 14:12
Core error parsing has been rewritten to be a better match for the
multiple-errors paradigm of JSON:API and similar APIs.

Parsing is now broken down internally into the following phases:
- _parse_response: the main parsing method, which aborts or calls the
  following methods, listed in order
- _parse_errors_array: extract one or more sub-errors from the
  document
- _parse_code: extract a singular "code" field from the document
- _parse_messages: extract message strings from the document

GlobusAPIError and its subclasses now feature an `errors` attribute,
which is a list of sub-errors, and a `messages` attribute, which is a
list of message strings. `message` is a property which is defined as
`"; ".join(messages)`, and has a deprecated setter.

`message` will be `None` when there is no parsed message, as
opposed to the previous behavior which falls back to the text of the
response.

`code` parsing can consequently be improved to have clearer semantics
with respect to multiple sub-errors with different codes -- in this
case, it will no longer assume that the first code from the sub-errors
is accurate. This means that we won't show misleading results if an
errors array comes back like
    {
      "errors": [
        {"code": "foo"},
        {"code": "bar"}
      ]
    }

Previously, this would parse the code as `foo`, but now it will leave
it with the default of `Error`.
A Flows error can have a `details` array or a `details` string. Only
the array case was tested -- expand this to include the string case.
pylint flagged an unused string (which was being used as a multiline
comment) as such. To fix, we could just convert to a normal comment or
disable the pylint rule, but the better approach is to get the data
being documented there into test fixtures.

Move example Timer API responses into test fixture data, and test
against those data. These functional tests effectively validate the
same behaviors which are captured by error parsing unit tests from
another angle.
Rather than using the data with a type cast, use the "type asserting
property" which does the same basic thing.
Co-authored-by: Kurt McKee <contactme@kurtmckee.org>
@sirosen sirosen force-pushed the fix-error-shapes-with-messages branch from 8511636 to 573bdeb Compare May 23, 2023 19:12
Detect and parse JSON:API errors based on the mimetype as specified in
the draft Globus Error Specification.

This fully separates the three parsing paths for JSON:API, Type Zero,
and Undefined.

Flows and Timers errors now customize their Undefined parsing methods.

In accordance with the draft spec, request_id is now an attribute of
all parsed error objects.

The mimetype detection revealed a bug in `globus_sdk._testing` in
which the Content-Type header was being set twice for JSON data. As a
result, mimetype parsing did not behave as desired and expected.
@sirosen sirosen force-pushed the fix-error-shapes-with-messages branch from 573bdeb to c2df6c2 Compare May 23, 2023 19:17
@sirosen
Copy link
Member Author

sirosen commented May 23, 2023

I've made some final changes here to attempt to accord best with the draft error specification which we have in progress.
The main change is to do much more explicit detection of different error formats and fully split the three distinct parsing paths between JSON:API, "Type Zero", and Undefined formats.

There's a separate method for each parsing path, so subclasses (see TimerAPIError and FlowsAPIError) can override only the undefined path and have a stronger notion of which part is being customized.

Independently, there's a final parsing hook which can be used to compute attributes after successful error parsing finishes with any of those three paths. This suits the GCS and Search errors well.

I would very much like for us to consider this as a target for our next release, as it fixes a number of problematic behaviors.

There are some subtle cases here which are introduced but are, I think, no better or worse than what came before. For example, the JSON:API parse path will now set code = None on the error. Since this isn't reachable today without explicitly passing the Accept header to the SDK, I think this minor change is acceptable.

Code-wise, the structure changes significantly, but the observable changes with real error payloads are pretty minor.
There are some test changes -- e.g. changing a request_id from 123 to "123" -- which show some of the behavioral strictness at play now, but which aren't really issues with current API response formats.

@sirosen
Copy link
Member Author

sirosen commented May 24, 2023

I'm making another change here momentarily, to populate the errors array with the top-level error in the case of a type_zero parse.
My rationale comes from some discussions about how GlobusAPIError uniformizes the different error formats, and specifically this thought which I shared in a 1-on-1 conversation:

Any adaptations should be done as necessary to make Type Zero errors appear like JSON:API errors, not vice versa. Since the errors array is a new attribute for globus-sdk, I think we should populate it with the top-level error document. The user expectation that errors is a non-empty array of subdocuments can thereby be maintained.

For both the Type Zero and Undefined parse paths, 'errors' is now
defined as the root document if there is no 'errors' key or if the
contents of the 'errors' key are not 100% valid.

This tightens up the logic in the undefined path and adds new
capabilities to the Type Zero path.
@kurtmckee
Copy link
Member

Ping me if this is ready for re-review.

@kurtmckee kurtmckee self-requested a review May 24, 2023 16:08
@sirosen
Copy link
Member Author

sirosen commented May 24, 2023

There's another detail which surfaced in discussion with @derek-globus. I'm going to be trying to convert GlobusAPIError.errors to be an array of objects (globus_sdk.ErrorSubdocument, a new class), so that we have a slightly nicer object model that we produce.

The adjustment is actually pretty small and should be done within the hour. I'll send some messages around when it's ready for (:crossed_fingers:) the final round of review and refinement.

GlobusAPIError.errors is now a list of ErrorSubdocument objects. These
are documented with an ultra-simple usage example as part of the core
exception documentation.

ErrorSubdocuments are defined to match well to the JSON:API notion of
an error object, but reduced to the fields which are appropriate to
our usage: code and message.

Because the `errors` objects now have some capabilities, some of the
parsing around (e.g., `messages` extraction) can be simplified.
`ErrorSubdocument` objects contain the list of fields they use for
`message` lookup, and therefore need to be customized to use
`GlobusAPIGlobusAPIError.MESSAGE_FIELDS` in a few locations.

Test changes are minimal in order to align types -- no test
behaviors are adjusted here other than those to fix the type of
`errors`.
Copy link
Contributor

@derek-globus derek-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good & happy with the direction this takes us.

Big fan of the inheritance structures in place to allow for custom overriding.

A few comments here & there about stylistic choices, comments, etc. but generally 👍

docs/core/exceptions.rst Outdated Show resolved Hide resolved
src/globus_sdk/exc/api.py Show resolved Hide resolved
src/globus_sdk/exc/api.py Outdated Show resolved Hide resolved
src/globus_sdk/services/gcs/errors.py Outdated Show resolved Hide resolved
src/globus_sdk/services/flows/errors.py Show resolved Hide resolved
sirosen and others added 2 commits May 26, 2023 11:01
Primarily, this change renames the parsing hook to match its purpose a
bit better. Also adjust some docstrings per review.

Co-authored-by: Derek Schlabach <derek@globus.org>
The case of empty `errors` is niche and only reachable by a few paths.
We therefore agreed that it doesn't need to be called out explicitly.
Copy link
Contributor

@aaschaer aaschaer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Type Zero" seems dramatic for an error classification, but its not really exposed, and its kinda fun to say, so no real complaint there

I like the approach of classification and then parsing and couldn't find anything obviously wrong with one readthrough

src/globus_sdk/exc/api.py Show resolved Hide resolved
@sirosen
Copy link
Member Author

sirosen commented May 26, 2023

"Type Zero" seems dramatic for an error classification, but its not really exposed, and its kinda fun to say, so no real complaint there

To this note, the name Type Zero isn't fully decided on yet. That's something I introduced because

  1. As you say, it's fun
  2. 0 is the first natural number
  3. I'm not sure if "Legacy" is appropriate as a term when the future is still being charted out

But we should be careful not to make "Type Zero" part of the docs or public interface, so that the name can change in the future. It's an implementation detail -- not a secret one, but not a part of the public SDK surface area.

@ada-globus ada-globus self-requested a review May 30, 2023 14:59
@sirosen sirosen merged commit fa47e7c into globus:main May 31, 2023
@sirosen sirosen deleted the fix-error-shapes-with-messages branch May 31, 2023 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants