Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update timeout error UX #10274

Merged
merged 1 commit into from
Jul 20, 2020

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented Jul 9, 2020

SUMMARY

Implements new error designs within Superset, starting with timeout errors. This standardizes all timeout errors, frontend and backend, to the UI component. Future errors will be built following this same pattern. Also adds documentation pointers for potential root causes so users can unblock themselves

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen Shot 2020-07-09 at 8 56 29 AM

Screen Shot 2020-07-09 at 8 56 36 AM

Screen Shot 2020-07-09 at 8 56 46 AM

Screen Shot 2020-07-09 at 8 56 53 AM

Screen Shot 2020-07-09 at 8 57 00 AM

Screen Shot 2020-07-09 at 8 57 06 AM

Screen Shot 2020-07-09 at 11 13 38 AM

TEST PLAN

CI + unit tests
Set the timeouts to 1 second and see the new error component across Dashboards, Explore, and SQL Lab. Ensure everything renders correctly when there are no errors.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

to: @graceguo-supercat @ktmud @nytai @rusackas
cc: @Steejay

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature! Added a couple of comments.

{['chart', 'dashboard'].includes(extra.location) && extra.owners && (
<>
<br />
<p>{t('Please reach out to the Chart Owner(s) for assistance.')}</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why is "Chart Owners" in title case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as defined in @Steejay's specs, which would you prefer here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go either way if we are considering Chart Owner a proper noun for Superset and will be consistent everywhere.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 I interpreted Chart Owners as the title of a role so I defined it w title case.

</>
)}
</div>
)}
Copy link
Member

@ktmud ktmud Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of two big code chunks, how about creating two subcomponents SimpleTimeoutErrorMessage and DetailedTimeoutErrorMessage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me look into simplifying this a bit more. I might pull some things out into smaller, more reusable components. It's a bit tricky, because depending on where the error is being rendered, it's supposed to look different (sometimes using a modal, sometimes expanding inline)

export const ERROR_DATASOURCE_TOO_LARGE = {
message: t('Error 1000 - The datasource is too large to query.'),
link: `${ROOT_ERROR_REFERENCE_URL}error-1000`,
};
Copy link
Member

@ktmud ktmud Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be more structural? I'm imagining a errors.json somewhere that could potentially be shared by both the frontend and doc generator:

{
  "DATASOURCE_TOO_LARGE": {
    "code": 1001,
    "title": "The database is too large to query",
    "description": "It's likely your datasource has grown too large to run the current 
    query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data."
  },
  "DATASOURCE_OVERLOAD": {
    "code": 1002,
    ....
  }
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this. Error messages should be defined in API level. So that API user (that use json response without js rendered chart) should get the same message when some query went wrong.
Front-end should handle how to render error by its code, not define what is this error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are a couple things being conflated here I think. First off, an idea of API error codes. These are already consistently defined and described on both the backend and the frontend here:
https://github.com/apache/incubator-superset/blob/master/superset/errors.py#L24

and here:
https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/components/ErrorMessage/types.ts#L21

Part of the problem is that each of these errors from a software perspective could have multiple possible root causes and mitigations from the user's perspective. In this case, a timeout error can have multiple different root causes and mitigations, and these root causes/mitigations could apply to many different software errors too. We could add another field onto the error payload coming from the backend to include root causes and mitigations, but I also want to ensure we have freedom on the frontend to render these errors in whatever rich format we see fit. Thus why the software's idea of errors exists on the backend, but the user interpretations live in the frontend.

Finally, as to making the errors more structural: I agree this can be defined better, I'll restructure the TS object. However, I'm a bit anxious about investing a bunch of time into autogenning docs from it since we just voted to change doc systems a couple days ago. It might be better to wait for the doc migration, then to build an autogenned doc page with error details

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Didn't think it was necessary to build autogen right now either.

superset-frontend/src/components/Icon.tsx Outdated Show resolved Hide resolved
superset/utils/core.py Outdated Show resolved Hide resolved
@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch from 1c4437d to cc26a70 Compare July 9, 2020 19:35
@graceguo-supercat
Copy link

Do you have specifications on what are errors, what are warnings etc?
Do you have UI samples for warning message?

@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch 4 times, most recently from 0c3ebcf to 8aa7e32 Compare July 9, 2020 23:39
super(SupersetTimeoutException, self).__init__(message)
self.error = SupersetError(
error_type=error_type, message=message, level=level, extra=extra
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we can just extend the attributes of SupersetError directly on SupersetException and use a custom JSON encoder to serialize it. This way you remove a layer of abstraction and the need to map the arguments for any new errors your might add in the future,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, possibly. I'm not super familiar with what the plan is for exception handling on the Python side, but having the SupersetExceptions automatically contain the error properties might make sense.

The reason I shied away from it originally was so that it was very clear what attributes the clients need to render the errors and what parts are server specific. But maybe that's not a big issue. Maybe @john-bodley has thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking maybe subclasses of SupersetException itself can handle encoding of error code and messages. Here's an example:

class SupersetException(Exception, SupersetAlert):
    def __init__(...) -> None:
        SupersetAlert.__init__(
            self, code=code, message=message, description=description, extra=extra
        )
        Exception.__init__(self, self.message)

def json_error_response(error: SupersetException) -> FlaskResponse:
   return jsonify({ "error": error.to_json() })

Full gist here

@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch from 8aa7e32 to 00f8ecd Compare July 10, 2020 16:59
@codecov-commenter
Copy link

codecov-commenter commented Jul 10, 2020

Codecov Report

Merging #10274 into master will increase coverage by 4.88%.
The diff coverage is 32.40%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10274      +/-   ##
==========================================
+ Coverage   65.57%   70.46%   +4.88%     
==========================================
  Files         600      602       +2     
  Lines       32101    32177      +76     
  Branches     3244     3263      +19     
==========================================
+ Hits        21051    22672    +1621     
+ Misses      10869     9401    -1468     
+ Partials      181      104      -77     
Flag Coverage Δ
#cypress 55.22% <14.28%> (?)
#javascript 59.29% <20.22%> (-0.29%) ⬇️
#python 69.77% <57.89%> (+0.01%) ⬆️
Impacted Files Coverage Δ
superset-frontend/src/chart/Chart.jsx 59.25% <0.00%> (+48.84%) ⬆️
superset-frontend/src/chart/chartReducer.js 68.85% <ø> (+49.80%) ⬆️
...frontend/src/components/ErrorMessage/ErrorCode.tsx 0.00% <0.00%> (ø)
...onents/ErrorMessage/ErrorMessageWithStackTrace.tsx 26.31% <0.00%> (ø)
...rset-frontend/src/components/ErrorMessage/types.ts 100.00% <ø> (ø)
.../src/dashboard/components/gridComponents/Chart.jsx 86.02% <ø> (+20.43%) ⬆️
...frontend/src/dashboard/reducers/getInitialState.js 68.53% <ø> (+68.53%) ⬆️
...uperset-frontend/src/dashboard/util/propShapes.jsx 92.85% <ø> (ø)
...ntend/src/explore/components/ExploreChartPanel.jsx 72.22% <ø> (+61.11%) ⬆️
superset/models/slice.py 84.65% <ø> (ø)
... and 169 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b316f72...c44b8aa. Read the comment docs.

const message = (
<>
<p>
{t('This may be triggered by:')}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking the logic for this may be triggered by, should come from a standard, general json format generated from backend.
front-end should not decide what error is caused by what reason. Front-end timeout limit is defined in config, it's not something decided by front-end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've introduced an error_codes mapping within extra that gets passed down from the backend. You're still free to alter it on the frontend as needed, but it consolidates potential root causes to the backend with the more developer helpful error_type

@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch 6 times, most recently from 24c85f7 to c44b8aa Compare July 14, 2020 00:23
"message": _("Error 1001 - The database is under an unusual load."),
},
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just realized a bigger question: are there cases that the backend will be able to tell which is the actual reason? If not, why can't they be the same error code?

## Error 1000 - Timeout communicating with datasource.

Your datasource may be too large to query or was under an unusual load.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible to remove such mapping altogether:

class SupersetError(IntEnum):
  BACKEND_TIMEOUT = 1001

class SupersetException(Exception):
    code = None
    
    def __init__(self, message=None, description=None):
        if not self.code:
           raise RuntimeError('Must not raise SupersetException directly')
        error_name = SupersetError(self.code).name
        self.message = message or _(f'error.{error_name}.message')
        self.description = description or _(f'error.{error_name}.message')

class SupersetTimeoutException(SupersetException):
    code = SupersetErrorEnum.BACKEND_TIMEOUT

Then add the detailed message and explanation in i18n.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this again points to the differentiation between the actual error and potential root causes. In this case, Error 1001 might be the root cause for many different errors (a timeout error, a generic DB error, a 503/504 from the db engine, etc.). That was the reason for splitting them into two different structures. However, I agree it's getting into a zone of too many different abstraction layers with both the error_type and the error_codes...

@Steejay, this is something where your thoughts would be helpful: e.g. what relationship do you see between error types and the in product user facing error codes?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the error code is to suggest a path to a solution by highlighting the potential sources. Ideally we'd like to design a message that can pinpoint the source but this can't always be achieved so instead we try to provide as much context as possible to let the technical user troubleshoot. But youre right, the same error source can lead to different errors.

Copy link
Member

@ktmud ktmud Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expanded my thoughts a little and here's a more detailed example: https://gist.github.com/ktmud/59ccbdf6c96570e1ffe41ee8e5ecd479

Let me know if anything was unclear.

Even if an error may represent different root causes, we can return a generic error code and potential causes in extras if needed. If the backend is able to identify the root cause, then cool, we return a more specific error code. But there doesn't have to be another layer of abstraction with all the mappings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree error codes and possible root causes shouldn't be mixed together as suggesting root causes and providing remedies are more of a documentation issue rather than something the program cares about.

I've removed the description field in my example code, but other parts of the code still feel like an improvement to me though. There really is no need to have two separate entities for exception catching and messaging.

As for the mapping between symptoms and root causes, maybe we can extend the errors.json file I suggested earlier and add a causes field?

{
  "issues": {
    "DATASOURCE_TOO_LARGE": {
      "code": 1001,
      "title": "The database is too large to query",
      "description": "It's likely your datasource has grown too large to run the current query, and is timing out. You can resolve this by reducing the size of your datasource or by modifying your query to only process a subset of your data."
    },
    "DATASOURCE_OVERLOAD": {
      "code": 1002,
      // ....
    },
  },
  "causes": {
    "BACKEND_TIMEOUT_ERROR": ["DATASOURCE_TOO_LARGE", "DATASOURCE_OVERLOAD"],
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need separate entities for exception catching and error messaging. One example is for async queries in sql lab. We would catch an exception from the query failing, but then only want to write a subset of attributes to the query table for returning to the client later. Then when the client polls for query status, we're returning an error payload within that request, but no exception is occuring. They seem like 2 different entities to me, and I'm not sure how much is improved by having a complex inheritance tree consisting of both native Python Exceptions and UI specific payload classes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are indeed two entities, one is runtime exception, another is the result payload used by the API. But it doesn't mean the payload cannot be generated by exception itself.

The improvement is that you don't have to manually generate the payload every time an exception occurs. You just store relevant information as top-level attributes on the Exception and some general error handlers will return a consistent payload whenever needed. The payload doesn't have to be UI specific. We can certainly return different payloads for storage and API by adding arguments to the to_json method, if that's what we want.

I just find it weird that you had to initialize two classes in order to raise an API-conforming exception. We already use class inheritance for Exceptions and used the pattern extensively for superset.commands.exceptions, so I'm not sure an additional mixin to serialize the exception is that much of complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro @dpgaspar curious to hear your thoughts on this, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i haven't seen the superset.commands.exceptions module yet, i'll look at that and see how other parts of the code handles errors. This has been a wild west of sorts on the backend side, and I've mainly been focused on making sure whatever happens on the backend, the frontend gets something consistent. If there's a more standard approach now (i think it was discussed in SIP-41) then i'm happy to follow it

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a caveat of some potential future refactoring:

  1. Create an extended SupersetErrorAlert component that renders conforming SupersetError in consistent format and make TimeoutErrorMessage reusable.
  2. (Optional) Consolidate SupersetError and SupersetException in the backend.

super(SupersetTimeoutException, self).__init__(message)
self.error = SupersetError(
error_type=error_type, message=message, level=level, extra=extra
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud could you expand (possibly with an example) on your thoughts? Are you looking for more of a generic container for handling error properties?

superset/utils/core.py Outdated Show resolved Hide resolved
@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch 4 times, most recently from db3d101 to d041659 Compare July 20, 2020 16:23
@etr2460
Copy link
Member Author

etr2460 commented Jul 20, 2020

@graceguo-supercat and @ktmud, sorry for the delayed response here. I think this is in a good place for final review now. Some significant changes from before:

  • I've moved all issue strings from the frontend to the backend, so any client can generate appropriate error messages from the payload returned
  • I'm using the @handle_api_error wrapper on the backend to construct the appropriate error response when sql queries timeout. This seems to be the standard approach thus far and aligns with how SupersetSecurityException is handled in the new error format. Completely changing the handling methods on the backend seems out of scope here, but I'm going to move the conversation we had into SIP-41 as it seems quite pertinent to it
  • Error Codes have been renamed to Issue Codes as that wording seems more clear
    image

Thanks for taking the time to be diligent here and get this PR into a better state than it started. Hopefully it should be good to go!

@john-bodley
Copy link
Member

Are the constructs of how the errors are defined still a blocking issue? If so SIP-41 describes the somewhat current state of how errors are handled by the backend and there is clearly a number of inconsistencies which need to be addressed. I wonder whether it makes sense to flush out the architecture of error constructs in SIP-41 as the problem is far-reaching and beyond just the scope of the issues/concerns raised here.

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, never intended to let the discussions regarding error handling in the backend block this PR. Sorry for not stamping earlier!

@@ -409,6 +409,10 @@ div.tablePopover {
padding-right: 8px;
}

.ResultSetErrorMessage {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind use "-" not case in css class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, updated!

Copy link

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one lint comment for css class name. otherwise LGTM!

@etr2460 etr2460 force-pushed the erik-ritter--timeout-error-ux branch from d041659 to e41361f Compare July 20, 2020 21:36
@etr2460 etr2460 merged commit 5fa4680 into apache:master Jul 20, 2020
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.38.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 0.38.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants