feat(SIP-39): Async query support for charts #11499
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11499 +/- ##
===========================================
- Coverage 67.56% 54.42% -13.15%
===========================================
Files 942 432 -510
Lines 45812 15253 -30559
Branches 4395 3891 -504
===========================================
- Hits 30955 8302 -22653
+ Misses 14752 6951 -7801
+ Partials 105 0 -105
FYI, once this is implemented on the frontend, you'll want to bypass all the domain sharding logic, as it'll no longer be needed and probably wouldn't work with the new cookies anyway.
return f"{key_prefix}{hash}" | ||
|
||
|
||
def set_and_log_cache( |
@bkyryliuk I'm starting to move/centralize the caching logic, and have not included the `CacheKey` write that is present in the `viz.py` implementation because 1) I don't understand the use case, and 2) we can't afford the performance hit of writing to the metadata DB in the same method that writes to a K/V store (we're currently using Redis). Can you elaborate on `CacheKey` and how it's used?
Sure, the cache key table stores the mapping between datasources and cache keys to enable invalidation by datasource through an API endpoint. In our use case, we have external ETL systems that update the tables powering the Superset charts, and we use this model and API endpoint to evict/invalidate cache records for the tables that were updated in our ETL systems. It is fairly critical for us to keep data in Superset fresh and trustworthy. This endpoint has fairly low traffic, as it is hit only on explore actions, compared to the logging calls.
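For context, a rough sketch of how an external ETL job might hit such an invalidation endpoint after refreshing a table; the route, payload shape, host, and auth header below are assumptions for illustration, not the exact API:

```python
import requests

SUPERSET_URL = "https://superset.example.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <access-token>"}  # auth is deployment-specific

# Assumed endpoint/payload: evict cached chart data for the datasources backed
# by the table the ETL job just refreshed.
payload = {
    "datasources": [
        {
            "datasource_name": "fct_orders",  # hypothetical table
            "database_name": "analytics",     # hypothetical database
            "schema": "public",
            "datasource_type": "table",
        }
    ]
}

resp = requests.post(
    f"{SUPERSET_URL}/api/v1/cachekey/invalidate", json=payload, headers=HEADERS
)
resp.raise_for_status()
```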
I would prefer to keep it; it would be fine to hide it behind a feature flag if you expect to experience performance issues.
I added a `STORE_CACHE_KEYS_IN_METADATA_DB` config value to enable this.
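For reference, opting in would then look something like this in superset_config.py (assuming the value defaults to off):

```python
# superset_config.py
# Write CacheKey records to the metadata DB so external systems (e.g. ETL jobs)
# can invalidate cached chart data by datasource.
STORE_CACHE_KEYS_IN_METADATA_DB = True
```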
First pass comments. Can't wait to see this merged!
return job_metadata

def set_query_context(self, form_data: Dict[str, Any]) -> QueryContext:
    self._form_data = form_data
I suggest doing this "init" logic in the ctor, as these `set_xxx` methods may be missed, and adding extra functionality down the road will require all usages of this CMD to be updated to call these.
`QueryContext` was created via the constructor originally, but varying use cases required breaking it out into its own method.
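For illustration, a stripped-down sketch of the two shapes being discussed here; the class names are stand-ins, not the actual implementation:

```python
from typing import Any, Dict, Optional


class QueryContext:  # minimal stub standing in for Superset's QueryContext
    def __init__(self, form_data: Dict[str, Any]) -> None:
        self.form_data = form_data


# Current approach: a separate setter; construction stays cheap, but every usage
# of the command must remember to call set_query_context() before running it.
class ChartDataCommandWithSetter:
    def __init__(self) -> None:
        self._query_context: Optional[QueryContext] = None

    def set_query_context(self, form_data: Dict[str, Any]) -> QueryContext:
        self._query_context = QueryContext(form_data)
        return self._query_context


# Reviewer's suggestion: do the "init" logic in the ctor, so callers can't forget it.
class ChartDataCommandCtorInit:
    def __init__(self, form_data: Dict[str, Any]) -> None:
        self._query_context = QueryContext(form_data)
```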
{"channel": async_channel_id, "user_id": user_id} | ||
) | ||
|
||
response.set_cookie( |
Can we just use FLASK_JWT_EXTENDED here? (https://flask-jwt-extended.readthedocs.io/en/stable/installation/) All of this functionality (cookie/header handling) is handled by them, and as far as I know FAB is adding it soon.
My initial intent was to use FLASK_JWT_EXTENDED for this, but it was incompatible for reasons I don't now remember.
Can we at least provide a means by which we can extend/override this functionality that doesn't involve monkey patching? I'm thinking something like defining a "JWTManager" in Superset which just does these things, and then instantiating it from a string inside the `init_app` method, which would allow folks to tune this behavior. Said manager would define a few hooks like "load_jwt", "store_jwt", etc. Also, it would deal with the basic skeleton of the JWT structure, which would be customizable this way.
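For concreteness, a rough sketch of the kind of pluggable manager being suggested here; the class, hook, and config-key names are illustrative, not an existing Superset API:

```python
import importlib
from typing import Any, Dict

import jwt  # PyJWT
from flask import Flask, Request, Response


class AsyncJWTManager:
    """Default manager; deployments could subclass this and override the hooks."""

    def init_app(self, app: Flask) -> None:
        self._secret = app.config["GLOBAL_ASYNC_QUERIES_JWT_SECRET"]  # assumed config key
        self._cookie_name = app.config["GLOBAL_ASYNC_QUERIES_JWT_COOKIE_NAME"]  # assumed config key

    def store_jwt(self, response: Response, token: str) -> None:
        # Hook for writing the token to the outgoing response.
        response.set_cookie(self._cookie_name, value=token, httponly=True)

    def load_jwt(self, request: Request) -> Dict[str, Any]:
        # Hook for reading and verifying the token from the incoming request.
        token = request.cookies.get(self._cookie_name)
        return jwt.decode(token, self._secret, algorithms=["HS256"])


def init_async_jwt_manager(app: Flask) -> AsyncJWTManager:
    # Instantiate from a dotted-path string so deployments can swap in their own subclass.
    path = app.config.get("ASYNC_JWT_MANAGER", f"{__name__}.AsyncJWTManager")  # assumed config key
    module_name, class_name = path.rsplit(".", 1)
    manager: AsyncJWTManager = getattr(importlib.import_module(module_name), class_name)()
    manager.init_app(app)
    return manager
```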
This JWT is narrowly scoped in terms of functionality, and is intentionally separate from any (optional) application auth JWT usage. The main (currently only) use case here is to securely pass a channel ID between the client, the Flask app, and the (future) WebSocket server. What kind of customization do you foresee being required here?
Looking back at FLASK_JWT_EXTENDED, I think the main issue here was that it doesn't support multiple tokens/cookies in the app if JWT is being used as the main auth mechanism.
I wonder, have you looked into Authlib before? It seems to have terrific documentation and could be incorporated for future API OAuth work, too.
Yep, we've evaluated Authlib at Preset in other contexts, but it feels like overkill for this use case.
return response

def generate_jwt(self, data: Dict[str, Any]) -> str:
    encoded_jwt = jwt.encode(data, self._jwt_secret, algorithm="HS256")
This algo needs to be configurable. Also, the JWT contents here aren't following the JWT standard: typically, `sub` is used to track the "id" of a user. Again, the lib mentioned above deals with this.
The JWT standard doesn't prescribe any required claim fields, but happy to use the `sub` field for `user_id` here (or just remove it altogether). Also happy to make the algo configurable, but there's a point of diminishing returns for configuration here, IMO.
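As a sketch of what that could look like with a configurable algorithm and the registered claims (the parameter and config-key names are illustrative, not the actual method):

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

import jwt  # PyJWT


def generate_jwt(
    claims: Dict[str, Any],
    user_id: int,
    secret: str,
    algorithm: str = "HS256",  # read from config, e.g. a GLOBAL_ASYNC_QUERIES_JWT_ALG value (assumed name)
) -> str:
    now = datetime.now(tz=timezone.utc)
    payload = {
        "sub": str(user_id),             # registered claim for the subject (user id)
        "iat": now,                      # PyJWT serializes iat/exp datetimes to epoch seconds
        "exp": now + timedelta(hours=1),
        "claims": claims,                # application-specific data, e.g. the channel id
    }
    return jwt.encode(payload, secret, algorithm=algorithm)
```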
Additional configuration can also be added in future PRs as needs are more clearly defined.
Right. The `claim` field is up to the implementor. A typical "bare bones" JWT would look something like:
{
"sub": "1234567890",
"iat": 1516239022,
"exp": 1516239022,
"claims": {
... whatever stuff you want
}
}
4.1. Registered Claim Names
None of the claims defined below are intended to be mandatory to use or implement in all cases, but rather they provide a starting point for a set of useful, interoperable claims. Applications using JWTs should define which specific claims they use and when they are required or optional.
return data

def parse_jwt_from_request(self, request: Request) -> Dict[str, Any]:
    token = request.cookies.get(self._jwt_cookie_name)
Again, I'd delegate this to the lib. One other case that's missing here is `Authorization` header support.
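For illustration, a sketch of the cookie-plus-header lookup being suggested (a generic pattern, not the actual implementation):

```python
from typing import Any, Dict

import jwt  # PyJWT
from flask import Request


def parse_jwt_from_request(
    request: Request, cookie_name: str, secret: str, algorithms=("HS256",)
) -> Dict[str, Any]:
    # Prefer the cookie, but fall back to a standard "Authorization: Bearer <token>" header.
    token = request.cookies.get(cookie_name)
    if not token:
        auth_header = request.headers.get("Authorization", "")
        if auth_header.startswith("Bearer "):
            token = auth_header[len("Bearer "):]
    if not token:
        raise ValueError("No JWT found in cookies or Authorization header")
    return jwt.decode(token, secret, algorithms=list(algorithms))
```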
See comments above re: FLASK_JWT_EXTENDED
form_data = cached.get("form_data")
response_type = cached.get("response_type")

datasource_id, datasource_type = get_datasource_info(None, None, form_data)
I'm assuming perm checks happen in here?
Yep
Actually, no. The permission check happens in the `check_explore_cache_perms` method in the `@etag_cache` decorator ¯\_(ツ)_/¯
Overall, looks good! One thing I noticed was a bit of hand-rolling around JWT handling. I suggest leaning on FLASK_JWT_EXTENDED.
OK @villebro @ktmud @dpgaspar @craig-rueda, finally got all tests passing, and have addressed the majority of the feedback here. How are folks feeling about merging this MVP, with the understanding that this feature is still experimental and will continue to be iterated upon?
Can we update the JWT structure at least?
superset/config.py (Outdated)
@@ -327,6 +327,8 @@ def _try_json_readsha(  # pylint: disable=unused-argument
     "DISPLAY_MARKDOWN_HTML": True,
     # When True, this escapes HTML (rather than rendering it) in Markdown components
     "ESCAPE_MARKDOWN_HTML": False,
+    "GLOBAL_ASYNC_QUERIES": False,
+    "GLOBAL_ASYNC_QUERIES_OPTIONS": {"transport": "polling", "polling_delay": 250},
Ah, I really think we should keep feature flags booleans... What I imagined was to add the options as a global config value, add a separate export entry here, change initFeatureFlags to something like `initCommonBootstrap()`, and add such global options there. Or, pass it to Redux via each page's `getInitialState`.
But if you find it easier to expose these in the feature flags, we can always refactor later.
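For illustration, the kind of split being suggested, with the flag kept boolean and the tuning knobs pulled out into standalone config values (the standalone names below are illustrative):

```python
# superset_config.py
FEATURE_FLAGS = {
    "GLOBAL_ASYNC_QUERIES": True,  # the feature flag stays a plain boolean
}

# Tuning options live in standalone config values that can be exposed to the
# frontend separately (e.g. via the common bootstrap payload); names are illustrative.
GLOBAL_ASYNC_QUERIES_TRANSPORT = "polling"
GLOBAL_ASYNC_QUERIES_POLLING_DELAY = 250  # milliseconds between polls
```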
You should be able to expose config vars to the frontend by adding them here; then they should be available in bootstrap_data.common.
Yep, agreed this is not ideal, and that feature flags should remain booleans. I'll refactor.
Updated
Co-authored-by: John Bodley <john.bodley@airbnb.com> (cherry picked from commit c306368)
* upstream/master: (55 commits)
  feat(explore): time picker enhancement (apache#11418)
  feat: update alert/report icons and column order (apache#12081)
  feat(explore): metrics and filters controls redesign (apache#12095)
  feat(alerts/reports): add refresh action (apache#12071)
  chore: add latest tag action (apache#11148)
  fix(reports): increase crontab size and alert fixes (apache#12056)
  Small typo fix in Athena connection docs (apache#12099)
  feat(queries): security perm simplification (apache#12072)
  feat(databases): security perm simplification (apache#12036)
  feat(dashboards): security permissions simplification (apache#12012)
  feat(logs): security permissions simplification (apache#12061)
  chore: Remove unused CodeModal (apache#11972)
  Fix typescript error (apache#12074)
  fix: handle context-dependent feature flags in CLI (apache#12088)
  fix: Fix "View in SQLLab" bug (apache#12086)
  feat(alert/report): add 'not null' condition option to modal (apache#12077)
  bumping superset ui to 15.18 and deckgl to 0.3.2 (apache#12078)
  fix: Python dependencies in apache#11499 (apache#12079)
  reset active tab on open (apache#12048)
  fix: improve import flow UI/UX (apache#12070)
  ...
SUMMARY
Adds support for running chart queries in async workers, as proposed by SIP-39, under a new `GLOBAL_ASYNC_QUERIES` feature flag:
- `/api/v1/async_event/` endpoint for polling (a rough client-side sketch of the polling flow follows this list)
- `superset-frontend` polls for events when the feature flag is enabled and there are chart components in a `loading` state
- `/api/v1/chart/data` API and the legacy `/superset/explore_json` endpoint are supported in both dashboards and Explore (behind the `GLOBAL_ASYNC_QUERIES` feature flag)
- `set_and_log_cache` moved out of `viz.py`
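For illustration only, a minimal sketch of the polling interaction from a generic HTTP client's point of view; the host, query parameter, and response fields below are assumptions, not the exact API contract:

```python
import time
from typing import Any, Dict, Optional

import requests

BASE_URL = "https://superset.example.com"  # hypothetical host
session = requests.Session()  # assumed to already carry the auth session and async-query JWT cookie


def poll_async_events(delay_ms: int = 250) -> Dict[str, Any]:
    """Poll the async event endpoint until a chart data job reports completion."""
    last_id: Optional[str] = None
    while True:
        params = {"last_id": last_id} if last_id else {}  # assumed parameter name
        events = session.get(f"{BASE_URL}/api/v1/async_event/", params=params).json()
        for event in events.get("result", []):  # assumed response shape
            last_id = event.get("id")
            if event.get("status") == "done":
                # Fetch the cached result the event points at (field name assumed).
                return session.get(f"{BASE_URL}{event['result_url']}").json()
        time.sleep(delay_ms / 1000)
```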
Remaining issues:
TODO in separate follow-up PRs:
TEST PLAN
- Enable the `GLOBAL_ASYNC_QUERIES` feature flag in your test environment and ensure correct chart rendering in both dashboards and Explore
ADDITIONAL INFORMATION