Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(embedded+async queries): support async queries to work with embedded guest user #26332

Merged
11 changes: 11 additions & 0 deletions superset/async_events/async_query_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -203,8 +203,19 @@
def submit_chart_data_job(
self, channel_id: str, form_data: dict[str, Any], user_id: Optional[int]
) -> dict[str, Any]:
# pylint: disable=import-outside-toplevel
from superset import security_manager

# if it's guest user, we want to pass the guest token to the celery task
# chart data cache key is calculated based on the current user
# this way we can keep the cache key consistent between sync and async command
# so that it can be looked up consistently
job_metadata = self.init_job(channel_id, user_id)
if guest_user := security_manager.get_current_guest_user_if_guest():
job_metadata["guest_token"] = guest_user.guest_token

Check warning on line 215 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L215

Added line #L215 was not covered by tests
self._load_chart_data_into_cache_job.delay(job_metadata, form_data)
if "guest_token" in job_metadata:
del job_metadata["guest_token"]

Check warning on line 218 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L218

Added line #L218 was not covered by tests
return job_metadata

def read_events(
Expand Down
10 changes: 9 additions & 1 deletion superset/common/query_context_processor.py
Original file line number Diff line number Diff line change
Expand Up @@ -600,7 +600,15 @@ def get_payload(
set_and_log_cache(
cache_manager.cache,
cache_key,
{"data": self._query_context.cache_values},
{
"data": {
# setting form_data into query context cache value as well
# so that it can be used to reconstruct form_data field
# for query context object when reading from cache
"form_data": self._query_context.form_data,
Comment on lines +605 to +608
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns here.. We've discussed this at length previously, and I feel we should always consider form_data to be chart specific internal state. IIRC, the original reason that form data is being passed along to the celery task is because the legacy charts require it to function. However, the v1 charts shouldn't need it. I know many fields within form_data are sort of standardized, but the whole point of QueryContext/QueryObject was to standardize the chart data request contract, thus decoupling the viz plugins from the backend. Therefore, I feel any field that is required from form_data should ideally be allocated a dedicated property on either the query context or query object.

**self._query_context.cache_values,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro do you have any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zephyring from where does the raise_for_access get called when it is passing in the query context from cache?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first call to /chart/data endpoint will set the query context into cache and return async job metadata.
Then later once job is done frontend will receive a job ready event with the result_url something like (/api/v1/chart/data/qc-2772209e0a4b1ca513375118855b27af).
Then it uses that to call chart/data/<cache> endpoint which invokes the ChartCommand and the command validation will call the raise_for_access

},
},
self.get_cache_timeout(),
)
return_value["cache_key"] = cache_key # type: ignore
Expand Down
22 changes: 14 additions & 8 deletions superset/tasks/async_queries.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,10 +66,16 @@
# pylint: disable=import-outside-toplevel
from superset.commands.chart.data.get_data_command import ChartDataCommand

user = (
security_manager.get_user_by_id(job_metadata.get("user_id"))
or security_manager.get_anonymous_user()
)
if user_id := job_metadata.get("user_id"):
# logged in user
user = security_manager.get_user_by_id(user_id)
elif guest_token := job_metadata.get("guest_token"):

Check warning on line 72 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L72

Added line #L72 was not covered by tests
# embedded guest user
user = security_manager.get_guest_user_from_token(guest_token)
del job_metadata["guest_token"]

Check warning on line 75 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L74-L75

Added lines #L74 - L75 were not covered by tests
else:
# default to anonymous user if no user is found
user = security_manager.get_anonymous_user()

Check warning on line 78 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L78

Added line #L78 was not covered by tests

with override_user(user, force=False):
try:
Expand Down Expand Up @@ -106,10 +112,10 @@
) -> None:
cache_key_prefix = "ejr-" # ejr: explore_json request

user = (
security_manager.get_user_by_id(job_metadata.get("user_id"))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if user_id is None, get_user_by_id() will crash the celery task

or security_manager.get_anonymous_user()
)
if user_id := job_metadata.get("user_id"):
user = security_manager.get_user_by_id(user_id)
else:
user = security_manager.get_anonymous_user()

Check warning on line 118 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L118

Added line #L118 was not covered by tests
eschutho marked this conversation as resolved.
Show resolved Hide resolved

with override_user(user, force=False):
try:
Expand Down
Loading