Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(embedded+async queries): support async queries to work with embedded guest user #26332

Merged
20 changes: 20 additions & 0 deletions superset/async_events/async_query_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -191,20 +191,40 @@
force: Optional[bool] = False,
user_id: Optional[int] = None,
) -> dict[str, Any]:
# pylint: disable=import-outside-toplevel
from superset import security_manager

job_metadata = self.init_job(channel_id, user_id)
if guest_user := security_manager.get_current_guest_user_if_guest():
job_metadata["guest_token"] = guest_user.guest_token

Check warning on line 199 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L199

Added line #L199 was not covered by tests
self._load_explore_json_into_cache_job.delay(
job_metadata,
form_data,
response_type,
force,
)
# clean up guest token so that it doesn't get exposed upper level
if "guest_token" in job_metadata:
del job_metadata["guest_token"]

Check warning on line 208 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L208

Added line #L208 was not covered by tests
return job_metadata

def submit_chart_data_job(
self, channel_id: str, form_data: dict[str, Any], user_id: Optional[int]
) -> dict[str, Any]:
# pylint: disable=import-outside-toplevel
from superset import security_manager

# if it's guest user, we want to pass the guest token to the celery task
# chart data cache key is calculated based on the current user
# this way we can keep the cache key consistent between sync and async command
# so that it can be looked up consistently
job_metadata = self.init_job(channel_id, user_id)
if guest_user := security_manager.get_current_guest_user_if_guest():
job_metadata["guest_token"] = guest_user.guest_token

Check warning on line 223 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L223

Added line #L223 was not covered by tests
self._load_chart_data_into_cache_job.delay(job_metadata, form_data)
# clean up guest token so that it doesn't get exposed upper level
if "guest_token" in job_metadata:
del job_metadata["guest_token"]

Check warning on line 227 in superset/async_events/async_query_manager.py

View check run for this annotation

Codecov / codecov/patch

superset/async_events/async_query_manager.py#L227

Added line #L227 was not covered by tests
return job_metadata

def read_events(
Expand Down
10 changes: 9 additions & 1 deletion superset/common/query_context_processor.py
Original file line number Diff line number Diff line change
Expand Up @@ -600,7 +600,15 @@ def get_payload(
set_and_log_cache(
cache_manager.cache,
cache_key,
{"data": self._query_context.cache_values},
{
"data": {
# setting form_data into query context cache value as well
# so that it can be used to reconstruct form_data field
# for query context object when reading from cache
"form_data": self._query_context.form_data,
Comment on lines +605 to +608
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns here.. We've discussed this at length previously, and I feel we should always consider form_data to be chart specific internal state. IIRC, the original reason that form data is being passed along to the celery task is because the legacy charts require it to function. However, the v1 charts shouldn't need it. I know many fields within form_data are sort of standardized, but the whole point of QueryContext/QueryObject was to standardize the chart data request contract, thus decoupling the viz plugins from the backend. Therefore, I feel any field that is required from form_data should ideally be allocated a dedicated property on either the query context or query object.

**self._query_context.cache_values,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro do you have any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zephyring from where does the raise_for_access get called when it is passing in the query context from cache?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first call to /chart/data endpoint will set the query context into cache and return async job metadata.
Then later once job is done frontend will receive a job ready event with the result_url something like (/api/v1/chart/data/qc-2772209e0a4b1ca513375118855b27af).
Then it uses that to call chart/data/<cache> endpoint which invokes the ChartCommand and the command validation will call the raise_for_access

},
},
self.get_cache_timeout(),
)
return_value["cache_key"] = cache_key # type: ignore
Expand Down
36 changes: 27 additions & 9 deletions superset/tasks/async_queries.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,10 +66,16 @@
# pylint: disable=import-outside-toplevel
from superset.commands.chart.data.get_data_command import ChartDataCommand

user = (
security_manager.get_user_by_id(job_metadata.get("user_id"))
or security_manager.get_anonymous_user()
)
if user_id := job_metadata.get("user_id"):
# logged in user
user = security_manager.get_user_by_id(user_id)
elif guest_token := job_metadata.get("guest_token"):

Check warning on line 72 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L72

Added line #L72 was not covered by tests
# embedded guest user
user = security_manager.get_guest_user_from_token(guest_token)
del job_metadata["guest_token"]

Check warning on line 75 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L74-L75

Added lines #L74 - L75 were not covered by tests
else:
# default to anonymous user if no user is found
user = security_manager.get_anonymous_user()

Check warning on line 78 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L78

Added line #L78 was not covered by tests

with override_user(user, force=False):
try:
Expand Down Expand Up @@ -106,10 +112,16 @@
) -> None:
cache_key_prefix = "ejr-" # ejr: explore_json request

user = (
security_manager.get_user_by_id(job_metadata.get("user_id"))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if user_id is None, get_user_by_id() will crash the celery task

or security_manager.get_anonymous_user()
)
if user_id := job_metadata.get("user_id"):
# logged in user
user = security_manager.get_user_by_id(user_id)
elif guest_token := job_metadata.get("guest_token"):

Check warning on line 118 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L118

Added line #L118 was not covered by tests
# embedded guest user
user = security_manager.get_guest_user_from_token(guest_token)
del job_metadata["guest_token"]

Check warning on line 121 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L120-L121

Added lines #L120 - L121 were not covered by tests
else:
# default to anonymous user if no user is found
user = security_manager.get_anonymous_user()

Check warning on line 124 in superset/tasks/async_queries.py

View check run for this annotation

Codecov / codecov/patch

superset/tasks/async_queries.py#L124

Added line #L124 was not covered by tests
eschutho marked this conversation as resolved.
Show resolved Hide resolved

with override_user(user, force=False):
try:
Expand Down Expand Up @@ -140,7 +152,13 @@
"response_type": response_type,
}
cache_key = generate_cache_key(cache_value, cache_key_prefix)
set_and_log_cache(cache_manager.cache, cache_key, cache_value)
cache_instance = cache_manager.cache
cache_timeout = (
cache_instance.cache.default_timeout if cache_instance.cache else None
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using cache_instance default_timeout if found. Otherwise it will fallback to config["CACHE_DEFAULT_TIMEOUT"]

)
set_and_log_cache(
cache_instance, cache_key, cache_value, cache_timeout=cache_timeout
)
result_url = f"/superset/explore_json/data/{cache_key}"
async_query_manager.update_job(
job_metadata,
Expand Down
Loading