[SIP-141] Global Async Queries 2.0 #29515
Comments
I'd suggest making this configurable, or respecting the "cancel queries on window unload event" setting that exists in the DB connection settings. Some workflows involve dashboards being accessed multiple times a day, so there is some benefit to running and caching the queries even when nobody is actively waiting for them. Think of a case where someone opens a dashboard, it takes too long to load, so they pack up and commute to the office, then open the dashboard again once they're at their desk.
I've been thinking... The problems of the sync model are clearly described by this SIP. The proposed solution is to use an async model with polling, but @betodealmeida and @rusackas have a valid point about latency for real-time data and possible push features. Server-sent events could be used, but they wouldn't be sufficient, as we clearly need bi-directional communication. Considering all requirements, shouldn't we use this opportunity to fully adopt WebSockets as the single solution and reduce the complexity of having many modes of operation? It seems it would address all our requirements and avoid a situation where we add async polling now only to introduce another method for push features later.
I feel the issue ultimately comes down to maintainability of the feature. Currently, the main issues that we're facing with synchronous queries are as follows:
While the current WebSocket solution addresses the third point above, the other points are not addressed by it, and in my experience they're far more critical for enterprise deployments. I'm all for hardening and even expanding the use of the WebSocket solution if we can find people to maintain and develop it further. However, given the low community involvement on the If we do feel WebSockets are critical for the future roadmap, I feel we still need to simplify the architecture and support query deduplication and cancellation, as they are not within the scope of the current GAQ solution.
@villebro Exactly! This 👆🏼 is what I meant by fully adopting WebSockets. We would need to simplify its architecture, add the missing features, and use it as the primary solution in Superset to ensure maintainability.
A quick summary from the discussion on this SIP in the Town Hall meeting on 2024-07-12:
The SIP will shortly be updated to reflect these changes. Furthermore, a weekly meeting has been set up to coordinate the effort to redesign the WS implementation and bring in the core features and changes proposed in this SIP, most notably query deduplication, a management UI for queued/running queries, and architectural simplification. The meeting will take place on Mondays at noon PST (all interested are welcome to join!)
To add to the general consensus, WebSockets are crucial for large-scale deployments, especially when handling a significant number of active users. Long polling is inefficient for this purpose, and horizontal scaling isn't ideal since it requires scaling the supporting infrastructure as well. I'd be happy to contribute towards making WebSockets mainstream.
I think the biggest hindrance to the adoption of the new WebSocket server is the missing documentation. The most I could find is this in the README: superset/superset-websocket/README.md, line 73 in c7b8ae9
Overall, it feels like a feature that is still very much in development if there is nothing about it in the official documentation. That's why, at least at my company, we did not deploy the WebSocket server.
We are closing this SIP and will be opening a new one to reflect a new direction that was established during discussions. But in summary:
Follow-up SIP here: #29839
[SIP-141] Proposal for Global Async Queries 2.0
Motivation
With [SIP-39] Global Async Query Support (GAQ for short) still being behind an experimental feature flag, and not actively maintained, I've been thinking about ways we could simplify the architecture, and finally make this feature generally available in a forthcoming Superset release. I feel the following issues have all done their part to keep this feature from gaining wide community traction:
Having said all this, the feature is still as relevant today as it was when the original SIP was opened, and I think stabilizing this feature is very important because Superset's current synchronous query execution model causes lots of issues:
It's also worth noting that we've had async query support in SQL Lab for a very long time, and it tends to work very well, with a much simpler architecture than the one proposed in SIP-39. Therefore, a GAQ framework doesn't necessarily require WebSockets or Redis Streams.
Proposed Change
To simplify the architecture and reuse existing functionality, I propose the following:
When chart data isn't available in the cache, only the `cache_key` is returned, along with additional details: when the most recent chart data request was submitted, the status (pending, executing), the last heartbeat from the async worker, etc.
The async execution flow is changed to be similar to SQL Lab async execution, with the following changes:
- A `poll_ttl` field is added to the query context, which makes it possible to automatically cancel queries that are not being actively polled. Every time the cache key is polled, the latest poll time is updated on the metadata object. While executing, the worker periodically checks the metadata object; if `poll_ttl` is defined and the time since the last poll exceeds the TTL, the query is cancelled. This ensures that if a person closes a dashboard with lots of long-running queries, those queries are automatically cancelled when nobody is actively waiting for the results. By default, frontend requests have `poll_ttl` set to whichever value is set in the config (`DEFAULT_CHART_DATA_POLL_TTL`). Cache warmup requests would likely not have a `poll_ttl` set, so as to avoid unnecessary polling.
- A new setting is added to `superset_config.py`, which makes it possible to define how polling backoff should be implemented. The default behavior would be some sort of exponential backoff, where freshly started queries are polled more actively and queries that have been pending/running for a long time are polled less frequently. When the frontend requests chart data, the backend provides the recommended wait time in the response, based on the backoff function. Note that backoff will be based on the time passed since the query was submitted. This means that if I open a dashboard with a chart whose query has already been running for 10 minutes, the browser will repoll much more slowly than it would if the query had only just been dispatched to the async workers.
Some random thoughts:
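The `poll_ttl` check and the time-based backoff described above could look roughly like the sketch below. This is only an illustration under stated assumptions: the `QueryMetadata` class, the `should_cancel` and `recommended_poll_delay` helpers, and the default values for `base`, `factor` and `max_delay` are all hypothetical and not part of Superset or of this SIP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryMetadata:
    """Hypothetical metadata object stored alongside the cache key."""
    cache_key: str
    status: str                       # "pending" or "executing"
    submitted_at: float               # epoch seconds when the request was submitted
    last_poll_at: float               # epoch seconds of the most recent client poll
    poll_ttl: Optional[float] = None  # seconds; None means "never auto-cancel"


def should_cancel(meta: QueryMetadata, now: float) -> bool:
    """Worker-side check: cancel when nobody has polled within poll_ttl."""
    if meta.poll_ttl is None:
        return False  # e.g. cache warmup requests without a poll_ttl
    return now - meta.last_poll_at > meta.poll_ttl


def recommended_poll_delay(
    meta: QueryMetadata,
    now: float,
    base: float = 0.5,        # assumed first delay, in seconds
    factor: float = 2.0,      # assumed growth factor
    max_delay: float = 30.0,  # assumed upper bound, in seconds
) -> float:
    """Exponential backoff keyed on time since submission, not on poll count.

    Replays the schedule a client would have followed had it polled since
    submission, and returns the delay that schedule has reached by now.
    """
    elapsed = max(0.0, now - meta.submitted_at)
    next_poll_at, delay = 0.0, base
    while next_poll_at + delay <= elapsed:
        next_poll_at += delay
        delay = min(delay * factor, max_delay)
    return delay
```

Because the delay depends only on the submission time, a browser that opens a dashboard whose query has already been running for 10 minutes immediately gets the capped delay, while a freshly submitted query starts at the base delay.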
New or Changed Public Interfaces
- `poll_ttl`: if set, the query will be cancelled unless a client has asked for data within the TTL bounds. This ensures that dashboards that are closed don't leave orphaned chart data requests.
- `execution_mode`: the client can ask for the query to be executed sync, async, or using the default mode. This is added specifically for programmatic integrations, where implementing the polling mechanism may sometimes add unnecessary complexity.
- `poll_delay`: how many seconds the client should wait before checking whether the query has completed. This value is calculated by the backend based on the backoff function.
- `status` and `start_dttm`: whether the query is queued or started, and when it started executing. The chart component can use this to tell the user about the state of the query, similar to how we currently display how stale the cached data is. So in the future, we may display messages such as "Query is queued" or "Query executing for 2 minutes".
New dependencies
In the base proposal, I suggest not adding any new dependencies, and simply supporting polling. However, we may consider using Server-sent events, as noted by @betodealmeida.
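To illustrate why plain polling needs no extra dependencies, here is a minimal client-side sketch that uses only the standard library. It assumes a response shaped like the `status` and `poll_delay` fields described under New or Changed Public Interfaces. The `fetch` callable (standing in for the HTTP request), the `max_polls` limit, and the rule that any status other than pending/executing means "finished" are hypothetical choices made for the example.

```python
import time
from typing import Any, Callable, Dict


def wait_for_chart_data(
    fetch: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 100,
) -> Dict[str, Any]:
    """Poll until the backend stops reporting a pending/executing query.

    `fetch` is a hypothetical callable that performs one chart data request
    and returns the decoded JSON response as a dict.
    """
    for _ in range(max_polls):
        response = fetch()
        if response.get("status") not in ("pending", "executing"):
            return response  # assumed: anything else means the query finished
        # Honour the wait time recommended by the backend's backoff function.
        sleep(float(response.get("poll_delay", 1.0)))
    raise TimeoutError("query did not finish within max_polls attempts")
```

Injecting `sleep` keeps the loop easy to test, and letting the backend dictate `poll_delay` means the backoff policy can change in `superset_config.py` without touching clients.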
Migration Plan and Compatibility
SIMPLIFIED_GLOBAL_ASYNC_QUERIES
Rejected Alternatives
Server-sent events
to keep the implementation simple.