There are many cases where a synchronous API will be much better and easier to use. The problem is that query execution time can be long and we don't want to block the worker for an extended period of time.
A reasonable solution in this case is to switch to async IO, so we can trigger query execution and wait for its completion. The main challenge with implementing this is that not all the query runners support async IO.
My suggested implementation is:
Use gunicorn with the gevent worker class. This allows taking advantage of async IO with supported libraries. One of the libraries that works with gevent is the Redis library.
The sync query results API will trigger a Celery job to run the query and wait for its completion. Since waiting for job completion goes through the Redis API, the wait will be async and won't block gunicorn from serving other requests.
The above solution has two benefits:
It reuses existing code and infrastructure for running queries.
It can be implemented today with no changes to dependencies or requirements in how we run Redash.
In the future, when we move to Python 3 we can revisit this implementation to use async/await.
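For reference, the waiting side of that future version might look like this (a hypothetical sketch only; it assumes some coroutine `fetch_status` that checks the job's state, e.g. via an async Redis client):

```python
# Hypothetical async/await version of waiting for job completion (Python 3).
import asyncio

async def wait_for_result(job_id, fetch_status, interval=1.0):
    """Await a job's completion without blocking the event loop.

    fetch_status is assumed to be a coroutine that returns the job's
    current status string (e.g. backed by an async Redis client).
    """
    while True:
        status = await fetch_status(job_id)
        if status == "finished":
            return status
        await asyncio.sleep(interval)  # yield to other requests while polling
```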
Of course this is only a suggestion and an invitation for a discussion.
Yup, that's a sensible strategy, and gunicorn allows us to switch to an AsyncIO-based worker on py3k when we get there. What I'm not sure about (or at least I'm probably missing some info) is how the redis-py library is supported by gevent. Or are you saying it just happens to work well with gevent's socket monkeypatching?
> Or are you saying it just happens to work well with gevent's socket monkeypatching?
Exactly.
I tested this in the past and it seemed that Redis calls don't block gunicorn's gevent workers. Probably worth validating this before we start implementing :)
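A minimal validation sketch for that assumption could look like the following. It assumes gevent is installed; the commented-out part additionally assumes a local Redis server, which is why it is left disabled here.

```python
# Validation sketch: redis-py uses plain sockets, so after patch_all()
# its blocking calls should yield to other greenlets instead of blocking.
from gevent import monkey
monkey.patch_all()  # swap stdlib sockets for gevent's cooperative ones

import gevent

# Confirm the socket module was actually patched.
assert monkey.is_module_patched("socket")

def heartbeat():
    # If a concurrent Redis call were truly blocking the process,
    # these prints would stall instead of appearing on schedule.
    for i in range(3):
        print("worker still responsive", i)
        gevent.sleep(0.1)

# With a local Redis server available, spawn a deliberately blocking call
# alongside the heartbeat and check that both greenlets make progress:
#   import redis
#   client = redis.StrictRedis()
#   gevent.spawn(lambda: client.blpop("some-list", timeout=2))

gevent.joinall([gevent.spawn(heartbeat)])
```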
Hi all, checking to see if there's been any more thought/movement on this issue? It seems like it would definitely be advantageous to allow API consumers to request the data synchronously (accepting the risks of it being a long-running query). From an API usability perspective, it is certainly simpler and is probably ideal for most small- to medium-size datasets.
Currently the API to refresh queries/get query results is async:
A working example can be found here.
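For comparison, the client-side flow against the current async API boils down to "trigger a refresh, then poll the job until it finishes". A small polling helper illustrates the shape of it; the endpoint paths in the usage comment are assumptions based on this discussion, not verified API paths.

```python
# Illustrative client-side polling against the current async API.
import time

def poll_until_done(fetch_status, interval=1.0, timeout=60.0):
    """Call fetch_status() repeatedly until the job finishes or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = fetch_status()
        if status in ("finished", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("query did not finish in time")

# Usage sketch with requests (hypothetical endpoint paths):
#   job = requests.post(f"{base}/api/queries/{qid}/refresh", headers=auth).json()
#   poll_until_done(
#       lambda: requests.get(f"{base}/api/jobs/{job['id']}").json()["status"])
```

The sync API proposed above would collapse this loop into a single request.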