Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: synchronous API for getting query results #2825

Open
arikfr opened this issue Sep 16, 2018 · 3 comments
Open

Proposal: synchronous API for getting query results #2825

arikfr opened this issue Sep 16, 2018 · 3 comments

Comments

@arikfr
Copy link
Member

arikfr commented Sep 16, 2018

Currently the API to refresh queries/get query results is async:

  1. You ask Redash to run a query and get back a job id.
  2. You poll the jobs API for job status.
  3. When you get a response stating the job completed, you can run another API to get a result.

A working example can be found here.

There are many cases where a synchronous API will be much better and easier to use. The problem is that query execution time can be long and we don't want to block the worker for an extended period of time.

A reasonable solution in this case is to switch to async IO, so we can trigger query execution and wait for its completion. The main challenge with implementing this is that not all the query runners support async IO.

My suggested implementation is:

  • Use gunicorn with gevent worker class. This will allows taking advantage of async IO with supported libraries. One of the libraries supported by gevent is the Redis library.
  • The sync query results API will trigger a Celery job to run the query and wait for its completion. As waiting for job completion uses Redis API, it will be async and won't block gunicorn from serving other requests.

The above solution has two benefits:

  1. It reuses existing code and infrastructure for running queries.
  2. It can be implemented today with no changes to dependencies or requirements in how we run Redash.

In the future, when we move to Python 3 we can revisit this implementation to use async/await.


Of course this is only a suggestion and an invitation for a discussion.

@jezdez
Copy link
Member

jezdez commented Sep 17, 2018

Yup, that's a sensible strategy, and gunicorn allows us to switch to a AsyncIO based worker on py3k when we get there. What I'm not sure (or at least I'm probably missing some info) is how the redis-py library is support by gevent? Or are you saying it just happens to work well with gevent's socket monkeypatching?

@arikfr
Copy link
Member Author

arikfr commented Sep 17, 2018

Or are you saying it just happens to work well with gevent's socket monkeypatching?

Exactly.

I tested this in the past and it seemed that doing Redis calls doesn't block the gunicorn's gevent workers. Probably worth validating this before we start implementing :)

@phillipjohnson
Copy link

Hi all, checking to see if there's been any more thought/movement on this issue? It seems like it would definitely be advantageous to allow API consumers to request the data synchronously (accepting the risks of it being a long-running query). From an API usability perspective, it is certainly simpler and is probably ideal for most small- to medium-size datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants