Proposal: synchronous API for getting query results #2825

arikfr · 2018-09-16T07:39:58Z

Currently the API to refresh queries/get query results is async:

You ask Redash to run a query and get back a job id.
You poll the jobs API for job status.
When you get a response stating that the job has completed, you call another API endpoint to get the result.

A working example can be found here.
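For readers without the linked example at hand, the three steps above can be sketched roughly as follows. This is only an illustration: the endpoint paths, the numeric status codes, and the `http_get`/`http_post` helpers are assumptions that are not stated in this thread, so check them against your Redash version.

```python
import time

# Job status codes as assumed here (not stated in this thread).
JOB_SUCCESS = 3
JOB_FAILURE = 4


def fetch_fresh_result(http_get, http_post, query_id, poll_interval=1.0, timeout=300.0):
    """Refresh a query and block until its result is available.

    http_get(path) and http_post(path) are hypothetical callables that
    perform an authenticated request and return the decoded JSON body.
    """
    # Step 1: ask Redash to run the query; the response carries a job.
    job = http_post("/api/queries/{}/refresh".format(query_id))["job"]

    # Step 2: poll the jobs API until the job finishes or we give up.
    deadline = time.monotonic() + timeout
    while job["status"] not in (JOB_SUCCESS, JOB_FAILURE):
        if time.monotonic() >= deadline:
            raise TimeoutError("job {} did not finish in time".format(job["id"]))
        time.sleep(poll_interval)
        job = http_get("/api/jobs/{}".format(job["id"]))["job"]

    if job["status"] == JOB_FAILURE:
        raise RuntimeError("query failed: {}".format(job.get("error")))

    # Step 3: a separate call fetches the actual result.
    return http_get("/api/query_results/{}".format(job["query_result_id"]))
```

Every API consumer has to carry a loop like this today, which is the boilerplate a synchronous endpoint would remove.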

There are many cases where a synchronous API would be much better and easier to use. The problem is that query execution can take a long time, and we don't want to block a worker for an extended period.

A reasonable solution in this case is to switch to async IO, so we can trigger query execution and wait for its completion. The main challenge with implementing this is that not all the query runners support async IO.

My suggested implementation is:

Use gunicorn with the gevent worker class. This will allow us to take advantage of async IO with supported libraries. One of the libraries supported by gevent is the Redis library.
The sync query results API will trigger a Celery job to run the query and wait for its completion. Because waiting for job completion goes through the Redis API, it will be async and won't block gunicorn from serving other requests.
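A minimal sketch of the waiting part, assuming gunicorn is started with the gevent worker class (for example via `-k gevent`). The `wait_for_job` function, the `QueryTimeout` exception, and the `get_job_state` callable are hypothetical names for illustration, not existing Redash code. In the proposal, `get_job_state` would read the Celery job's state through Redis.

```python
import time


class QueryTimeout(Exception):
    """Raised when the job does not finish within the allowed time."""


def wait_for_job(get_job_state, timeout=60.0, poll_interval=0.5, sleep=time.sleep):
    """Block the current request until the job finishes.

    get_job_state is a hypothetical callable that reads the job's state
    (for example from Redis) and returns a (done, result) tuple.
    """
    deadline = time.time() + timeout
    while True:
        done, result = get_job_state()
        if done:
            return result
        if time.time() >= deadline:
            raise QueryTimeout("query still running after {}s".format(timeout))
        # Under a gevent worker with monkey patching, time.sleep yields to
        # other greenlets instead of blocking the whole process.
        sleep(poll_interval)
```

The explicit timeout matters here: without it, a long-running query would keep the HTTP request open indefinitely, so the endpoint would probably need to return an error (or fall back to the job id) once the deadline passes.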

The above solution has two benefits:

It reuses existing code and infrastructure for running queries.
It can be implemented today with no changes to dependencies or requirements in how we run Redash.

In the future, when we move to Python 3 we can revisit this implementation to use async/await.
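As a rough idea of what the async/await version could look like on Python 3, here is a sketch using only the standard `asyncio` module. `wait_for_job_async` and `get_job_state` are hypothetical names, and how the job state is actually read (for example through an async Redis client) is left open.

```python
import asyncio


async def wait_for_job_async(get_job_state, timeout=60.0, poll_interval=0.5):
    """Await job completion without blocking the event loop.

    get_job_state is a hypothetical coroutine function that returns a
    (done, result) tuple.
    """
    async def poll():
        while True:
            done, result = await get_job_state()
            if done:
                return result
            # Yield control so other requests can be served meanwhile.
            await asyncio.sleep(poll_interval)

    # wait_for cancels the polling loop and raises asyncio.TimeoutError
    # if the job has not finished within the timeout.
    return await asyncio.wait_for(poll(), timeout)
```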

Of course this is only a suggestion and an invitation for a discussion.

jezdez · 2018-09-17T14:20:46Z

Yup, that's a sensible strategy, and gunicorn allows us to switch to an asyncio-based worker on py3k when we get there. What I'm not sure about (or at least I'm probably missing some info) is how the redis-py library is supported by gevent. Or are you saying it just happens to work well with gevent's socket monkeypatching?

arikfr · 2018-09-17T14:36:57Z

> Or are you saying it just happens to work well with gevent's socket monkeypatching?

Exactly.

I tested this in the past, and it seemed that making Redis calls doesn't block gunicorn's gevent workers. Probably worth validating this before we start implementing :)

phillipjohnson · 2019-09-20T19:39:11Z

Hi all, checking to see if there's been any more thought/movement on this issue? It seems like it would definitely be advantageous to allow API consumers to request the data synchronously (accepting the risks of it being a long-running query). From an API usability perspective, it is certainly simpler and is probably ideal for most small- to medium-size datasets.

arikfr added Backend Feature: API labels Sep 16, 2018

jezdez mentioned this issue Sep 24, 2018

Show "results not available" on public dashboard when not available yet #2844

Closed

arikfr mentioned this issue Jan 15, 2019

Ability to force a refresh before fetching results #1293

Closed
