
Redash OOM with mid-sized queries #4867

Closed
noxdafox opened this issue May 6, 2020 · 3 comments

Comments


noxdafox commented May 6, 2020

Greetings,

We have an instance of Redash 8.0 deployed on an AWS EC2 instance using the Redash-provided AMI. The local PostgreSQL database has been replaced with a dedicated AWS RDS one.

The EC2 instance is of type t3.large, which comes with 8 GB of memory.

We have a single data source which is Athena.

The following query:

SELECT * FROM <table> limit 1000000;

runs in a few seconds on Athena and produces a 444 MB CSV file.

When run through Redash, it takes an indefinite amount of time until it fails with Error running query: Worker exited prematurely: signal 9 (SIGKILL).
Watching the instance's memory, we see Redash's consumption grow until it fills the instance memory, forcing the OOM killer to kick in.

We can reproduce the issue both via the web UI and via the api/query_results API endpoint.
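For reference, our API-side reproduction looks roughly like the sketch below. The base URL, query ID, and API key are placeholders, and the results-CSV endpoint path is an assumption on our part; streaming the response to disk at least keeps the client's own memory flat (the OOM happens server-side regardless):

```python
# Hypothetical sketch of pulling a query result over the Redash API.
# All names/URLs here are placeholders, not confirmed Redash behavior.
import shutil
import urllib.request

def results_csv_url(base_url: str, query_id: int) -> str:
    """Build the results-download URL for a query (endpoint path assumed)."""
    return f"{base_url.rstrip('/')}/api/queries/{query_id}/results.csv"

def download_results(base_url: str, query_id: int, api_key: str, dest: str) -> None:
    """Stream the cached result CSV to disk instead of buffering it in memory."""
    req = urllib.request.Request(
        results_csv_url(base_url, query_id),
        headers={"Authorization": f"Key {api_key}"},
    )
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)  # copy in 1 MiB chunks
```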


noxdafox commented May 6, 2020

This is most likely a duplicate of #3241.

I have a very similar use case: users want to download CSV and XLSX files to work on locally. We want to use Redash both for showing charts and dashboards and as an access point for Athena when it comes to pre-filtering data.

@susodapop
Contributor

Agreed, this is a duplicate of #3241 and Arik's guidance there still stands. This is currently expected behavior. Redash is meant to display aggregated datasets below ~50k rows / 50 MB. Even if the query runner didn't run out of memory, your browser would crash trying to load 1M rows. If you bypass the front-end, the workers can usually handle results around ~250 MB via the API (or larger if you provision them appropriately).

But this is well outside Redash's scope. You should probably consider a different tool for pulling such large datasets into Excel (perhaps Power Query?).
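If you do stay within Redash, one workaround in the spirit of the advice above is to split the extract into several smaller queries, each under the ~50k-row comfort zone, and download them separately. A minimal sketch, assuming the table has a numeric id column with a known range (both the column and the splitting scheme are hypothetical, not part of Redash):

```python
def chunked_queries(table: str, id_min: int, id_max: int, chunk: int = 50_000):
    """Yield SELECTs over disjoint id ranges, each returning at most `chunk` rows."""
    for lo in range(id_min, id_max + 1, chunk):
        hi = min(lo + chunk - 1, id_max)
        yield f"SELECT * FROM {table} WHERE id BETWEEN {lo} AND {hi}"
```

Each slice can be run and fetched independently, so no single worker ever has to materialize the full 444 MB result.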

@noxdafox
Author

Indeed, my use case is not accessing such large results via the browser but rather letting users interface with Redash programmatically through the API. I will explain my use case in #78.
