Athena implementation is slow for big queries #4684

Closed
masterlittle opened this issue Feb 25, 2020 · 1 comment

Comments

masterlittle commented Feb 25, 2020

Hi,
I am using Athena with Redash and found that it is pretty slow for bigger queries (~20 lakh rows, i.e. about 2 million).

The query finished within 2 minutes in Athena but took more than 12-13 minutes in Redash to return the results. After looking at the code, I found that the pyathena cursor is being used, which I think may be the bottleneck, as it fetches the results and serialises them into JSON. This has become a bit of a blocker on our side.

I've worked on an alternative implementation that transfers the CSV result file directly from Athena's S3 output location to the Redash machine and converts the records to JSON with Python's csv.DictReader. It has reduced the time to fetch the data to 2-3 minutes for the same query.

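    import csv
    import boto3

    # Locate the CSV result file that Athena wrote to S3.
    bucket, key = parse_output_location(cursor.output_location)
    s3 = boto3.client('s3', **self._get_iam_credentials(user=user))
    athena_query_results_file = athena_query_id
    # Download the result file to the Redash machine.
    with open(athena_query_results_file, 'wb') as w:
        s3.download_fileobj(bucket, key, w)
    # Parse the CSV into a list of dicts, one per row.
    with open(athena_query_results_file, 'r', newline='') as f:
        rows = list(csv.DictReader(f))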
Crux of the change
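
For anyone who wants to try the idea outside Redash, here is a minimal standalone sketch of the same flow. It is not the actual patch: `parse_output_location` below is only my guess at what such a helper could look like, `fetch_rows` is a hypothetical name, and I am assuming the result file is UTF-8 encoded. The client passed in only needs a `download_fileobj(bucket, key, fileobj)` method, which a boto3 S3 client provides.

```python
import csv
import os
import tempfile
from urllib.parse import urlparse


def parse_output_location(output_location):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key).

    Hypothetical helper: an assumption about what the snippet above relies on.
    """
    parsed = urlparse(output_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3:// URL, got %r" % (output_location,))
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_rows(s3_client, output_location):
    """Download the Athena result CSV and return its rows as a list of dicts."""
    bucket, key = parse_output_location(output_location)
    # The temporary directory (and the downloaded file) is removed on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "results.csv")
        with open(local_path, "wb") as w:
            s3_client.download_fileobj(bucket, key, w)
        # newline="" lets the csv module handle embedded line breaks itself.
        with open(local_path, "r", newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
```

With a real client this would be called as `fetch_rows(boto3.client("s3"), cursor.output_location)`. Note that csv.DictReader returns every value as a string, so any column type conversion would still have to be done afterwards.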

P.S. Should I create a PR, or first get the idea reviewed here?

support bot commented Feb 25, 2020

👋 @masterlittle, we use the issue tracker exclusively for bug reports and planned work. However, this issue appears to be a support request. Please use our forum to get help.
