Hi,
I am using Athena with Redash and found that it is pretty slow for bigger queries (~20 lakh, i.e. ~2 million, rows).
The query finished within 2 minutes in Athena but took 12-13 minutes in Redash to return the results. After looking at the code, I found that the pyathena cursor is being used, which I think is the bottleneck, as it fetches the results through the Athena API and serialises them into JSON. This has become a bit of a blocker on our side.
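For reference, the existing path is roughly the following (just an illustrative sketch of the pyathena cursor flow, not the actual query-runner code; the staging dir and query are placeholders):

import json
from pyathena import connect

cursor = connect(s3_staging_dir='s3://example-bucket/athena-results/',
                 region_name='us-east-1').cursor()
cursor.execute('SELECT * FROM big_table')

# Every row comes back through the cursor and is then serialised to JSON,
# which is where I believe the extra time goes for large result sets.
column_names = [desc[0] for desc in cursor.description]
rows = [dict(zip(column_names, row)) for row in cursor.fetchall()]
json_data = json.dumps({'columns': column_names, 'rows': rows})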
I've worked on an alternate implementation that transfers the CSV result file directly from Athena's S3 output location to the Redash machine and converts the records to JSON with Python's csv.DictReader. It has brought the fetch time down to 2-3 minutes for the same query.
bucket, key = parse_output_location(cursor.output_location)
s3 = boto3.client('s3', **self._get_iam_credentials(user=user))

# Download the Athena CSV result file from S3 to a local file named after
# the query ID, then parse it with csv.DictReader instead of paging
# through the pyathena cursor.
athena_query_results_file = athena_query_id
with open(athena_query_results_file, 'wb') as w:
    s3.download_fileobj(bucket, key, w)
with open(athena_query_results_file, 'r') as f:
    rows = list(csv.DictReader(f))
Crux of the change
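If this goes into the query runner, the parsed rows would still need to be packed into the JSON structure Redash expects; roughly something like this (a sketch only, with column types left as plain strings instead of being mapped from the cursor metadata):

import json

# Build Redash-style column metadata from the CSV header; every column is
# typed as a string here, since the CSV itself carries no type information.
columns = [{'name': name, 'friendly_name': name, 'type': 'string'}
           for name in (rows[0].keys() if rows else [])]
json_data = json.dumps({'columns': columns, 'rows': rows})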
P.S. Should I create a PR, or should I get the idea reviewed here first?
👋 @masterlittle, we use the issue tracker exclusively for bug reports and planned work. However, this issue appears to be a support request. Please use our forum to get help.