Hi,
I am using Athena with Redash and found that it is pretty slow for bigger queries (~20 lakh, i.e. ~2 million, rows).
The query finished within 2 minutes in Athena but took 12-13 minutes in Redash to return the results. After looking at the code, I found that the pyathena cursor is being used, which I think is the bottleneck, as it fetches the results through the Athena API and serialises them into JSON. This has become a bit of a blocker on our side.
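For reference, the existing path is roughly the following (just an illustrative sketch of the pyathena cursor flow, not the actual query-runner code; the staging dir and query are placeholders):

import json
from pyathena import connect

cursor = connect(s3_staging_dir='s3://example-bucket/athena-results/',
                 region_name='us-east-1').cursor()
cursor.execute('SELECT * FROM big_table')

# Every row comes back through the cursor and is then serialised to JSON,
# which is where I believe the extra time goes for large result sets.
column_names = [desc[0] for desc in cursor.description]
rows = [dict(zip(column_names, row)) for row in cursor.fetchall()]
json_data = json.dumps({'columns': column_names, 'rows': rows})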
I've worked on an alternate implementation that transfers the CSV result file directly from Athena's S3 output location to the Redash machine and converts the records to JSON with Python's csv.DictReader. It has brought the fetch time down to 2-3 minutes for the same query.
bucket, key = parse_output_location(cursor.output_location)
s3 = boto3.client('s3', **self._get_iam_credentials(user=user))

# Download the Athena CSV result file from S3 to a local file named after
# the query ID, then parse it with csv.DictReader instead of paging
# through the pyathena cursor.
athena_query_results_file = athena_query_id
with open(athena_query_results_file, 'wb') as w:
    s3.download_fileobj(bucket, key, w)
with open(athena_query_results_file, 'r') as f:
    rows = list(csv.DictReader(f))
Crux of the change
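If this goes into the query runner, the parsed rows would still need to be packed into the JSON structure Redash expects; roughly something like this (a sketch only, with column types left as plain strings instead of being mapped from the cursor metadata):

import json

# Build Redash-style column metadata from the CSV header; every column is
# typed as a string here, since the CSV itself carries no type information.
columns = [{'name': name, 'friendly_name': name, 'type': 'string'}
           for name in (rows[0].keys() if rows else [])]
json_data = json.dumps({'columns': columns, 'rows': rows})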
P.S. Should I create a PR, or should I get the idea reviewed here first?
👋 @masterlittle, we use the issue tracker exclusively for bug reports and planned work. However, this issue appears to be a support request. Please use our forum to get help.