`awswrangler.athena.read_sql_query` is leaking memory #1678
Comments
Same here with a very similar setting. I also tested some possible solutions acting directly on the
but none of the above actually worked. Also, switching to AWS Fargate after a code refactoring (sequential Lambda executions -> iterations within a for loop) didn't solve the issue, suggesting it is service-independent (it just takes more iterations to exhaust memory thanks to Fargate's better specs). However, I found what is highlighted in this AWS Compute Blog interesting:
Thanks @davnn for the initiative!
Not sure if this will help, but what is your
I experienced the above behaviour with
Oops, my mistake, the issue is with
@AlexMetsai I tried the downgrade to
Do you have any insights about the (maybe responsible) changes involved in
@a-slice-of-py You're welcome. :) Unfortunately I don't have any insights, we just fell back to a previous configuration that didn't showcase the issue (which had the previous pandas version). I would like to take a closer look into it, but so far I've been busy with deployments (especially trying to allocate a no-larger-than-really-needed container), so I was mostly hoping that somebody from pandas or AWS would fix this. 😆
Can confirm that a downgrade to
Confirmed this is an issue with pandas
I'll go ahead and update our deps to @aws/aws-sdk-pandas We may want to consider a micro release after this is updated to
We will need more info to fix this on the pandas side. The issue you linked is a false positive. It does not describe a memory leak.
Thanks @phofl, I'll scratch that from the above comment. I'll see if I can dig into where exactly in the method we are encountering the memory issues.
Working to reproduce the error using a build from
Does your code show future warnings? If yes, you could wait for 1.5.1, which should come out in the next few days.
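For anyone wanting to answer that question, a hypothetical way to check is to force `FutureWarning`s to always be displayed before exercising the call; the query and database names below are placeholders, not taken from the thread:

```python
import warnings

import awswrangler as wr

# Hypothetical check: force FutureWarning to always be emitted (duplicates are
# normally shown only once), then exercise the code path in question.
warnings.simplefilter("always", FutureWarning)

df = wr.athena.read_sql_query(
    sql="SELECT * FROM my_table",  # placeholder query
    database="my_database",        # placeholder database
)
```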
I have confirmed this is no longer an issue with
Keep in mind the Python version will need to meet
Pandas 1.5.0 had a memory leak with awswrangler, but according to #1678 it is no longer the case. The current dependency expressed in pyproject.toml locks dependents to use only 1.5.1. Since the bug has been fixed in pandas we can now remove the lock. Co-authored-by: Pierre Souchay <pierre.souchay@axaclimate.com> Co-authored-by: jaidisido <jaidisido@gmail.com>
Describe the bug
Hey, we stumbled into a memory leak problem with our AWS Lambdas and (after a painful search) tracked it down to `awswrangler.athena.read_sql_query`.
How to Reproduce
1. Create an S3 path (`S3_PATH`) to store the following parquet file.
2. Save a test parquet file to the created S3 path.
3. Create a Glue database `GLUE_DATABASE` and table `GLUE_TABLE` with a `measurements` column of type `double` referring to the `S3_PATH`.
4. Create a Python 3.8 Lambda function with 1024 MB of memory, attach the `AWSSDKPandas-Python38` layer, and insert the following Athena query script.
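The Athena query script itself is not reproduced in this excerpt. Purely as an illustration of the steps above, a setup and handler along these lines would exercise `awswrangler.athena.read_sql_query` as described; the bucket, database, and table names are hypothetical placeholders, and the test data is an assumption:

```python
import awswrangler as wr
import pandas as pd

# Hypothetical placeholders; substitute real values.
S3_PATH = "s3://my-bucket/memory-leak-repro/"   # step 1
GLUE_DATABASE = "memory_leak_db"                # step 3
GLUE_TABLE = "measurements_table"               # step 3

# --- One-off setup (steps 2-3): run locally, not inside the Lambda ---
# Writes a parquet dataset to S3 and registers it as a Glue table with a
# single float64 ("double") column called "measurements".
df = pd.DataFrame({"measurements": [float(i) for i in range(1_000_000)]})
wr.s3.to_parquet(
    df=df,
    path=S3_PATH,
    dataset=True,
    database=GLUE_DATABASE,
    table=GLUE_TABLE,
)

# --- Lambda handler (step 4): query the table on every invocation ---
def handler(event, context):
    result = wr.athena.read_sql_query(
        sql=f"SELECT * FROM {GLUE_TABLE}",
        database=GLUE_DATABASE,
    )
    return {"rows": len(result)}
```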
Invoking the function repeatedly, the Lambda reports steadily growing memory usage:
Memory Size: 1024 MB Max Memory Used: 433 MB Init Duration: 3375.03 ms
Memory Size: 1024 MB Max Memory Used: 518 MB
Memory Size: 1024 MB Max Memory Used: 605 MB
Memory Size: 1024 MB Max Memory Used: 701 MB
Memory Size: 1024 MB Max Memory Used: 786 MB
...
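Since the thread notes the same behaviour on Fargate (i.e. it appears service-independent), a rough way to observe it outside Lambda is to run the query in a loop and watch the process's peak resident set size. The sketch below uses the stdlib `resource` module with the hypothetical table names from the sketch above, and assumes a Linux host where `ru_maxrss` is reported in kilobytes:

```python
import resource

import awswrangler as wr

GLUE_DATABASE = "memory_leak_db"   # hypothetical, matching the sketch above
GLUE_TABLE = "measurements_table"  # hypothetical

def peak_rss_mb() -> float:
    # Peak resident set size of this process; ru_maxrss is in kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

for i in range(10):
    df = wr.athena.read_sql_query(
        sql=f"SELECT * FROM {GLUE_TABLE}",
        database=GLUE_DATABASE,
    )
    del df  # drop the only reference; without a leak the peak should level off
    print(f"iteration {i}: peak RSS ~{peak_rss_mb():.0f} MB")
```

If the peak keeps climbing across iterations even though each result is released, that matches the growth pattern reported in the Lambda logs above.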
Expected behavior
We expected that the memory consumption of the process would not grow.
Your project
No response
Screenshots
No response
OS
AWS Lambda (x86_64 and arm64)
Python version
Tested using 3.8 and 3.9 (custom container)
AWS SDK for pandas version
2.16.1 and 3.0.0b2 tested
Additional context
No response