Reducing memory footprint for synchronous S3KeySensor
#55070
|
cc: @dstandish |
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
|
@eladkal, since we're changing the return type of |
|
Mentioned it on Slack, but just adding here: we should check with @eladkal re backcompat issues. changing but thinking about that..... The problem is that it's somewhat ambiguous what the behavior should be. For best performance, you would want check_fn to return on the first "pass", but the behavior most similar to the current one would be to return all the files for which check_fn evaluates to true |
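To make the trade-off above concrete, here is a minimal sketch of the two behaviors. It is only an illustration: `first_match`, `all_matches`, `fake_listing`, and the per-file predicate are hypothetical names, not part of the S3KeySensor or S3Hook API.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def fake_listing(n: int, consumed: List[int]) -> Iterator[Dict]:
    """Hypothetical stand-in for a lazily produced S3 listing.

    Records the index of every item actually pulled from the iterator.
    """
    for i in range(n):
        consumed.append(i)
        yield {"Key": f"data/file_{i}.csv", "Size": i}


def first_match(files: Iterable[Dict], predicate: Callable[[Dict], bool]) -> bool:
    """Best for performance: stop consuming as soon as one file passes."""
    return any(predicate(f) for f in files)


def all_matches(files: Iterable[Dict], predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Closest to list-based behavior: collect every file that passes."""
    return [f for f in files if predicate(f)]


def is_big_enough(f: Dict) -> bool:
    # Assumed example predicate; any per-file check would do.
    return f["Size"] >= 3


consumed_lazy: List[int] = []
found = first_match(fake_listing(1000, consumed_lazy), is_big_enough)

consumed_eager: List[int] = []
matches = all_matches(fake_listing(1000, consumed_eager), is_big_enough)
```

With the early-exit variant only the items up to the first passing file are pulled from the listing, while the collect-all variant has to exhaust the iterator and hold every match in memory.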
|
cc: @eladkal |
|
Let's consult first with @o-nikolas and @vincbeck. |
|
To keep it backward compatible, could we make |
|
I think that's feasible - how would we determine whether to return a |
|
In the current implementation we would always return an iterator but having |
Making the type annotation change now. Both cases are handled in |
vincbeck
left a comment
Actually sorry but I realized I provided wrong instructions. This does not solve back compat issues ... What we should do instead is deprecate get_file_metadata and create a new one using Iterator and use this one in the sensor. That way we do not need to create a major release because of this.
|
Got it - so the plan would be to:
What is the best way to deprecate a method and create one with the same name? Or should I create a new method with a new name, and use this one in the Sensor? |
Correct
To deprecate a method, please emit a deprecation warning at the beginning of the method. Example below:
Yes, you should create a new one with a new name |
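As a rough illustration of the approach described above (deprecate the old method, add a new one under a new name), here is a minimal self-contained sketch. `ExampleHook`, `iter_file_metadata`, and the sample records are hypothetical, and the stdlib `DeprecationWarning` is used so the snippet runs on its own; inside Airflow providers the category would presumably be `AirflowProviderDeprecationWarning` from `airflow.exceptions`.

```python
import warnings
from typing import Dict, Iterator, List


class ExampleHook:
    """Hypothetical hook; not the real S3Hook."""

    # Assumed sample data standing in for records from a paginated listing.
    _records = [{"Key": "a.csv", "Size": 1}, {"Key": "b.csv", "Size": 2}]

    def get_file_metadata(self) -> List[Dict]:
        """Old list-returning method, kept for backward compatibility."""
        # Emit the warning at the very beginning of the deprecated method.
        warnings.warn(
            "get_file_metadata is deprecated; use iter_file_metadata instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this line
        )
        return list(self.iter_file_metadata())

    def iter_file_metadata(self) -> Iterator[Dict]:
        """New method (hypothetical name) that yields records lazily."""
        yield from self._records
```

Existing callers of the old method keep receiving a list and only see a warning, while new code such as the sensor can switch to the iterator-based method.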
|
@vincbeck, I've updated the PR accordingly! |
vincbeck
left a comment
Nice! Thanks for the quick turnaround!
|
@dstandish, can you take a look at the updated PR? |
|
@eladkal, can you take a look at this for me? |
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
|
@ashb should be all set here! |
This PR aims to reduce the memory footprint for the S3KeySensor. This was done by altering the get_file_metadata method in the S3Hook to yield records in the paginated response, rather than loading them into a single list. The return type for the get_file_metadata method is now an Iterator. An assertion was added to validate this, and all appropriate tests were updated.
closes: #55039
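The difference between loading every record into a list and yielding records page by page can be sketched as follows. This is not the PR's code: `fake_pages` is a hypothetical stand-in for a paginated S3 listing response, and the function names are made up for illustration. The only assumption about the real service is that each listing page carries its objects under a `Contents` key.

```python
from typing import Dict, Iterable, Iterator, List


def fake_pages(num_pages: int, page_size: int) -> Iterator[Dict]:
    """Hypothetical stand-in for iterating over a paginated listing."""
    for p in range(num_pages):
        yield {
            "Contents": [
                {"Key": f"prefix/obj_{p}_{i}", "Size": i} for i in range(page_size)
            ]
        }


def get_metadata_as_list(pages: Iterable[Dict]) -> List[Dict]:
    """Eager version: every record from every page is held in memory at once."""
    files: List[Dict] = []
    for page in pages:
        files.extend(page.get("Contents", []))
    return files


def iter_metadata(pages: Iterable[Dict]) -> Iterator[Dict]:
    """Lazy version: at most one page is held in memory at a time."""
    for page in pages:
        yield from page.get("Contents", [])


# Both versions produce the same records; only the memory profile differs.
eager = get_metadata_as_list(fake_pages(3, 2))
lazy = iter_metadata(fake_pages(3, 2))
first = next(lazy)  # only the first page has been fetched at this point
```

A caller that only needs to find one matching key can stop iterating early with the lazy version, so later pages are never requested.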