Enhancement: AWS Provider sql to s3 operator pd.read_sql kwargs #53399
The AWS provider transfer operator SqlToS3Operator allows Pandas kwargs to be passed to the write function, but not to the get_df()/read_sql() call. This PR adds a new operator parameter, read_pd_kwargs, that takes a dictionary and unpacks it in the get_df() call. The get_df method of DbApiHook already accepts kwargs and unpacks them into read_sql(). I extended the csv/parquet unit tests to cover the read_pd_kwargs parameter and verified that it is unpacked correctly.
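A minimal usage sketch of the new parameter (not taken from the PR diff; the other arguments follow the existing SqlToS3Operator signature, and the read_pd_kwargs spelling is the new parameter described above):

```python
from airflow.providers.amazon.aws.transfers.sql_to_s3 import SqlToS3Operator

export_orders = SqlToS3Operator(
    task_id="export_orders",
    sql_conn_id="postgres_default",
    query="SELECT * FROM orders",
    s3_bucket="my-bucket",
    s3_key="exports/orders.parquet",
    file_format="parquet",
    # New in this PR: forwarded to DbApiHook.get_df() and unpacked into pd.read_sql()
    read_pd_kwargs={"dtype_backend": "pyarrow", "coerce_float": False},
    # Existing parameter: kwargs for the write function (e.g. DataFrame.to_parquet)
    pd_kwargs={"compression": "snappy"},
)
```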
The operator contains functionality to fix Numpy data types, which isn't necessary when the dtype_backend is set to 'pyarrow', so I added a conditional that only runs that method when 'dtype_backend': 'pyarrow' is not present in read_pd_kwargs. I added a unit test to ensure the _fix_dtype method is NOT called when the pyarrow backend kwarg is present.
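A rough sketch of that conditional, not the literal PR diff; the helper and attribute names are assumed to mirror the operator's existing internals:

```python
from typing import Optional


def needs_numpy_dtype_fix(read_pd_kwargs: Optional[dict]) -> bool:
    """Return True unless read_sql was asked for the pyarrow dtype backend."""
    return (read_pd_kwargs or {}).get("dtype_backend") != "pyarrow"


# Conceptually, inside SqlToS3Operator.execute():
#   df = hook.get_df(self.query, **(self.read_pd_kwargs or {}))
#   if needs_numpy_dtype_fix(self.read_pd_kwargs):
#       self._fix_dtypes(df, self.file_format)  # existing Numpy dtype fix-up
```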
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.