Skip to content

Conversation

@guan404ming
Copy link
Member

@guan404ming guan404ming commented Apr 7, 2025

Related Issue and PR

closes: #33911

Why

This PR addresses the ongoing discussion regarding the introduction of a generic get_df function, which would abstract the retrieval of dataframes from various sources, supporting both pandas and polars. The goal is to provide a more user-friendly and consistent API.

As discussed, there are arguments against a direct implementation, primarily concerning:

  • Complexity and Maintenance: Concerns that a single implementation could become overly complex, especially considering the syntactic differences between pandas and polars querying APIs.
  • Potential Breaking Changes: Concerns about future breaking changes if existing methods were removed.
  • Current Pandas Issues: Existing issues with the pandas implementation should be addressed first.
  • Over-engineering: The feeling that this change is an over-engineering.

There is also support for a more abstract approach:

  • Abstraction and Simplification: A generic get_df could simplify user interaction and provide a consistent interface.
  • Gradual Transition: Deprecation warnings can be used to facilitate a smooth transition, allowing users time to adapt.

How

This PR implements get_df while maintaining a flexible approach to address the community's concerns.

  • get_df is introduced as a new, generic function that accepts a parameter specifying the desired dataframe type (pandas or polars).
  • This implementation introduces the _get_polars_df and _get_pandas_df methods that are used within the get_df method. This allows the new generic method to function, while allowing the option to expose or deprecate the underlying methods in the future.
  • Adding deprecation for original get_pandas_df
  • This approach allows for a gradual transition and addresses the concerns about immediate breaking changes.
  • The implementation is designed to be easily modified based on community feedback.

This PR aims to provide a flexible solution that can be adapted based on community consensus. The implementation can be refined or reverted as needed, ensuring that the changes align with the project's long-term goals.

Note

This PR is considered a starting point for discussion and may require further triage by the community. I plan to implement the corresponding changes within the Google provider after this PR is reviewed and a consensus is reached, as it is dependent on the outcome of this discussion. This separation allows for focused review and iteration on the core functionality before extending it to specific providers.

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@guan404ming guan404ming requested a review from eladkal as a code owner April 7, 2025 14:25
@guan404ming guan404ming marked this pull request as draft April 7, 2025 14:42
@guan404ming guan404ming changed the title feat: add get_df and get_polar_df feat: add get_df and get_polars_df Apr 7, 2025
@guan404ming guan404ming force-pushed the add-polar-df branch 2 times, most recently from 03d83ec to 8c11d33 Compare April 8, 2025 02:12
@guan404ming guan404ming marked this pull request as ready for review April 8, 2025 02:14
@eladkal eladkal requested a review from potiuk April 11, 2025 10:10
@guan404ming guan404ming force-pushed the add-polar-df branch 3 times, most recently from 188c1fd to 0604217 Compare April 11, 2025 15:51
@guan404ming guan404ming reopened this Apr 11, 2025
@guan404ming guan404ming force-pushed the add-polar-df branch 2 times, most recently from 9d6f704 to 2036fef Compare April 11, 2025 17:39
@guan404ming guan404ming requested a review from eladkal April 11, 2025 18:10
@guan404ming guan404ming requested a review from eladkal April 11, 2025 19:12
@guan404ming guan404ming force-pushed the add-polar-df branch 4 times, most recently from 8dd9cdb to 1fca7de Compare April 12, 2025 13:24
@guan404ming
Copy link
Member Author

@eladkal Hi, I have updated PR with your suggestion thanks for you reviewing~

@guan404ming guan404ming force-pushed the add-polar-df branch 2 times, most recently from a9d0b0f to 2649bb6 Compare April 14, 2025 18:27
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add get_polars_df() method to BigQueryHook.

2 participants