Implement to_pandas() #197

Merged
merged 3 commits into from
Feb 22, 2023

Conversation

simicd
Contributor

@simicd simicd commented Feb 19, 2023

Which issue does this PR close?

Closes #139.

Rationale for this change

Convert a DataFusion DataFrame directly to a pandas DataFrame.

What changes are included in this PR?

Implement a to_pandas() method using the pyarrow library.

Are there any user-facing changes?

New to_pandas() method

Review thread on src/dataframe.rs (outdated, resolved)
@andygrove
Member

This is looking good so far. Thanks @simicd

@krzysztof-kwitt

Can you also update the documentation?
https://github.com/apache/arrow-datafusion-python/blame/main/README.md#L73-L77

-# collect as list of pyarrow.RecordBatch
-results = df.collect()
-# get first batch
-batch = results[0]
-# convert to Pandas
-df = batch.to_pandas()
+# collect as pandas
+df = df.to_pandas()

and
https://github.com/apache/arrow-datafusion-python/blob/950a5789b612f97794ff7250310ae3289227590f/examples/sql-to-pandas.py#L36-L43

@simicd simicd marked this pull request as ready for review February 21, 2023 21:59
@simicd
Contributor Author

simicd commented Feb 21, 2023

Thanks for already looking into the PR, @andygrove. I implemented your feedback and marked the PR as ready for final review.
Thanks also for the additional hint, @krzysztof-kwitt. The docs are now updated as well.

Member

@andygrove andygrove left a comment

LGTM. Thanks @simicd and @krzysztof-kwitt.

@andygrove andygrove merged commit 774ea70 into apache:main Feb 22, 2023
@krzysztof-kwitt

krzysztof-kwitt commented Feb 22, 2023

I wonder whether this method still works for an empty result (0 rows/batches). @simicd, what do you think?

andygrove added a commit that referenced this pull request Feb 22, 2023
* changelog (#188)

* Add Python wrapper for LogicalPlan::Sort (#196)

* Add Python wrapper for LogicalPlan::Aggregate (#195)

* Add Python wrapper for LogicalPlan::Limit (#193)

* Add Python wrapper for LogicalPlan::Filter (#192)

* Add Python wrapper for LogicalPlan::Filter

* clippy

* clippy

* Update src/expr/filter.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* Add tests for recently added functionality (#199)

* Add experimental support for executing SQL with Polars and Pandas (#190)

* Run `maturin develop` instead of `cargo build` in verification script (#200)

* Implement `to_pandas()` (#197)

* Implement to_pandas()

* Update documentation

* Write unit test

* Add support for cudf as a physical execution engine (#205)

* Update README in preparation for 0.8 release (#206)

* Analyze table bindings (#204)

* method for getting the internal LogicalPlan instance

* Add explain plan method

* Add bindings for analyze table

* Add to_variant

* cargo fmt

* blake and flake formatting

* changelog (#209)

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Dejan Simic <10134699+simicd@users.noreply.github.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
@simicd
Contributor Author

simicd commented Feb 25, 2023

@krzysztof-kwitt Good catch! Indeed, it fails with an error. I opened #234 to track the issue.

@simicd simicd deleted the feature/to-pandas branch February 26, 2023 21:04
@andygrove andygrove added the enhancement New feature or request label Mar 14, 2023
Development

Successfully merging this pull request may close these issues.

Make it easier to create a Pandas dataframe from DataFusion query results
3 participants