@simicd
Contributor

@simicd simicd commented Feb 19, 2023

Which issue does this PR close?

Closes #139.

Rationale for this change

Convert a DataFusion DataFrame directly to a pandas DataFrame.

What changes are included in this PR?

Implement a to_pandas() method using the pyarrow library.

Are there any user-facing changes?

New to_pandas() method

@andygrove
Member

This is looking good so far. Thanks @simicd

@krzysztof-kwitt

Can you also update the documentation?
https://github.com/apache/arrow-datafusion-python/blame/main/README.md#L73-L77

-# collect as list of pyarrow.RecordBatch
-results = df.collect()
-# get first batch
-batch = results[0]
-# convert to Pandas
-df = batch.to_pandas()
+# collect as pandas
+df = df.to_pandas()

and
https://github.com/apache/arrow-datafusion-python/blob/950a5789b612f97794ff7250310ae3289227590f/examples/sql-to-pandas.py#L36-L43

@simicd simicd marked this pull request as ready for review February 21, 2023 21:59
@simicd
Contributor Author

simicd commented Feb 21, 2023

Thanks for already looking into the PR, @andygrove. I implemented your feedback and marked the PR as ready for final review.
Thanks also for the additional hint, @krzysztof-kwitt. The docs are now updated as well.

Member

@andygrove andygrove left a comment

LGTM. Thanks @simicd and @krzysztof-kwitt.

@andygrove andygrove merged commit 774ea70 into apache:main Feb 22, 2023
@krzysztof-kwitt

krzysztof-kwitt commented Feb 22, 2023

I wonder if this method will still work for an empty result (0 rows/batches). @simicd, what do you think?

andygrove added a commit that referenced this pull request Feb 22, 2023
* changelog (#188)

* Add Python wrapper for LogicalPlan::Sort (#196)

* Add Python wrapper for LogicalPlan::Aggregate (#195)

* Add Python wrapper for LogicalPlan::Limit (#193)

* Add Python wrapper for LogicalPlan::Filter (#192)

* Add Python wrapper for LogicalPlan::Filter

* clippy

* clippy

* Update src/expr/filter.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* Add tests for recently added functionality (#199)

* Add experimental support for executing SQL with Polars and Pandas (#190)

* Run `maturin develop` instead of `cargo build` in verification script (#200)

* Implement `to_pandas()` (#197)

* Implement to_pandas()

* Update documentation

* Write unit test

* Add support for cudf as a physical execution engine (#205)

* Update README in preparation for 0.8 release (#206)

* Analyze table bindings (#204)

* method for getting the internal LogicalPlan instance

* Add explain plan method

* Add bindings for analyze table

* Add to_variant

* cargo fmt

* blake and flake formatting

* changelog (#209)

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Dejan Simic <10134699+simicd@users.noreply.github.com>
Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
@simicd
Contributor Author

simicd commented Feb 25, 2023

@krzysztof-kwitt Good catch! Indeed, it fails with an error. I opened #234 to track the issue.

@simicd simicd deleted the feature/to-pandas branch February 26, 2023 21:04
@andygrove andygrove added the enhancement New feature or request label Mar 14, 2023
Development

Successfully merging this pull request may close these issues.

Make it easier to create a Pandas dataframe from DataFusion query results

3 participants