
PERF/CLN: let pyarrow concat chunks instead of doing it ourselves in __from_arrow__ #52928


Merged
merged 2 commits into from
Apr 26, 2023

Conversation

jorisvandenbossche
Member

See #52070 (comment) for context. We currently iterate manually through the chunks of the pyarrow array, convert each chunk to our masked extension array, and then concatenate the results at the end. Instead, we can let pyarrow concatenate the chunks and then do a single conversion from the concatenated pyarrow Array to our masked array. That is both more performant and simpler to code.

  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@lukemanley lukemanley added the Performance (memory or execution speed performance) and Arrow (pyarrow functionality) labels Apr 26, 2023
@lukemanley
Member

LGTM pending green

Member

@phofl phofl left a comment


nice!

@mroeschke mroeschke added this to the 2.1 milestone Apr 26, 2023
@mroeschke mroeschke merged commit e4097ca into pandas-dev:main Apr 26, 2023
@mroeschke
Member

Awesome, thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the perf-from-arrow branch April 26, 2023 15:59
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 27, 2023
…__from_arrow__ (pandas-dev#52928)

* PERF: let pyarrow concat chunks instead of doing it ourselves in __from_arrow__

* workaround for empty chunked arrays for older pyarrow
Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023
…__from_arrow__ (pandas-dev#52928)

* PERF: let pyarrow concat chunks instead of doing it ourselves in __from_arrow__

* workaround for empty chunked arrays for older pyarrow
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…__from_arrow__ (pandas-dev#52928)

* PERF: let pyarrow concat chunks instead of doing it ourselves in __from_arrow__

* workaround for empty chunked arrays for older pyarrow
Labels
Arrow (pyarrow functionality), Performance (memory or execution speed performance)

4 participants