[PERF] Use to_arrow_iter in to_arrow to avoid unnecessary array concats #2780
CodSpeed Performance Report: Merging #2780 will not alter performance.
daft/dataframe/dataframe.py
Outdated
@@ -286,7 +286,9 @@ def iter_rows(self, results_buffer_size: Optional[int] = NUM_CPUS) -> Iterator[D
             yield row

     @DataframePublicAPI
-    def to_arrow_iter(self, results_buffer_size: Optional[int] = 1) -> Iterator["pyarrow.RecordBatch"]:
+    def to_arrow_iter(
+        self, results_buffer_size: Optional[int] = 1, cast_tensors_to_ray_tensor_dtype: bool = False
Should we document these arguments?
Yeah, good point... I kind of want to remove them at some point though; they're not really useful and add a bunch of tech debt.
Actually, for this case I'll make them "private" by adding an underscore.
(Removed in #2802)
3c1eade to 3b72732
Fixes to_arrow() to use Table.from_batches for performance.
Driveby: fix the results_buffer_size args documentation by using a num_cpus literal.