-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CHORE] Remove user-facing arguments for casting to Ray's tensor type #2802
Conversation
CodSpeed Performance ReportMerging #2802 will degrade performances by 13.33%Comparing Summary
Benchmarks breakdown
|
# type since it expects all tensor elements to have the same number of dimensions, which Daft does not enforce. | ||
# TODO(Clark): Convert directly to Ray's variable-shaped tensor extension type when all tensor | ||
# elements have the same number of dimensions, without going through pylist roundtrip. | ||
return ArrowTensorArray.from_numpy(self.to_pylist()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I omitted this logic in this refactor because I have no idea what this is doing. Also there aren't any tests to help me understand so 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, added this back in to pass tests
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2802 +/- ##
========================================
Coverage 63.11% 63.11%
========================================
Files 1008 1007 -1
Lines 114269 114135 -134
========================================
- Hits 72117 72038 -79
+ Misses 42152 42097 -55
|
Summary
Cleanup PR.
cast_tensors_to_ray_tensor_dtype
as a user-facing argument in our export methods (e.g.to_arrow
,to_pandas
etc) -- this is really only intended to be used when a user is converting a Daft dataframe to a Ray dataset anyways and there isn't a need to expose this functionality to a userdaft.DataType.tensor
data to a Ray Data tensor type is done inside of the conversion code for Ray Data (_make_ray_block_from_micropartition
). This lets us contain the ickiness of that code without having it touch all of ourto_arrow
logic_trim_pyarrow_large_arrays
which was a legacy codepath that doesn't get hit anymore