[CHORE] Remove user-facing arguments for casting to Ray's tensor type #2802

Merged: jaychia merged 5 commits into main from jay/arrow-encode-decode on Sep 7, 2024

@jaychia jaychia commented Sep 6, 2024

Summary

Cleanup PR.

  1. Removes cast_tensors_to_ray_tensor_dtype as a user-facing argument from our export methods (e.g. to_arrow, to_pandas). It is only meant to be used when converting a Daft dataframe to a Ray dataset, so there is no need to expose it to users.
  2. Instead, daft.DataType.tensor data is now cast to Ray Data's tensor type inside the Ray Data conversion code (_make_ray_block_from_micropartition). This contains the ickiness of that code so that it no longer touches all of our to_arrow logic.
  3. Also removes _trim_pyarrow_large_arrays, a legacy codepath that no longer gets hit.
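
As a rough illustration of the design in points 1 and 2, here is a minimal sketch of keeping a backend-specific cast at the conversion boundary instead of threading a flag through every export method. Everything in it is hypothetical: `Column`, `to_arrow_like`, `make_ray_block`, and the `cast_tensor` callback are stand-ins invented for this example, not Daft's or Ray's actual classes or signatures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Column:
    """Hypothetical stand-in for a named, typed column (not a Daft class)."""

    name: str
    dtype: str  # e.g. "int64" or "tensor"
    values: list[Any]


def to_arrow_like(columns: list[Column]) -> dict[str, list[Any]]:
    """Generic export path: takes no tensor-specific flag and knows nothing about Ray."""
    return {c.name: list(c.values) for c in columns}


def make_ray_block(
    columns: list[Column], cast_tensor: Callable[[Any], Any]
) -> dict[str, list[Any]]:
    """Ray-only conversion path: the tensor cast is applied here and nowhere else."""
    block = to_arrow_like(columns)
    for c in columns:
        if c.dtype == "tensor":
            block[c.name] = [cast_tensor(v) for v in block[c.name]]
    return block
```

With this shape, callers of the generic export never see the cast, and only the Ray conversion path has to change if the target tensor type changes.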

@github-actions github-actions bot added the chore label Sep 6, 2024
codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #2802 will degrade performances by 13.33%

Comparing jay/arrow-encode-decode (b2a1e6b) with main (e3fbf88)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 14 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | jay/arrow-encode-decode | Change |
|---|---|---|---|
| test_count[1 Small File] | 20.5 ms | 23.6 ms | -13.33% |
| test_show[100 Small Files] | 298.8 ms | 50.9 ms | ×5.9 |

# type since it expects all tensor elements to have the same number of dimensions, which Daft does not enforce.
# TODO(Clark): Convert directly to Ray's variable-shaped tensor extension type when all tensor
# elements have the same number of dimensions, without going through pylist roundtrip.
return ArrowTensorArray.from_numpy(self.to_pylist())

jaychia (Contributor Author) commented:

I omitted this logic in this refactor because I have no idea what this is doing. Also there aren't any tests to help me understand so 🤷

jaychia (Contributor Author) commented:

Actually, added this back in to pass tests
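
For context on the snippet under discussion: its comments say that Ray's variable-shaped tensor extension type expects all tensor elements to have the same number of dimensions, which Daft does not enforce. Below is a minimal sketch of the kind of check a direct conversion path would need before it could skip the pylist roundtrip. It assumes tensors are represented as rectangular nested Python lists, and the helpers `ndim` and `uniform_ndim` are hypothetical names for this example, not part of Daft or Ray.

```python
from __future__ import annotations

from typing import Any, Optional


def ndim(x: Any) -> int:
    """Nesting depth of a nested list, assuming it is rectangular."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        if not x:  # an empty list still counts as one dimension
            break
        x = x[0]
    return depth


def uniform_ndim(tensors: list[Any]) -> Optional[int]:
    """Return the shared number of dimensions, or None if elements disagree."""
    dims = {ndim(t) for t in tensors}
    return dims.pop() if len(dims) == 1 else None
```

Elements may differ in shape and still pass this check (for example a 2x2 and a 1x3 tensor are both 2-dimensional); only a mismatch in the number of dimensions returns `None`.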

@jaychia jaychia requested a review from kevinzwang September 6, 2024 20:50
codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 96.07843% with 2 lines in your changes missing coverage. Please review.

Project coverage is 63.11%. Comparing base (6fe408c) to head (b2a1e6b).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| daft/runners/ray_runner.py | 91.30% | 2 Missing ⚠️ |

@@           Coverage Diff            @@
##             main    #2802    +/-   ##
========================================
  Coverage   63.11%   63.11%            
========================================
  Files        1008     1007     -1     
  Lines      114269   114135   -134     
========================================
- Hits        72117    72038    -79     
+ Misses      42152    42097    -55     
| Files with missing lines | Coverage Δ |
|---|---|
| daft/dataframe/dataframe.py | 86.05% <100.00%> (+0.04%) ⬆️ |
| daft/datatype.py | 91.10% <100.00%> (ø) |
| daft/runners/partitioning.py | 81.33% <100.00%> (ø) |
| daft/series.py | 89.50% <100.00%> (-0.03%) ⬇️ |
| daft/table/micropartition.py | 91.07% <100.00%> (ø) |
| daft/table/table.py | 60.56% <100.00%> (+1.36%) ⬆️ |
| src/daft-core/src/python/datatype.rs | 81.29% <100.00%> (-0.62%) ⬇️ |
| daft/runners/ray_runner.py | 88.03% <91.30%> (+0.12%) ⬆️ |

... and 22 files with indirect coverage changes

@jaychia jaychia merged commit 3c2af5a into main Sep 7, 2024
38 of 39 checks passed
@jaychia jaychia deleted the jay/arrow-encode-decode branch September 7, 2024 23:04