Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved MutableStruct::push #1223

Merged
merged 3 commits into from
Aug 17, 2022
Merged

Improved MutableStruct::push #1223

merged 3 commits into from
Aug 17, 2022

Conversation

hohav
Copy link
Contributor

@hohav hohav commented Aug 15, 2022

Adds push_unchecked to MutableStructArray.

In my tests, this reduced total time to generate a representative StructArray by more than 1/3.

@codecov
Copy link

codecov bot commented Aug 15, 2022

Codecov Report

Merging #1223 (a54672b) into main (cc8f4f2) will increase coverage by 0.06%.
The diff coverage is 81.34%.

❗ Current head a54672b differs from pull request most recent head 095956a. Consider uploading reports for the commit 095956a to get more accurate results

@@            Coverage Diff             @@
##             main    #1223      +/-   ##
==========================================
+ Coverage   83.22%   83.29%   +0.06%     
==========================================
  Files         358      358              
  Lines       37281    37619     +338     
==========================================
+ Hits        31027    31334     +307     
- Misses       6254     6285      +31     
Impacted Files Coverage Δ
src/array/mod.rs 68.54% <ø> (ø)
src/bitmap/immutable.rs 86.41% <ø> (ø)
src/io/parquet/read/deserialize/binary/utils.rs 64.21% <0.00%> (-9.21%) ⬇️
src/io/parquet/read/mod.rs 100.00% <ø> (ø)
src/io/parquet/read/statistics/dictionary.rs 44.73% <ø> (ø)
src/io/parquet/write/nested/mod.rs 98.85% <ø> (-0.04%) ⬇️
src/io/parquet/read/indexes/primitive.rs 33.52% <28.57%> (-1.19%) ⬇️
src/io/parquet/read/indexes/mod.rs 80.92% <80.32%> (+3.50%) ⬆️
src/io/parquet/read/deserialize/binary/basic.rs 81.58% <85.55%> (+1.14%) ⬆️
src/io/parquet/read/row_group.rs 96.58% <91.48%> (-3.42%) ⬇️
... and 18 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jorgecarleitao
Copy link
Owner

What do you think about simply removing the check in push, or rename push to try_push and make the assert an error?

I am asking because *_unchecked is usually used in Rust to denote an operation that requires unsafe (e.g. get_unchecked).

@hohav
Copy link
Contributor Author

hohav commented Aug 15, 2022

What do you think about simply removing the check in push, or rename push to try_push and make the assert an error?

I am asking because *_unchecked is usually used in Rust to denote an operation that requires unsafe (e.g. get_unchecked).

Removing the check altogether would be fine for my purposes, but OTOH it's still an important invariant. My inclination is to make it "opt out", i.e. have the most "obvious" option (push) do the check, with some other less-obvious function to allow skipping it. I don't have a better name to suggest than push_unchecked; do you know of another naming scheme for this situation?

Re: try_push, I still prefer not return an Err for the reasons we discussed in #1196. That seems independent of when/where to do the validation check, which is more of a performance question.

@jorgecarleitao
Copy link
Owner

You are right. What about push_validity, hinting that we are essentially pushing to the validity?

(note that the integration test is unrelated, it is something upstream)

@hohav
Copy link
Contributor Author

hohav commented Aug 16, 2022

Even knowing how MutableStructArray works, I wouldn't be able to guess at the difference between push and push_validity from the names. What about push_fast? Or maybe push_fast_n_furious? :)

On the other hand I've come around somewhat to removing the check altogether. The invariant will be checked when converting to a StructArray, so we'd still fail reasonably early.

Yet another option would be #[cfg(debug_assertions)], but I don't know if that approach is idiomatic. Anyway, I'm fine with whichever approach you prefer.

@jorgecarleitao
Copy link
Owner

exactly, that was the reason why I was thinking about keeping push as not performing the check.

Another option I can think about is:

fn push<F: Fn(&mut [Box<dyn Array>])>(&mut self, op: F) {
    op(&mut self.values);
    self.validity.push(true); // or something like this
    self.assert_lengths();
}

this allows the user to modify the values at will multiple times before the push

@hohav
Copy link
Contributor Author

hohav commented Aug 17, 2022

I like removing the check more now, especially since it allows pushing to all the children at once before calling push N times which might be convenient.

I removed the check altogether and rebased. Should be good to go now, I hope.

@jorgecarleitao jorgecarleitao merged commit f75ce42 into jorgecarleitao:main Aug 17, 2022
@jorgecarleitao jorgecarleitao changed the title Mutable struct push unchecked Improved MutableStruct::push Aug 17, 2022
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Aug 17, 2022
@jorgecarleitao
Copy link
Owner

Sorry for the back and forth, I just think it is easier for others to keep the API aligned with Rust (specially functions named *_unchecked, that triggers unsafety feelings.

Thanks again for the PR and for the discussion 🙇

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants