Allowed reusing compression buffer #60

Merged: 4 commits into main from reuse_compress on Oct 17, 2021

Conversation

jorgecarleitao (Owner)

This is a major refactor of the page API to make it easier to use and operate on. Main changes:

  • changed from streaming-iterator to fallible-streaming-iterator, so that we keep ownership of the Result, making the API easier to use (see the sketch after this list)
  • added struct CompressedDataPage and struct DataPage to differentiate when a page is compressed
  • allowed parquet writers to reuse the compression buffer
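
To make the first change concrete, here is a minimal sketch (hypothetical consumer code, not from this PR) of how a FallibleStreamingIterator is driven: next returns Result<Option<&Item>, Error>, so errors surface at each step while the iterator keeps ownership of the item and its buffers. The consume function and the Page type are illustrative stand-ins.

use fallible_streaming_iterator::FallibleStreamingIterator;

// illustrative stand-in for this crate's CompressedDataPage
struct Page {
    buffer: Vec<u8>,
}

// drive the iterator: each `next` both advances and can fail; `page` is only
// borrowed, so the iterator keeps its internal buffers for the next iteration
fn consume<I, E>(mut pages: I) -> Result<usize, E>
where
    I: FallibleStreamingIterator<Item = Page, Error = E>,
{
    let mut total = 0;
    while let Some(page) = pages.next()? {
        total += page.buffer.len();
    }
    Ok(total)
}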

jorgecarleitao changed the title from "Reuse compress" to "Reuse compress buffer" on Oct 13, 2021
codecov-commenter commented Oct 13, 2021

Codecov Report

Merging #60 (75fd09f) into main (d73f166) will decrease coverage by 0.86%.
The diff coverage is 38.93%.

@@            Coverage Diff             @@
##             main      #60      +/-   ##
==========================================
- Coverage   67.38%   66.52%   -0.86%     
==========================================
  Files          64       65       +1     
  Lines        3523     3594      +71     
==========================================
+ Hits         2374     2391      +17     
- Misses       1149     1203      +54     
Impacted Files                             Coverage   Δ
integration-tests/src/read/binary.rs       100.00%    <ø> (ø)
integration-tests/src/read/primitive.rs     98.00%    <ø> (ø)
src/error.rs                                21.42%    <ø> (+2.67%) ⬆️
src/lib.rs                                  76.73%    <ø> (ø)
src/page/mod.rs                             13.79%    <0.00%> (+1.44%) ⬆️
src/page/page_dict/binary.rs                 0.00%    <0.00%> (ø)
src/page/page_dict/fixed_len_binary.rs       0.00%    <0.00%> (ø)
src/write/column_chunk.rs                    0.00%    <ø> (ø)
src/write/compression.rs                     0.00%    <0.00%> (ø)
src/write/dyn_iter.rs                        0.00%    <0.00%> (ø)
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jorgecarleitao (Owner, Author)

cc @ives9638. I think I found the root cause: we were incorrectly swapping the buffers, causing re-allocations of the compression buffer (related to jorgecarleitao/arrow2#529).

let compressed_buffer = &compressed_page.buffer;

// prepare the compression buffer
buffer.clear();
houqp

Does this clear really help with avoiding the memcopy you mentioned in jorgecarleitao/arrow2#529 (comment)?

clear sets len to 0, so it looks like when the resize size is larger than the current length (0), the value-copy loop will always be executed? https://doc.rust-lang.org/src/alloc/vec/mod.rs.html#2236

jorgecarleitao (Owner, Author)

What I am trying to avoid with .clear is the reserve piece (of extend_with): if len != 0 and reserve reallocates, all existing bytes must be copied to the new memory location.
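
To illustrate the intent (a minimal sketch, not code from this PR):

let mut buffer: Vec<u8> = vec![1, 2, 3];

// without the clear, a resize past capacity reallocates and memcopies the 3 live bytes;
// after the clear, len == 0, so a reallocation has no live bytes to copy
buffer.clear();
buffer.resize(1024, 0); // note: this still writes 1024 zeroes, as discussed below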

houqp

extend_with's n: usize argument is calculated as new_len - len in resize; if len is 0, then n is always > 0, right?

jorgecarleitao (Owner, Author)

Very well spotted: I think that you are right. 🙇

Let me try to re-phrase it: there are 3 main cases when we need a larger region:

let len = vec.len();

if len == 0 {
    if value == 0 {  // case 1
        // dealloc old
        // alloc_zeroed (the sys call)
    } else {  // case 2
        // alloc_uninitialized
        // set `[0, new_len[` to `value`
    }
} else {  // case 3
    // realloc
    // memcopy `[0, len[`
    // set `[len, new_len[` to `value`
}

I was hoping to hit case 1, but you are right that .clear followed by .resize does not do this (imo std here is suboptimal: they could specialize to hit case 1). What I was trying to hit here was this SpecFromElem::from_elem specialization.

It seems to me that the way to handle this is to use vec![0; decompressed_len] when the required length is larger.

jorgecarleitao (Owner, Author), Oct 16, 2021

Specifically, what do you think about using

        // prepare the compression buffer
        if buffer.capacity() < compressed_page.uncompressed_size() {
            *buffer = vec![0; compressed_page.uncompressed_size()]
        } else {
            buffer.truncate(compressed_page.uncompressed_size())
        };

houqp, Oct 16, 2021

ha, very cool, I didn't know 0 was handled as a special case in SpecFromElem.

imo std here is suboptimal: they could specialize to hit case 1

I was thinking about the same thing as well. The abstraction here is quite leaky and does not adhere to Rust's zero-cost philosophy.

Specifically, what do you think about using

This is very close to what I have in mind. However, I think we can further improve performance if we do the following:

  • Reallocate larger memory without initialization. Zeroing the memory (AllocInit::Zeroed) still has more overhead than AllocInit::Uninitialized. I believe heap management, i.e. malloc, is all managed in user space; the only syscalls we use should be brk or sbrk. So to guarantee zeroed memory, the memset or memcpy overhead is unavoidable. And even if we could let the kernel do this for us, the kernel would still need to burn CPU instructions to clear out the memory. Since we always overwrite the full buffer in the subsequent decompress, requesting zeroed memory from the allocator results in unnecessary overhead.
  • I think *buffer = vec![0; compressed_page.uncompressed_size()] is on the right path. What I am not 100% sure about is whether it is always optimal. When there are large enough gaps between allocated buffers in the heap, the allocator could simply extend the current buffer instead of always allocating an entirely new one. Combined with my previous point, I think this will still lead to unnecessary memcopy/memset on every resize, due to the need to zero out the newly allocated buffer.

Here is what I think we really want:

        // prepare the compression buffer
        match buffer.capacity().cmp(&compressed_page.uncompressed_size()) {
            Ordering::Equal => {} // avoid the unnecessary truncate
            Ordering::Greater => buffer.truncate(compressed_page.uncompressed_size()),
            Ordering::Less => {
                // realloc and extend the memory buffer:
                //     if there is not enough free space after the current buffer,
                //     allocate an entirely new buffer and free the current one;
                //     do not zero out the new buffer nor the old one.
                //     if there is enough free space after the current buffer, simply
                //     extend the buffer's length in the allocator's tracking metadata;
                //     do not zero out the newly extended space.
            }
        }

For the Less branch, I believe most allocators' grow methods (https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#507) should be smart enough to support it; we just need a way to avoid the zero-initialization. I think Vec::reserve is probably closest to what we want here.

jorgecarleitao (Owner, Author)

Ahh, I see. Yes, that would be ideal.

However, I think that is not possible to do safely atm, unfortunately :( most decompressors' APIs are based on the trait std::io::Read, whose read_exact expects an initialized memory region.

Basically, we would write something like

let mut codec = Decoder::new(compressed_buffer);

decompressed_buffer.reserve(capacity);
// assumption: the codec does not read from the uninitialized region
unsafe { decompressed_buffer.set_len(capacity) };

codec.read_exact(decompressed_buffer.as_mut_slice())?; // assumption violated

The assumption is violated because either:

  1. read_exact may read from the buffer it receives, or
  2. if read_exact errors and we recover from it, we may end up reading an uninitialized memory region.

This is mentioned in https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read and has received a number of RustSec advisories.

imo atm a new allocation (i.e. vec![0; ...]) is the best we can do under the safety constraints (but I would be very happy to be proven wrong ^_^)
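
A minimal sketch of the safe pattern this constraint forces (assuming a codec implementing std::io::Read and a reusable buffer: &mut Vec<u8>; not necessarily the exact code merged here):

// grow through the `vec![0; n]` specialization (alloc_zeroed) only when needed;
// otherwise reuse the existing allocation
if buffer.capacity() < uncompressed_size {
    *buffer = vec![0; uncompressed_size];
} else {
    buffer.clear();
    buffer.resize(uncompressed_size, 0); // no realloc: capacity suffices
}
codec.read_exact(buffer)?; // sound: every byte handed to the codec is initialized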

houqp

I see what I was missing now, thanks a lot for pointing out that the read method doesn't guarantee it won't read from the output buf :( Coming from a heavy C background, I totally didn't expect this behavior. This is very unfortunate. They should have added a separate read method to support the write-only use case.

In this case I agree with you that we need to initialize the output buffer on every iteration, although I think it might be better to let the allocator decide whether an entirely new allocation is needed or not. So I actually think your original code had it right: clear + resize.

houqp

I noticed there is a new API brewing that we could leverage in the future to get rid of the initialization overhead: https://doc.rust-lang.org/nightly/std/io/trait.Read.html#method.initializer

houqp, Oct 16, 2021

Another possible route would be rolling our own read_exact implementation, i.e. using codec.bytes().take(new_len) to get an iterator of decompressed bytes and then writing into the uninitialized buffer manually. But I haven't looked into whether the Rust compiler can optimize the iterator away into a simple memcopy call.
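
A sketch of that route (read_exact_unzeroed is a hypothetical helper; codec is anything implementing std::io::Read). Pushing into a reserved Vec sidesteps set_len entirely, at the cost of a per-byte Result check:

use std::io::{Error, ErrorKind, Read, Result};

// hypothetical helper: fill `buffer` from `codec` without zero-initializing it first
fn read_exact_unzeroed(codec: impl Read, buffer: &mut Vec<u8>, new_len: usize) -> Result<()> {
    buffer.clear();
    buffer.reserve(new_len);
    for byte in codec.bytes().take(new_len) {
        buffer.push(byte?); // safe: `push` never reads uninitialized memory
    }
    if buffer.len() < new_len {
        // mirror read_exact semantics when the source is exhausted early
        return Err(Error::new(ErrorKind::UnexpectedEof, "failed to fill whole buffer"));
    }
    Ok(())
}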

jorgecarleitao merged commit 7fbf8e4 into main on Oct 17, 2021
jorgecarleitao deleted the reuse_compress branch on Oct 17, 2021 at 04:29
jorgecarleitao changed the title from "Reuse compress buffer" to "Allowed reusing compression buffer" on Oct 18, 2021