Generify ColumnReaderImpl and RecordReader (#1040) #1041

tustvold · 2021-12-13T12:05:01Z

This is highly experimental, I want to get further fleshing out #171 and #1037 before settling on this. In particular I want to get some numbers about performance. However, I wanted to give some visibility into what I'm doing

~~Builds on top of #1021~~

This introduces some limited generics into RecordReader and ColumnReaderImpl to allow for optimisations such as #1054 and #1082. Having implemented initial cuts of these, I am happy that this interface is sufficiently flexible for implementing various arrow-related optimisations.

Which issue does this PR close?

Closes #1040.

Rationale for this change

See ticket

What changes are included in this PR?

See ticket

Are there any user-facing changes?

No 😁

parquet/src/arrow/record_reader.rs

codecov-commenter · 2021-12-13T12:16:10Z

Codecov Report

Merging #1041 (28228b2) into master (07660c6) will increase coverage by 0.01%.
The diff coverage is 81.32%.

@@            Coverage Diff             @@
##           master    #1041      +/-   ##
==========================================
+ Coverage   82.30%   82.31%   +0.01%     
==========================================
  Files         168      172       +4     
  Lines       49026    50082    +1056     
==========================================
+ Hits        40351    41227     +876     
- Misses       8675     8855     +180

Impacted Files	Coverage Δ
parquet/src/arrow/array_reader.rs	`76.72% <ø> (-0.03%)`	⬇️
parquet/src/column/reader.rs	`69.88% <75.94%> (-2.45%)`	⬇️
parquet/src/column/reader/decoder.rs	`76.27% <76.27%> (ø)`
parquet/src/arrow/record_reader/buffer.rs	`85.10% <85.10%> (ø)`
parquet/src/arrow/record_reader.rs	`94.00% <87.17%> (+1.23%)`	⬆️
...rquet/src/arrow/record_reader/definition_levels.rs	`90.32% <90.32%> (ø)`
parquet/src/util/memory.rs	`91.12% <100.00%> (+0.08%)`	⬆️
arrow/src/datatypes/native.rs	`66.66% <0.00%> (-6.25%)`	⬇️
arrow/src/compute/kernels/comparison.rs	`89.75% <0.00%> (-3.48%)`	⬇️
arrow/src/csv/reader.rs	`88.10% <0.00%> (-2.48%)`	⬇️
... and 37 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07660c6...28228b2. Read the comment docs.

tustvold · 2021-12-14T17:34:12Z

Running benchmarks on my local machine I get somewhat erratic results, from which I conclude this has no major impact on performance

arrow_array_reader/read Int32Array, plain encoded, mandatory, no NULLs - old                                                                             
                        time:   [3.7939 us 3.8031 us 3.8114 us]
                        change: [-3.6579% -3.4154% -3.1951%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read Int32Array, plain encoded, mandatory, no NULLs - new                                                                             
                        time:   [2.3030 us 2.3048 us 2.3073 us]
                        change: [+2.5908% +2.7441% +2.9142%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, plain encoded, optional, no NULLs - old                                                                            
                        time:   [59.193 us 59.275 us 59.363 us]
                        change: [-4.2623% -4.1285% -4.0009%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read Int32Array, plain encoded, optional, no NULLs - new                                                                             
                        time:   [23.209 us 23.221 us 23.236 us]
                        change: [+32.531% +32.663% +32.835%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, plain encoded, optional, half NULLs - old                                                                            
                        time:   [142.37 us 142.41 us 142.44 us]
                        change: [+5.5942% +6.6789% +7.7376%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, plain encoded, optional, half NULLs - new                                                                            
                        time:   [139.07 us 139.89 us 140.59 us]
                        change: [+0.4422% +0.9960% +1.6028%] (p = 0.00 < 0.05)
                        Change within noise threshold.
arrow_array_reader/read Int32Array, dictionary encoded, mandatory, no NULLs - old                                                                             
                        time:   [21.919 us 21.923 us 21.927 us]
                        change: [+1.3392% +1.7681% +2.0113%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, dictionary encoded, mandatory, no NULLs - new                                                                            
                        time:   [99.347 us 101.00 us 102.37 us]
                        change: [+5.5715% +6.7636% +8.2107%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, dictionary encoded, optional, no NULLs - old                                                                            
                        time:   [75.648 us 75.663 us 75.681 us]
                        change: [-1.5816% -1.5384% -1.4963%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read Int32Array, dictionary encoded, optional, no NULLs - new                                                                            
                        time:   [112.52 us 113.33 us 114.36 us]
                        change: [+5.2751% +7.2166% +9.0108%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read Int32Array, dictionary encoded, optional, half NULLs - old                                                                            
                        time:   [144.77 us 144.80 us 144.83 us]
                        change: [-11.013% -10.318% -9.6258%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read Int32Array, dictionary encoded, optional, half NULLs - new                                                                            
                        time:   [191.06 us 191.12 us 191.18 us]
                        change: [+3.4773% +3.5370% +3.5957%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, plain encoded, mandatory, no NULLs - old                                                                            
                        time:   [800.06 us 800.19 us 800.32 us]
                        change: [-1.6826% -1.6388% -1.5967%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read StringArray, plain encoded, mandatory, no NULLs - new                                                                            
                        time:   [124.84 us 124.86 us 124.88 us]
                        change: [+4.1077% +4.1575% +4.2088%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, plain encoded, optional, no NULLs - old                                                                            
                        time:   [846.35 us 846.59 us 846.87 us]
                        change: [+0.8637% +0.9228% +0.9834%] (p = 0.00 < 0.05)
                        Change within noise threshold.
arrow_array_reader/read StringArray, plain encoded, optional, no NULLs - new                                                                            
                        time:   [143.25 us 143.30 us 143.35 us]
                        change: [+2.6977% +2.7794% +2.8847%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, plain encoded, optional, half NULLs - old                                                                            
                        time:   [773.74 us 776.61 us 779.87 us]
                        change: [+3.2218% +3.4681% +3.7063%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, plain encoded, optional, half NULLs - new                                                                            
                        time:   [264.22 us 264.80 us 265.57 us]
                        change: [-1.3401% -1.1712% -0.9903%] (p = 0.00 < 0.05)
                        Change within noise threshold.
arrow_array_reader/read StringArray, dictionary encoded, mandatory, no NULLs - old                                                                            
                        time:   [726.17 us 726.74 us 727.44 us]
                        change: [+1.2812% +1.3725% +1.4618%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, dictionary encoded, mandatory, no NULLs - new                                                                            
                        time:   [116.83 us 116.91 us 116.99 us]
                        change: [-3.2217% -3.0893% -2.9282%] (p = 0.00 < 0.05)
                        Performance has improved.
arrow_array_reader/read StringArray, dictionary encoded, optional, no NULLs - old                                                                            
                        time:   [802.16 us 803.89 us 805.57 us]
                        change: [-0.4055% -0.2549% -0.1073%] (p = 0.00 < 0.05)
                        Change within noise threshold.
arrow_array_reader/read StringArray, dictionary encoded, optional, no NULLs - new                                                                            
                        time:   [134.39 us 134.43 us 134.48 us]
                        change: [+0.0304% +0.2086% +0.3678%] (p = 0.02 < 0.05)
                        Change within noise threshold.
arrow_array_reader/read StringArray, dictionary encoded, optional, half NULLs - old                                                                            
                        time:   [742.00 us 742.57 us 743.00 us]
                        change: [+3.4464% +3.6453% +3.8440%] (p = 0.00 < 0.05)
                        Performance has regressed.
arrow_array_reader/read StringArray, dictionary encoded, optional, half NULLs - new                                                                            
                        time:   [236.67 us 237.14 us 238.07 us]
                        change: [+1.7094% +1.9629% +2.5264%] (p = 0.00 < 0.05)
                        Performance has regressed.

What is strange to me is that this seems to have a consistent ~5% impact on the "new" ArrowArrayReader despite this change touching none of the code used by it. I suspect we're in the weeds of the wims of LLVM, which I'm not really sure it makes sense to optimise for at this stage - there's a lot of lower hanging fruit. It's also worth noting that ArrowArrayReader is not used for anything bar strings at this stage, and I intend to introduce an optimised StringArrayReader that should be significantly faster.

My takeaway - no major cause for concern at this stage

parquet/src/column/reader/decoder.rs

yordan-pavlov · 2021-12-23T23:49:59Z

parquet/src/column/reader/decoder.rs

+            .current_encoding
+            .expect("current_encoding should be set");
+
+        let current_decoder = self


why not set a current_decoder field in the set_data method (where the decoder has to be selected anyway to call set_data on it), so that it doesn't have to be looked up on every call of read here? It should perform better (no lookup) and simplify this read method as well.

I didn't write this logic, just moved it, but my guess is this is a way to placate the borrow checker. Decoder::get requires a mutable reference, and we wish for decoders, in particular the dictionary decoder, to be usable across multiple set_data calls.

In order to have a current_decoder construct we would either need to perform a convoluted move dance moving data in and out of the decoder map, or use Rc<RefCell>. This is simpler, if possibly a little less performant. FWIW I'd wager that the overheads of a hashmap keyed on a low cardinality enumeration are pretty low.

yordan-pavlov · 2021-12-23T23:58:18Z

parquet/src/column/reader/decoder.rs

+
+/// An implementation of [`ColumnLevelDecoder`] for `[i16]`
+pub struct ColumnLevelDecoderImpl {
+    inner: LevelDecoderInner,


I wonder if the inner level decoder can be a generic parameter instead - wouldn't that remove the need to match &mut self.inner in the read method?

This would require introducing some type representation of the encoding type. This would be a fair bit of additional code/complexity that I don't think would not lead to a meaningful performance uplift. Assuming ColumnLevelDecoderImpl::read is called with a reasonable batch size of ~1024, the overheads of a jump table are likely to be irrelevant.

parquet/src/arrow/record_reader/definition_levels.rs

yordan-pavlov · 2021-12-26T21:07:16Z

parquet/src/arrow/record_reader/definition_levels.rs

+    ) -> impl Iterator<Item = usize> + '_ {
+        let max_def_level = self.max_level;
+        let slice = self.buffer.as_slice();
+        range.rev().filter(move |x| slice[*x] == max_def_level)


it might be more efficient to calculate a boolean array for the null bitmap using arrow::compute::eq_scalar as used in ArrowArrayReader here https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L570 , because it can use SIMD (if enabled)

Currently BooleanBufferBuilder doesn't have a story for appending other BooleanBuffers - #1039 adds this but I'd rather not make this PR depend on it.

Additionally the cost of the memory allocation and copy may outweigh the gains from SIMD.

Given this I'm going to leave this as is, especially as #1054 will remove this code from the decode path for files without nested nullability.

yordan-pavlov · 2021-12-26T21:12:00Z

parquet/src/arrow/record_reader/buffer.rs

+    ) {
+        let slice = self.as_slice_mut();
+
+        for (value_pos, level_pos) in range.rev().zip(rev_position_iter) {


it might be more efficient to insert null values using arrow::compute::SlicesIterator as used in ArrowArrayReader here https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L606 , since it works with sequences rather than single values

This is a cool suggestion, I was not aware of this component. Unfortunately it does not appear to support reverse iteration, which is required here, so I will leave this as a potential future optimization.

tustvold · 2022-01-01T19:09:22Z

parquet/src/arrow/array_reader.rs

@@ -200,7 +200,6 @@ pub struct PrimitiveArrayReader<T: DataType> {
    rep_levels_buffer: Option<Buffer>,
    column_desc: ColumnDescPtr,
    record_reader: RecordReader<T>,
-    _type_marker: PhantomData<T>,


This seemed to be an orphan so I just removed it

tustvold · 2022-01-01T19:56:20Z

parquet/src/column/reader.rs

-    }
-
-    #[inline]
-    fn configure_dictionary(&mut self, page: Page) -> Result<bool> {


This logic is moved into ColumnValueDecoder

tustvold · 2022-01-01T19:56:34Z

parquet/src/column/reader.rs

@@ -392,38 +419,6 @@ impl<T: DataType> ColumnReaderImpl<T> {
        Ok(true)
    }

-    /// Resolves and updates encoding and set decoder for the current page


This logic is also moved into ColumnValueDecoder

tustvold · 2022-01-01T20:01:41Z

I've renamed a number of the methods and traits based on the great feedback, and also added a load of doc comments. In particular I took inspiration from std::Vec, in particular Vec::spare_capacity_mut and Vec::set_len which is effectively an unsafe version of what is going on here.

I'm happy that this interface is sufficiently flexible for the optimisations I have in mind, many of which I've already got draft PR with initial cuts of, and so I'm marking this ready for review.

I am aware this is a relatively complex change, to an already complex part of the codebase so if anything isn't clear please let me know.

Edit: I have tested this code change with #1053 and the tests are green (with ArrowArrayReader replaced with ComplexObjectArrayReader to workaround #1111)

alamb

Thank you for the comments @tustvold

I went through this code pretty carefully -- and other than the places I noted it looks like a really nice job to me. I think the additional testing such as #1110 gives me extra confidence that this is working as designed

To other reviewers, I would summarize this change as "pulls out common and redundant logic from some of the RecordReader impls into a set of common structures and traits.

parquet/src/arrow/record_reader.rs

alamb · 2022-01-04T20:38:31Z

parquet/src/arrow/record_reader/buffer.rs

+        self.buffer.resize(num_bytes, 0);
+        self.len -= len;
+
+        std::mem::replace(&mut self.buffer, remaining).into()


TIL: std::mem::replace

alamb · 2022-01-04T20:40:36Z

parquet/src/arrow/record_reader/buffer.rs

+    ///
+    /// # Panics
+    ///
+    /// Implementations must panic if `len` is beyond the initialized length


I don't understand the must panic bit here -- how would implementations know what the initialized length (data written to the location returned by space_capacity_mut) is? Or is this referring to the capacity ?

I was trying to distinguish this from Vec::set_len which is unsafe because it doesn't know how much is initialized. In the case of RecordBuffer the entire capacity is initialized, just possibly not set to anything useful. The result may not be desirable, but isn't UB and therefore unsafe

alamb · 2022-01-04T20:43:25Z

parquet/src/arrow/record_reader/buffer.rs

+        self.buffer
+            .resize((self.len + batch_size) * std::mem::size_of::<T>(), 0);


Is it ok to initialize everything to 0? I am wondering if 0 isn't a valid representation for some type T? Perhaps this should be T::default() instead?

Sadly this is not possible with MutableBuffer the second parameter is u8. IMO MutableBuffer is pretty unfortunate and should really be typed based on what it contains, but changing this would be a major breaking change to a lot of arrow...

I believe it is called arrow2

Indeed arrow2 could definitely serve as inspiration for such a change. I have some ideas on how to make such a change without major churn, but nothing fully formed just yet 😁

arrow2 no longer uses MutableBuffer<T: NativeType>: it recently migrated to std::Vec<T: NativeType>, for ease of use (and maintain).

it recently migrated to std::Vec<T: NativeType>

Is there some way to force Vec to use stricter alignment than needed by T? i.e. for SIMD stuffs?

you mean e.g. use 128 bytes instead of the minimum layout required by T? I do not think it is possible on the stable channel, no.

parquet/src/arrow/record_reader/buffer.rs

alamb · 2022-01-04T20:48:09Z

parquet/src/arrow/record_reader/buffer.rs

+/// A [`BufferQueue`] capable of storing column values
+pub trait ValuesBuffer: BufferQueue {
+    /// Iterate through the indexes in `range` in reverse order, moving the value at each
+    /// index to the next index returned by `rev_valid_position_iter`


the code also seems to assume that rev_valid_position_iter is sorted

alamb · 2022-01-04T20:59:31Z

parquet/src/column/reader.rs

    num_decoded_values: u32,

-    // Cache of decoders for existing encodings
-    decoders: HashMap<Encoding, Box<dyn Decoder<T>>>,


For anyone else following along, the cache is moved into ColumnValueDecoderImpl below

alamb · 2022-01-04T21:09:19Z

parquet/src/column/reader/decoder.rs

+use crate::util::bit_util::BitReader;
+
+/// A slice of levels buffer data that is written to by a [`ColumnLevelDecoder`]
+pub trait LevelsBufferSlice {


I think I missed it somewhere along the line -- what is the point of Generisizing (sp?) levels, rather than just using [i16]? Can definition or repetition levels ever be something other than i16?

Yes - #1054

Ah -- got it

alamb · 2022-01-04T21:16:02Z

parquet/src/column/reader.rs

+    num_buffered_values: u32,
+    encoding: Encoding,
+    buf: ByteBufferPtr,
+) -> Result<ByteBufferPtr> {


Is there a reason to replicate the logic in LevelDecoder::v1(enc, max_level); here ? Could that level decoder simply be reused? Especially since it already has tests, etc

The short answer is because I found the interface for LevelDecoder incredibly confusing, and this isn't actually interested in the decoder, just working out how many bytes of level data there are...

I can change if you feel strongly

No, I was just curios

alamb · 2022-01-04T21:22:43Z

cc @nevi-me @sunchao and @jorgecarleitao

Please let us know if anyone else is interested in reviewing this PR. If not I'll plan to merge it in soon

jorgecarleitao · 2022-01-04T22:47:31Z

parquet/src/arrow/record_reader/buffer.rs

+
+    #[inline]
+    pub fn as_slice(&self) -> &[T] {
+        let (prefix, buf, suffix) = unsafe { self.buffer.as_slice().align_to::<T>() };


Thanks @alamb for the ping. I haven't look into this PR semantics in detail because I am not familiar with this code base.

I think that this line is sound iff T: plain old data (in the sense that they fulfill the invariants of Pod).

However, bool, which is not Pod, implements ParquetValueType, and we pass T: DataType::T to TypedBuffer here.

Note that like bool, Int96 contains Option<[u32; 3]> which is also not plain old data, and also implements ParquetValueType.

Maybe restrict T to TypedBuffer<T: PrimitiveType> or something, so that we do not allow non-plain old data types to be passed here?

Yeah the typing here is a bit unfortunate, there is a cludge in PrimitiveArrayReader to handle bools, and prevent Int96 but I'm not going to argue it isn't pretty gross 😅

It's no worse than before, but it certainly isn't ideal... I'll have a think about how to improve this without breaking the APIs 🤔

Maybe we could at least document it (or mark it as unsafe to force the callsites to acknowledge they aren't using bool)?

Going to mark this as a draft whilst I fix #1132 which should in turn fix this

#1155 contains the fix

…eader

tustvold · 2022-01-11T14:56:49Z

Unfortunately the code I added in #1155 didn't quite carry across as I had hoped for, as parquet doesn't have an Int16Type but definition levels and repetition levels are parsed as i16. This required some more finagling, but the general concept of restricting the valid types remains unchanged

tustvold · 2022-01-11T15:00:01Z

parquet/src/data_type.rs

@@ -1033,21 +1032,6 @@ pub(crate) mod private {
            self
        }
    }
-
-    /// A marker trait for [`DataType`] with a [scalar] physical type


This was added in #1155 but unfortunately didn't work as anticipated because of the lack of Int16Type which is needed for decoding levels data

Can we impl ScalarDataType for i16?

If you you need to remove this code, then we should probably reopen the original ticket #1132

impl ScalarDataType for i16

In short, no... DataType is tightly coupled with what it means to be a physical parquet type, which i16 is not

If you you need to remove this code, then we should probably reopen the original ticket

It is an alternative way of fixing that ticket. Rather than constraining T: DataType we constrain T::T. The two approaches are equivalent, but the latter allows implementing the marker trait for types that don't have a corresponding DataType

…eader

tustvold added 2 commits December 10, 2021 13:59

Simplify record reader

9930e50

Generify ColumnReaderImpl and RecordReader (apache#1040)

be0fc1b

github-actions bot added the parquet Changes to the parquet crate label Dec 13, 2021

tustvold commented Dec 13, 2021

View reviewed changes

parquet/src/arrow/record_reader.rs Outdated Show resolved Hide resolved

tustvold commented Dec 13, 2021

View reviewed changes

parquet/src/arrow/record_reader.rs Show resolved Hide resolved

Tweak count_records predicate

8457562

tustvold mentioned this pull request Dec 13, 2021

Simplify parquet arror RecordReader #1021

Merged

tustvold added 2 commits December 14, 2021 17:07

Pre-allocate bitmask

cdc9d69

Merge branch 'master' into generify-column-reader

2a6b576

tustvold added 3 commits December 14, 2021 20:42

fix: TypedBuffer::split update len

0469ffc

Simplify GenericRecordReader

d5c5dd9

Move column decoders into module

dec899c

tustvold mentioned this pull request Dec 15, 2021

Generify ColumnReaderImpl and RecordReader #1040

Closed

tustvold added 4 commits December 15, 2021 11:37

Remove RecordBuffer::create method

90d1399

Remove TypedBuffer<i16>::count_records

e52c569

Pass null count to ColumnValueDecoder::read

14d917e

Pull null padding out of column reader

cb18695

This was referenced Dec 17, 2021

Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054

Merged

parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) #1082

Merged

yordan-pavlov reviewed Dec 23, 2021

View reviewed changes

parquet/src/column/reader/decoder.rs Outdated Show resolved Hide resolved

yordan-pavlov reviewed Dec 23, 2021

View reviewed changes

parquet/src/column/reader/decoder.rs Outdated Show resolved Hide resolved

yordan-pavlov reviewed Dec 23, 2021

View reviewed changes

parquet/src/column/reader/decoder.rs Outdated Show resolved Hide resolved

yordan-pavlov reviewed Dec 23, 2021

View reviewed changes

yordan-pavlov reviewed Dec 25, 2021

View reviewed changes

parquet/src/arrow/record_reader/definition_levels.rs Outdated Show resolved Hide resolved

yordan-pavlov reviewed Dec 26, 2021

View reviewed changes

Review feedback

cb250ae

tustvold added 2 commits January 1, 2022 19:06

Format

4c11590

License headers

0fe966a

tustvold commented Jan 1, 2022

View reviewed changes

Further doc tweaks

6a21ad2

tustvold commented Jan 1, 2022

View reviewed changes

tustvold marked this pull request as ready for review January 1, 2022 20:01

alamb approved these changes Jan 4, 2022

View reviewed changes

Further docs

28228b2

jorgecarleitao reviewed Jan 4, 2022

View reviewed changes

tustvold mentioned this pull request Jan 5, 2022

RecordReader Permits Illegal Types #1132

Closed

tustvold marked this pull request as draft January 10, 2022 22:26

tustvold added 2 commits January 11, 2022 14:18

Merge remote-tracking branch 'upstream/master' into generify-column-r…

5e6bb01

…eader

Restrict ScalarBuffer types

48b6d62

tustvold commented Jan 11, 2022

View reviewed changes

tustvold marked this pull request as ready for review January 11, 2022 15:01

Merge remote-tracking branch 'upstream/master' into generify-column-r…

c6f98f7

…eader

alamb merged commit 06431ee into apache:master Jan 11, 2022

This was referenced Feb 5, 2022

Expose Dictionary to reader #1270

Closed

Restrict Decoder to compatible types (#1276) #1277

Merged

tustvold mentioned this pull request May 30, 2022

Optimized Writing of Arrow Byte Array to Parquet #1764

Closed

tustvold mentioned this pull request Aug 8, 2022

Remove LevelDecoder #2379

Closed

tustvold mentioned this pull request Dec 6, 2023

Use Vec instead of Slice in ColumnReader #5177

Closed

		self.buffer
		.resize((self.len + batch_size) * std::mem::size_of::<T>(), 0);

Generify ColumnReaderImpl and RecordReader (#1040) #1041

Generify ColumnReaderImpl and RecordReader (#1040) #1041

Conversation

tustvold commented Dec 13, 2021 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

codecov-commenter commented Dec 13, 2021 • edited Loading

Codecov Report

tustvold commented Dec 14, 2021 • edited Loading

yordan-pavlov Dec 23, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yordan-pavlov Dec 26, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tustvold commented Jan 1, 2022 • edited Loading

alamb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alamb commented Jan 4, 2022

Choose a reason for hiding this comment

tustvold Jan 4, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tustvold commented Jan 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tustvold commented Dec 13, 2021 •

edited

Loading

codecov-commenter commented Dec 13, 2021 •

edited

Loading

tustvold commented Dec 14, 2021 •

edited

Loading

yordan-pavlov Dec 23, 2021 •

edited

Loading

yordan-pavlov Dec 26, 2021 •

edited

Loading

tustvold commented Jan 1, 2022 •

edited

Loading

tustvold Jan 4, 2022 •

edited

Loading