feat: support fastlanes bitpacking #2886
Conversation
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action.
benchmark results: using fastlanes: machine info:
machine net memory write throughput (single-thread write) test code:

```rust
use tokio::runtime::Runtime;
use tokio::io::AsyncWriteExt;
use std::time::Instant;

async fn write_to_memory(size: usize) -> f64 {
    let mut buffer = Vec::with_capacity(size);
    let data = vec![0u8; size];
    let start = Instant::now();
    buffer.write_all(&data).await.unwrap();
    let duration = start.elapsed();
    let bytes_written = size as f64;
    let seconds = duration.as_secs_f64();
    bytes_written / seconds
}

fn main() {
    let size = 1024 * 1024 * 1024 * 16;
    let rt = Runtime::new().unwrap();
    let throughput = rt.block_on(write_to_memory(size));
    // Convert throughput from bytes/sec to MB/sec
    let throughput_mb_per_sec = throughput / (1024.0 * 1024.0);
    println!("Throughput: {:.2} MB/sec", throughput_mb_per_sec);
}
```
I poked around a bit (I didn't know this before): to stress memory write throughput we need multi-threading, but here I killed the chance to do multi-threading by spawning many futures like this:

```rust
async move {
    let bytes = bytes.await?;
    let decompressed_output = bitpacked_for_non_neg_decode(
        compressed_bit_width,
        uncompressed_bits_per_value,
        &bytes,
        &bytes_idx_to_range_indices,
        num_rows,
    );
    Ok(Box::new(BitpackedForNonNegPageDecoder {
        uncompressed_bits_per_value,
        decompressed_buf: decompressed_output,
    }) as Box<dyn PrimitivePageDecoder>)
}
.boxed()
```

in this function:

```rust
fn bitpacked_for_non_neg_decode(
    compressed_bit_width: u64,
    uncompressed_bits_per_value: u64,
    data: &Vec<Bytes>,
    bytes_idx_to_range_indices: &Vec<Vec<std::ops::Range<u64>>>,
    num_rows: u64,
) -> Vec<u8> {
    match uncompressed_bits_per_value {
        8 => {
            let mut decompressed: Vec<u8> = Vec::with_capacity(num_rows as usize);
            let packed_chunk_size: usize = 1024 * compressed_bit_width as usize / 8;
            let mut decompress_chunk_buf = vec![0_u8; 1024];
            for (i, bytes) in data.iter().enumerate() {
                let mut j = 0;
                let mut ranges_idx = 0;
                let mut curr_range_start = bytes_idx_to_range_indices[i][0].start;
                while j * packed_chunk_size < bytes.len() {
                    let chunk: &[u8] = &bytes[j * packed_chunk_size..][..packed_chunk_size];
                    unsafe {
                        BitPacking::unchecked_unpack(
                            compressed_bit_width as usize,
                            chunk,
                            &mut decompress_chunk_buf[..1024],
                        );
                    }
                    loop {
                        if curr_range_start + 1024 < bytes_idx_to_range_indices[i][ranges_idx].end {
                            let this_part_len = 1024 - curr_range_start % 1024;
                            decompressed.extend_from_slice(
                                &decompress_chunk_buf[curr_range_start as usize % 1024..],
                            );
                            curr_range_start += this_part_len;
                            break;
                        } else {
                            let this_part_len =
                                bytes_idx_to_range_indices[i][ranges_idx].end - curr_range_start;
                            decompressed.extend_from_slice(
                                &decompress_chunk_buf[curr_range_start as usize % 1024..]
                                    [..this_part_len as usize],
                            );
                            ranges_idx += 1;
                            if ranges_idx == bytes_idx_to_range_indices[i].len() {
                                break;
                            }
                            curr_range_start = bytes_idx_to_range_indices[i][ranges_idx].start;
                        }
                    }
                    j += 1;
                }
            }
            decompressed
        }
        _ => panic!("Unsupported data type"),
    }
}
```

I can do something like this:

```rust
while j * packed_chunk_size * 34 < bytes.len() {
    // decode first chunk
    // decode 32 chunks in parallel
    // decode the last chunk
}
```

To decode 32 chunks in parallel, I found it is okay to do something like this in Rust:

```rust
use std::thread;

fn main() {
    let len = 100;
    // Pre-allocate a Vec<u8> with a specific length, initialized to zero
    let mut data = vec![0u8; len];
    let ptr = data.as_mut_ptr() as u64;
    let mut handles = vec![];
    // Spawn 10 threads, each writing to a different section of the Vec
    for i in 0..10 {
        let ptr = ptr;
        let handle = thread::spawn(move || {
            let start = i * 10;
            let end = start + 10;
            // Each thread writes to its own non-overlapping section
            for j in start..end {
                unsafe {
                    // Use raw pointer to modify the Vec directly
                    (ptr as *mut u8).add(j).write((j - start) as u8);
                }
            }
        });
        handles.push(handle);
    }
    // Join all threads
    for handle in handles {
        handle.join().unwrap();
    }
    // Print the modified Vec
    println!("{:?}", data);
}
```
cargo clippy is unhappy with this:
should I just suppress it?
Some initial comments
rust/lance-encoding/src/buffer.rs
Outdated
```rust
pub fn reinterpret_to_rust_native<T>(&mut self) -> Result<&[T]>
where
    T: Copy, // Ensure `T` can be copied (as needed for safely reinterpreting bytes)
{
    let buffer = self.borrow_and_clone();
    let buffer = buffer.into_buffer();

    // Get the raw byte slice from the buffer.
    let byte_slice = buffer.as_slice();

    // Safety check - ensure that the byte slice length is a multiple of `T`.
    if byte_slice.len() % std::mem::size_of::<T>() != 0 {
        return Err(Error::Internal {
            message: "Buffer size is not a multiple of the target type size".to_string(),
            location: location!(),
        });
    }

    // Reinterpret the byte slice as a slice of `T`.
    let typed_slice = unsafe {
        std::slice::from_raw_parts(
            byte_slice.as_ptr() as *const T,
            byte_slice.len() / std::mem::size_of::<T>(),
        )
    };

    Ok(typed_slice)
}
```
How is this different from `borrow_to_typed_slice`?
yeah, this is actually the same, I should delete this function
fixed
```proto
// Items are bitpacked in a buffer
message BitpackedForNonNeg {
  // the number of bits used for a value in the buffer
  uint64 compressed_bits_per_value = 1;

  // the number of bits of the uncompressed value. e.g. for a u32, this will be 32
  uint64 uncompressed_bits_per_value = 2;

  // The items in the list
  Buffer buffer = 3;
}
```
What do you think `BitpackedWithNeg` will look like?
I plan to make it a cascading encoding: a `BTreeMap` of row-number index to real value for the few very wide (bit-width) values (the negative values), then bitpacking. For arrays that have too many negative values (for example, 50 percent), I think we should not use bitpacking on them.
In theory these should still be bitpackable, right? For example, if all data is between [-8, 8] then we could shift to [0, 16] and bitpack? Although I suppose frame-of-reference would do that for us 🤔
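As a concrete illustration of that idea, here is a small sketch (not code from this PR; `frame_of_reference` and `bits_needed` are made-up helper names) showing how subtracting the minimum shifts a signed range like [-8, 8] into a non-negative one that then needs only a few bits per value:

```rust
// Hypothetical frame-of-reference sketch: shift a signed array into a
// non-negative range so it becomes bitpackable.
fn frame_of_reference(values: &[i32]) -> (i32, Vec<u32>) {
    // The reference is the minimum value; after subtracting it every
    // element is non-negative and typically needs far fewer bits.
    let reference = *values.iter().min().expect("non-empty input");
    let shifted: Vec<u32> = values
        .iter()
        .map(|&v| (v as i64 - reference as i64) as u32)
        .collect();
    (reference, shifted)
}

fn bits_needed(shifted: &[u32]) -> u32 {
    // Compressed bit width = bits of the largest shifted value.
    shifted.iter().map(|v| 32 - v.leading_zeros()).max().unwrap_or(0)
}

fn main() {
    let data = [-8, -3, 0, 5, 8];
    let (reference, shifted) = frame_of_reference(&data);
    println!("reference={reference}, shifted={shifted:?}, width={}", bits_needed(&shifted));
}
```

The decoder would add `reference` back after unpacking; the bit width drops from the full 32 bits to the width of the shifted maximum.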
yeah, I think there might be some literature focusing on how to best combine lightweight integer encoding algorithms; I will research that.
rust/lance-encoding/Cargo.toml
Outdated
```toml
fastlanes = "0.1.5"
bytemuck = "=1.18.0"
```
Are these being used?
yeah, they are used in the bitpacking_fastlanes.rs
```rust
// Compute the compressed_bit_width for a given array of unsigned integers
// the vortex approach is better: they compute all statistics before encoding
```
Maybe just say "todo: compute all statistics before encoding"?
fixed
```rust
.as_any()
.downcast_ref::<PrimitiveArray<UInt8Type>>()
.unwrap();
//println!("primitive_array: {:?}", primitive_array);
```
Suggested change: remove the commented-out `println!` line.
fixed
```rust
}

impl ArrayEncoder for BitpackedForNonNegArrayEncoder {
    fn encode(
```
Definitely a lot of repetition in this method but we can try and simplify later if it is not easy to collapse these cases (e.g. if it requires a macro lets tackle it later)
yeah, I just need a way to map, say, both `DataType::UInt8` and `DataType::Int8` to `u8`; then I should know how to simplify it
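One possible shape for that simplification (purely a sketch: the `DataType` enum and `encode_as!` macro here are stand-ins, not the arrow-rs or lance-encoding APIs) is a macro that lets the signed and unsigned variants of the same width share one match arm:

```rust
// Hypothetical sketch of collapsing per-type match arms with a macro.
// `DataType` is a stand-in enum, not arrow's schema type.
#[derive(Debug)]
enum DataType { UInt8, Int8, UInt16, Int16 }

macro_rules! encode_as {
    ($prim:ty, $bytes:expr) => {{
        // Both the signed and unsigned variant reinterpret the buffer as
        // the same unsigned primitive before packing.
        let n = $bytes.len() / std::mem::size_of::<$prim>();
        format!("{} values of {}", n, stringify!($prim))
    }};
}

fn encode(dt: &DataType, bytes: &[u8]) -> String {
    match dt {
        // Signed and unsigned variants of the same width share one arm.
        DataType::UInt8 | DataType::Int8 => encode_as!(u8, bytes),
        DataType::UInt16 | DataType::Int16 => encode_as!(u16, bytes),
    }
}

fn main() {
    println!("{}", encode(&DataType::Int8, &[1, 2, 3, 4]));
}
```

The macro body would hold the real packing logic; the point is only that one arm per width remains.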
fixed
I like it. Good work!
rust/lance-encoding/src/encodings/physical/bitpack_fastlanes.rs
Outdated
```rust
// I did an extra copy here, not sure how to avoid it and whether it's safe to avoid it
let mut output = Vec::with_capacity(num_rows as usize);
output.extend_from_slice(
    &self.decompressed_buf[rows_to_skip as usize..][..num_rows as usize],
);
```
It should be safe to avoid this copy if I understand what you are doing here. We should be able to slice a `LanceBuffer`.
yeah, I don't know if it's safe for the consumer of `decode` to get a borrowed `LanceBuffer`; I don't know if they will modify it.
Ah, ok. Returning a borrowed buffer is always preferable to making a copy just to return an owned buffer.
In the first case the consumer might or might not have to make a copy. If they have to make one, then it's no more expensive for them to make the copy themselves.
In the second case we guarantee a copy happens, even if the consumer is able to work with a borrowed buffer.
Consumers should never assume they have an owned buffer. If they need an owned buffer they should use `LanceBuffer::into_owned`.
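This borrowed-vs-owned contract can be sketched with `std::borrow::Cow` standing in for `LanceBuffer` (an analogy only, not the actual lance-encoding API):

```rust
// Sketch: the decoder hands back a borrowed view; only a consumer that
// actually needs ownership pays for the copy.
use std::borrow::Cow;

// Returns a borrowed slice of the decompressed data; no copy is forced.
fn decode(decompressed: &[u8], skip: usize, take: usize) -> Cow<'_, [u8]> {
    Cow::Borrowed(&decompressed[skip..skip + take])
}

fn main() {
    let buf = vec![1u8, 2, 3, 4, 5];
    let view = decode(&buf, 1, 3);
    // A consumer that needs ownership converts explicitly, mirroring
    // the LanceBuffer::into_owned idea from the discussion above.
    let owned: Vec<u8> = view.into_owned();
    println!("{owned:?}");
}
```

Consumers that can work with the borrowed view never copy at all; the cost is opt-in.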
gotcha, will modify accordingly.
fixed
```rust
let values: Vec<i16> = vec![0; 10000];
let array = Int16Array::from(values);
let arr = Arc::new(array) as ArrayRef;
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let values: Vec<i16> = vec![88; 10000];
let array = Int16Array::from(values);
let arr = Arc::new(array) as ArrayRef;
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let values: Vec<i16> = vec![300; 100];
let array = Int16Array::from(values);
let arr = Arc::new(array) as ArrayRef;
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let values: Vec<i16> = vec![800; 100];
let array = Int16Array::from(values);
let arr = Arc::new(array) as ArrayRef;
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(1))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(20))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(50))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(100))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(1000))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(1024))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(2000))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;

let arr = lance_datagen::gen()
    .anon_col(lance_datagen::array::rand_type(&DataType::Int16))
    .into_batch_rows(RowCount::from(3000))
    .unwrap()
    .column(0)
    .clone();
check_round_trip_encoding_of_data(vec![arr], &TestCases::default(), HashMap::new()).await;
*/
```
Clean this up?
fixed
rust/lance-encoding/src/encoder.rs
Outdated
```rust
DataType::UInt8 | DataType::UInt16 | DataType::UInt32 | DataType::UInt64 => {
    let compressed_bit_width = compute_compressed_bit_width_for_non_neg(arrays);
    Ok(Box::new(BitpackedForNonNegArrayEncoder::new(
        compressed_bit_width as usize,
        data_type.clone(),
    )))
}

// for signed integers, I intend to make it a cascaded encoding: a sparse array for the
// negative and very wide (bit-width) values, then a bitpacked array for the narrow
// (bit-width) values. I need `BitpackedForNeg` to be merged first
DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 => {
    let compressed_bit_width = compute_compressed_bit_width_for_non_neg(arrays);
    Ok(Box::new(BitpackedForNonNegArrayEncoder::new(
        compressed_bit_width as usize,
        data_type.clone(),
    )))
}
```
Let's make sure we only use the bitpacking encoder if the version is 2.1.
fixed
benchmark results on my mac (2023 M2 Pro, 16 GiB of memory): to reproduce:

```diff
 for data_type in PRIMITIVE_TYPES {
     let data = lance_datagen::gen()
         .anon_col(lance_datagen::array::rand_type(data_type))
-        .into_batch_rows(lance_datagen::RowCount::from(1024 * 1024))
+        .into_batch_rows(lance_datagen::RowCount::from(512 * 1024 * 1024))
         .unwrap();
```

then run
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2886      +/-   ##
==========================================
+ Coverage   77.75%   78.98%   +1.22%
==========================================
  Files         231      234       +3
  Lines       70485    72822    +2337
  Branches    70485    72822    +2337
==========================================
+ Hits        54805    57516    +2711
+ Misses      12769    12327     -442
- Partials     2911     2979      +68
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor feedback, nothing major
rust/lance-encoding/src/buffer.rs
Outdated
```rust
let original_buffer_len = match self {
    Self::Borrowed(buffer) => buffer.len(),
    Self::Owned(buffer) => buffer.len(),
};
```
Maybe we should add a `len` function?
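A minimal sketch of what such a `len` helper could look like, assuming a simplified two-variant enum (the real `LanceBuffer` wraps arrow buffers, which this stand-in does not):

```rust
// Simplified stand-in for the buffer enum under review: both variants
// store bytes, so `len` just dispatches on the variant.
enum LanceBuffer {
    Borrowed(Vec<u8>), // stand-in for arrow's Buffer
    Owned(Vec<u8>),
}

impl LanceBuffer {
    fn len(&self) -> usize {
        match self {
            Self::Borrowed(b) => b.len(),
            Self::Owned(b) => b.len(),
        }
    }
}

fn main() {
    let owned = LanceBuffer::Owned(vec![1, 2, 3]);
    let borrowed = LanceBuffer::Borrowed(vec![0u8; 5]);
    println!("{} {}", owned.len(), borrowed.len());
}
```

Callers like the `original_buffer_len` match above would then shrink to a single `self.len()` call.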
rust/lance-encoding/src/buffer.rs
Outdated
```rust
);
match self {
    Self::Borrowed(buffer) => Self::Borrowed(buffer.slice_with_length(offset, length)),
    // A copy happened during `Buffer::from_slice_ref`
```
Move this comment up to the method description. It should be clear, from the method docs, when a copy is made / not made. So, something like "if the buffer is owned this method will require a copy"
rust/lance-encoding/src/buffer.rs
Outdated
```rust
Self::Borrowed(buffer) => Self::Borrowed(buffer.slice_with_length(offset, length)),
// A copy happened during `Buffer::from_slice_ref`
Self::Owned(buffer) => {
    Self::Borrowed(Buffer::from_slice_ref(&buffer[offset..offset + length]))
```
If we're going to make a copy anyways we should at least make it an owned copy.
fixed
```rust
    res
}

macro_rules! encode_fixed_width {
```
Please document these macros. Rust macros aren't as evil as C macros but they can still be hard to read. What is this macro expected to do?
fixed
```rust
// can we get here?
if ranges.is_empty() {
    panic!("cannot schedule empty ranges");
}
```
shouldn't be able to. Feel free to make a (not debug) assert
fixed
```rust
// when the current range start is in the same chunk as the previous range's end, we coalesce these two byte ranges
// when the current range start is not in the same chunk as the previous range's end, we create a new byte range
if this_start == locate_chunk_start(self, ranges[i - 1].end - 1) {
```
Instead of re-running `locate_chunk_start` can we look at the last item in `byte_ranges`?
actually, the last item in `byte_ranges` is calculated using `locate_chunk_start(range.start)` and `locate_chunk_end(range.end - 1)`, and here I am comparing `this_start` with `locate_chunk_start(range.end - 1)` to see if I want to coalesce `byte_ranges`
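The coalescing rule under discussion can be sketched as follows, with a made-up `coalesce` helper over sorted ranges and a fixed 1024-byte chunk size (illustrative only, not the PR's scheduler code):

```rust
// Sketch: if a range starts in the same fixed-size chunk where the
// previous emitted byte range ends, merge them instead of emitting a
// new byte range. Input ranges are assumed sorted and non-overlapping.
use std::ops::Range;

const CHUNK: u64 = 1024;

fn locate_chunk_start(pos: u64) -> u64 {
    pos / CHUNK * CHUNK
}

fn coalesce(ranges: &[Range<u64>]) -> Vec<Range<u64>> {
    let mut out: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        // Round the requested range out to chunk boundaries.
        let start = locate_chunk_start(r.start);
        let end = locate_chunk_start(r.end - 1) + CHUNK;
        match out.last_mut() {
            // Starts inside the last emitted byte range's final chunk: extend it.
            Some(last) if start < last.end => last.end = last.end.max(end),
            // Otherwise start a new byte range.
            _ => out.push(start..end),
        }
    }
    out
}

fn main() {
    println!("{:?}", coalesce(&[0..10, 500..600]));
    println!("{:?}", coalesce(&[0..10, 3000..3100]));
}
```

Checking the last item of the output, as the reviewer suggests, avoids re-deriving the previous range's chunk start on every iteration.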
```rust
use crate::testing::{check_round_trip_encoding_of_data, TestCases};

#[test_log::test(tokio::test)]
async fn test_bitpack_fastlanes_u8() {
```
I'm not sure these unit tests are using 2.1. I made a PR to change that.
thanks! I will merge it then.
Also, in this PR, the tests in `test_bitpack_primitive` in `bitpack` will go through `bit_pack_fastlanes`.
oh my god, these unit tests don't go through the codebase at all; fortunately these tests were here before I added the Lance version 2.1 gate.
shivers down my spine
merged, and thanks so much!
Force-pushed 90f8ea3 to fc89bf4 (compare)
I found that after I added test cases like this:
the test build process takes extremely long, so I did a force push to revert.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Some minor changes and then we're good to merge.
```rust
@@ -283,6 +283,32 @@ impl LanceBuffer {
    pub fn copy_array<const N: usize>(array: [u8; N]) -> Self {
        Self::Owned(Vec::from(array))
    }

    #[allow(clippy::len_without_is_empty)]
```
We might as well add an `is_empty`.
```rust
// NOTICE:
// This file is a modification of the `fastlanes` crate: https://github.com/spiraldb/fastlanes
// It is modified to allow a rust stable build
// The original code can be accessed at https://github.com/spiraldb/fastlanes/blob/8e0ff374f815...
```
Suggested change:

```rust
// The original code can be accessed at
// https://github.com/spiraldb/fastlanes/blob/7b1df5362e54f38af602463fc6785200af6a6f1c/src/bitpacking.rs
```
Sorry, my link was cut off.
fixed
Tagging this as related to #2857
This PR uses the fastlanes algorithm for bit-pack encoding.
The bit-packing routine is migrated from SpiralDB's fastlanes implementation; the migrated code is modified to allow a Rust stable build.
#2865