Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 46 additions & 4 deletions parquet-variant-compute/src/variant_array_builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ use std::sync::Arc;
/// # use arrow::array::Array;
/// # use parquet_variant::{Variant, VariantBuilder, VariantBuilderExt};
/// # use parquet_variant_compute::VariantArrayBuilder;
/// # use parquet_variant::ShortString;
/// // Create a new VariantArrayBuilder with a capacity of 100 rows
/// let mut builder = VariantArrayBuilder::new(100);
/// // append variant values
Expand All @@ -56,9 +57,13 @@ use std::sync::Arc;
/// .with_field("foo", "bar")
/// .finish();
///
/// // bulk insert a list of values
/// // `Option::None` is a null value
/// builder.extend([None, Some(Variant::from("norm"))]);
///
/// // create the final VariantArray
/// let variant_array = builder.build();
/// assert_eq!(variant_array.len(), 3);
/// assert_eq!(variant_array.len(), 5);
/// // // Access the values
/// // row 1 is not null and is an integer
/// assert!(!variant_array.is_null(0));
Expand All @@ -70,6 +75,12 @@ use std::sync::Arc;
/// let value = variant_array.value(2);
/// let obj = value.as_object().expect("expected object");
/// assert_eq!(obj.get("foo"), Some(Variant::from("bar")));
/// // row 3 is null
/// assert!(variant_array.is_null(3));
/// // row 4 is not null and is a short string
/// assert!(!variant_array.is_null(4));
/// let value = variant_array.value(4);
/// assert_eq!(value, Variant::ShortString(ShortString::try_new("norm").unwrap()));
/// ```
#[derive(Debug)]
pub struct VariantArrayBuilder {
Expand Down Expand Up @@ -162,6 +173,17 @@ impl VariantArrayBuilder {
}
}

impl<'m, 'v> Extend<Option<Variant<'m, 'v>>> for VariantArrayBuilder {
fn extend<T: IntoIterator<Item = Option<Variant<'m, 'v>>>>(&mut self, iter: T) {
Comment on lines +176 to +177
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized... we probably want two impl Extend:

impl<'m, 'v, V: Into<Variant<'m, 'v>> Extend<V> for VariantArrayBuilder {
    fn extend<T: IntoIterator<Item = V>>(&mut self, iter: T) {
        for v in iter {
            self.append_variant(v.into())
        }
    }
}

and

impl<'m, 'v, V: Into<Variant<'m, 'v>> Extend<Option<V>> for VariantArrayBuilder {
    fn extend<T: IntoIterator<Item = Option<V>>>(&mut self, iter: T) {
        for v in iter {
            match v {
                Some(v) => self.append_variant(v.into()),
                None => self.append_null(),
            }
        }
    }
}

I think that's typical for the other variant arrays, to capture both nullable and non-nullable data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if it's fine to just support the Option<T> case, as it follows the other builder Extend impls.

For example:

Copy link
Contributor Author

@friendlymatthew friendlymatthew Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But fwiw, the goal of this PR was to prepare #8606 for closing.

Now, here we would want to support both Vec<Variant> and Vec<Option<Variant>>

Copy link
Contributor

@scovich scovich Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's fine to just support the Option<T> case, as it follows the other builder Extend impls.

I mean, any support is better than none. But we have existing use cases in the code for both Option<T> and T.

There seems to be an odd split here:

So maybe you're right, that to mirror existing conventions we should not Extend<Variant>. And should potentially consider adding the two From<Vec<...>>?

But then we basically end up implementing Extend<Variant> inside From<Vec<Variant>> itself... 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of why the (primitive) arrays have impl From<Vec<...>> is so they can re-use the Vec allocations

I don't think that is relevant for Variants as the in memory representation of a Vec<Variant> is not the same as a VariantArray

Copy link
Contributor

@scovich scovich Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense for e.g. From<Vec<bool>>. They don't need an Extend (not even a hidden one) because they just take over the input and are done.

Meanwhile, I guess From<Vec<Option<bool>>> will use Extend<Option<bool>> under the hood, because that Vec can't be directly used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is my understanding

for v in iter {
match v {
Some(v) => self.append_variant(v),
None => self.append_null(),
}
Comment on lines +179 to +182
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this better or worse than the current code?

Suggested change
match v {
Some(v) => self.append_variant(v),
None => self.append_null(),
}
v.map_or_else(|| self.append_null(), |v| self.append_variant(v))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a match statement is more readable IMO

}
}
}

/// Builder-specific state for array building that manages array-level offsets and nulls. See
/// [`VariantBuilderExt`] for details.
#[derive(Debug)]
Expand Down Expand Up @@ -438,14 +460,18 @@ fn binary_view_array_from_buffers(buffer: Vec<u8>, offsets: Vec<usize>) -> Binar
mod test {
use super::*;
use arrow::array::Array;
use parquet_variant::Variant;
use parquet_variant::{ShortString, Variant};

/// Test that both the metadata and value buffers are non nullable
#[test]
fn test_variant_array_builder_non_nullable() {
let mut builder = VariantArrayBuilder::new(10);
builder.append_null(); // should not panic
builder.append_variant(Variant::from(42i32));

builder.extend([
None, // should not panic
Some(Variant::from(42_i32)),
]);

let variant_array = builder.build();

assert_eq!(variant_array.len(), 2);
Expand Down Expand Up @@ -500,6 +526,22 @@ mod test {
assert_eq!(list.len(), 2);
}

#[test]
fn test_extend_variant_array_builder() {
let mut b = VariantArrayBuilder::new(3);
b.extend([None, Some(Variant::Null), Some(Variant::from("norm"))]);

let variant_array = b.build();

assert_eq!(variant_array.len(), 3);
assert!(variant_array.is_null(0));
assert_eq!(variant_array.value(1), Variant::Null);
assert_eq!(
variant_array.value(2),
Variant::ShortString(ShortString::try_new("norm").unwrap())
);
}

#[test]
fn test_variant_value_array_builder_basic() {
let mut builder = VariantValueArrayBuilder::new(10);
Expand Down
Loading