GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats #35345

Merged: 91 commits, Nov 22, 2023
614ab2c
[Large]ListViewType: Add the list-view type classes and [LARGE_]LIST_…
felipecrv Apr 18, 2023
03ab072
[Large]ListViewArray: Add the list-view array classes
felipecrv Apr 18, 2023
0e70f57
[Large]ListViewScalar: Add list-view scalar classes
felipecrv Apr 18, 2023
7cbf4f1
Parquet: [Large]ListViewArray: Add placeholder for list-view writing …
felipecrv Apr 18, 2023
eb1ce8d
Python: [Large]ListViewArray: Disable writing list-view data into Pandas
felipecrv Apr 18, 2023
a97cf86
visitor_generate.h: Add [Large]ListView to generated visitor code
felipecrv Apr 26, 2023
0578770
[Large]ListViewType: Implement type Compare + tests
felipecrv Jul 5, 2023
fe96002
BaseListBuilder: Make base builder mostly compatible with ListViews
felipecrv Jun 7, 2023
ad39a9d
BaseVarLengthListLikeBuilder: Add a version of Append() that takes a …
felipecrv Jul 7, 2023
06f9072
[Large]ListViewArrayBuilder: Add list-view builder classes
felipecrv Apr 27, 2023
08d541b
[Large]ListViewArray: Buffers validation, creation from JSON, and bas…
felipecrv Apr 25, 2023
1d04ed8
[Large]ListViewScalar: Implement all operations
felipecrv Apr 28, 2023
5bfd169
[Large]ListViewArray: Implement Validate
felipecrv Apr 28, 2023
40e9e5f
[Large]ListViewArray: Implement Flatten()
felipecrv Jul 12, 2023
7e3f538
[Large]ListViewArray: Implement Compare + most of the unit tests
felipecrv Jul 26, 2023
21cf422
[Large]ListViewArray: Implement PrettyPrint + tests
felipecrv May 3, 2023
eda45cc
type_traits.h: Add LargeType to TypeTraits of list and list-view types
felipecrv Jul 14, 2023
91c1c00
concatenate.cc: Extract a SumBufferSizes() function and make some tweaks
felipecrv May 4, 2023
05570ad
concatenate.cc: Pass Buffer pointer already dereferenced to PutOffsets
felipecrv May 4, 2023
56352d9
list_util.h: Add RangeOfValuesUsed() function
felipecrv Jun 13, 2023
14b1a60
concatenate.cc: Rename variable from adjustment to displacement
felipecrv May 6, 2023
f54ca23
[Large]ListViewArray: Implement concatenation + tests
felipecrv May 6, 2023
a15bc8c
Fix comment formatting
felipecrv Jun 13, 2023
9398c98
BaseBuilder: Allow ListView scalars and slices to be added to List bu…
felipecrv Jul 14, 2023
b5ea57f
if_else: Rename RunLoop to RunLoopOfNestedIfElseExec
felipecrv Jul 20, 2023
4b11630
if_else_benchmarks: Add ListView benchmarks
felipecrv Jul 22, 2023
1f749d5
if_else: Include LIST_VIEW and LARGE_LIST_VIEW
felipecrv Jul 22, 2023
24efd00
list_util: Add ListView<->List conversion functions + tests
felipecrv Aug 8, 2023
7da4b11
Declare dtor explicitly on base list[view] builders
felipecrv Aug 18, 2023
0b04fc6
Revert "BaseBuilder: Allow ListView scalars and slices to be added to…
felipecrv Sep 7, 2023
693477c
concatenate.cc: Add a fast path to Concatenate when only 1 array is p…
felipecrv Sep 8, 2023
aa0846e
[Large]ListViewArray: Re-write Flatten
felipecrv Sep 8, 2023
f9e9305
scalar_test.cc: Instantiate basic scalar tests with list-view types
felipecrv Sep 26, 2023
8cacd44
concatenate_test.cc: De-duplicate code checking concatenation of list…
felipecrv Sep 26, 2023
ac5b94e
Revert "concatenate.cc: Add a fast path to Concatenate when only 1 ar…
felipecrv Sep 26, 2023
69345b4
validate.cc: Be strict about nullability of offsets and sizes buffers…
felipecrv Sep 26, 2023
540086a
list_util.h: Add SumOfLogicalListSizes() utility
felipecrv Sep 26, 2023
f3596b6
list_util.cc: Use SumOfLogicalListValues when converting between list…
felipecrv Sep 26, 2023
ef436d7
list_util.cc: Rewrite MinViewOffset and MaxViewEnd
felipecrv Sep 27, 2023
d61d15d
list_util.c: Zero initial padding area of sizes buffer
felipecrv Sep 27, 2023
0c18fdd
random.cc: Simplify and split the random generator into two algorithms
felipecrv Sep 30, 2023
7e51ae7
concatenate.cc: Respect the new invariants imposed on the spec
felipecrv Oct 3, 2023
32865ec
validate.cc: Apply list-view invariants to null list-views as well
felipecrv Oct 3, 2023
ffc8f16
util.cc: Implement Endianness swapping for list-views
felipecrv Oct 5, 2023
8e001f6
IPC: Parameterize JSON simple tests arrow list type
felipecrv Oct 4, 2023
7ff434c
IPC: Include ListView types in the simple JSON list tests
felipecrv Oct 5, 2023
fb92b1e
IPC and JSON Integration tests for list-view types
felipecrv Oct 5, 2023
e40b1fa
Add ListView support to the C data interface
felipecrv Oct 5, 2023
17c0370
validate.cc: Re-do list-view modifications on top of concurrent changes
felipecrv Oct 11, 2023
fd927a8
RangeOfValuesUsed: Don't call MaxViewEnd if MinViewOffset is a nullopt
felipecrv Oct 11, 2023
627af2f
concatenate.cc: Rename SumBufferSizes{->InBytes}
felipecrv Oct 12, 2023
e1cfd96
concatenate.cc: Use mutable_data_as<T>
felipecrv Oct 17, 2023
3ecbcf1
Clarify the validations in List[View]Array::FromArrays() docstrings
felipecrv Oct 16, 2023
0cf760c
Rewrite ListView Flatten
felipecrv Oct 17, 2023
8f87038
list_util: Use mutable_data_as<T>
felipecrv Oct 17, 2023
15b0904
More throroughly document builder member functions
felipecrv Oct 18, 2023
dde0f5a
fixup! concatenate.cc: Respect the new invariants imposed on the spec
felipecrv Oct 26, 2023
8d04a38
fixup! Clarify the validations in List[View]Array::FromArrays() docst…
felipecrv Nov 6, 2023
c490d84
Make [Large]ListViewArray docstrings consistent with each other
felipecrv Nov 9, 2023
36bb781
fixup! [Large]ListViewScalar: Implement all operations
felipecrv Nov 9, 2023
4b77163
fixup! [Large]ListViewArray: Implement Validate
felipecrv Nov 9, 2023
f02d5ce
fixup! validate.cc: Be strict about nullability of offsets and sizes …
felipecrv Nov 9, 2023
311603c
Make the C Bridge test set for list-views complete
felipecrv Nov 10, 2023
26e5288
fixup! IPC and JSON Integration tests for list-view types
felipecrv Nov 10, 2023
0da6695
fixup! random.cc: Simplify and split the random generator into two al…
felipecrv Nov 10, 2023
551466a
fixup! [Large]ListViewArray: Implement Compare + most of the unit tests
felipecrv Nov 10, 2023
6922d27
fixup! list_util.h: Add RangeOfValuesUsed() function
felipecrv Nov 10, 2023
8e6af3e
fixup! list_util.cc: Rewrite MinViewOffset and MaxViewEnd
felipecrv Nov 10, 2023
698b1df
fixup! More throroughly document builder member functions
felipecrv Nov 16, 2023
e9fc005
fixup! [Large]ListViewArrayBuilder: Add list-view builder classes
felipecrv Nov 16, 2023
308781b
fixup! [Large]ListViewArray: Implement Compare + most of the unit tests
felipecrv Nov 15, 2023
ec0e10e
fixup! Rewrite ListView Flatten
felipecrv Nov 15, 2023
8b8c6f0
fixup! [Large]ListViewArrayBuilder: Add list-view builder classes
felipecrv Nov 15, 2023
2d8e55f
fixup! fixup! [Large]ListViewScalar: Implement all operations
felipecrv Nov 15, 2023
0972d5c
fixup! [Large]ListViewArray: Implement Validate
felipecrv Nov 16, 2023
5f57008
Remove sparsity parameter from random list-view generator
felipecrv Nov 16, 2023
d554a13
Move list-x-list-view converters from list_util.h/cc to array_nested.…
felipecrv Nov 16, 2023
aef0cb6
Move list-x-list-view converter tests to list_array_test.cc
felipecrv Nov 16, 2023
7d44e76
array_nested.h: Document the IsValid(i) pre-cond on value_length/valu…
felipecrv Nov 16, 2023
ab5711f
fixup! list_util.h: Add RangeOfValuesUsed() function
felipecrv Nov 16, 2023
4fa8e74
fixup! [Large]ListViewArray: Implement Validate
felipecrv Nov 16, 2023
d42aae8
concatenate_test.cc: Isolate all the random array generation code
felipecrv Nov 16, 2023
252ff8b
concatenate.cc: Preserve the properties of offsets and sizes required…
felipecrv Nov 17, 2023
b30d2e5
fixup! concatenate.cc: Preserve the properties of offsets and sizes r…
felipecrv Nov 17, 2023
c2068bb
fixup! fixup! [Large]ListViewScalar: Implement all operations
felipecrv Nov 17, 2023
a41490e
fixup! concatenate_test.cc: Isolate all the random array generation code
felipecrv Nov 17, 2023
79c6acd
fixup! Make the C Bridge test set for list-views complete
felipecrv Nov 17, 2023
05f6f38
fixup! list_util.h: Add RangeOfValuesUsed() function
felipecrv Nov 17, 2023
773c800
fixup! concatenate_test.cc: Isolate all the random array generation code
felipecrv Nov 17, 2023
ab0dcaf
fixup! list_util.h: Add SumOfLogicalListSizes() utility
felipecrv Nov 22, 2023
62ede21
fixup! concatenate.cc: Preserve the properties of offsets and sizes r…
felipecrv Nov 22, 2023
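
Several of the commits above concern converting between the list and list-view layouts. As a rough illustration only (this is not the code from the PR, and `ListViewBuffers` and `ListOffsetsToListView` are hypothetical names invented here), a list array with n + 1 offsets can be described as a list-view by keeping the first n offsets and deriving each size from consecutive offsets:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two list-view buffers (not an Arrow type).
struct ListViewBuffers {
  std::vector<int32_t> offsets;
  std::vector<int32_t> sizes;
};

// Sketch: derive list-view offsets/sizes from list offsets.
// Assumes list_offsets is non-decreasing, as the list layout requires.
ListViewBuffers ListOffsetsToListView(const std::vector<int32_t>& list_offsets) {
  ListViewBuffers out;
  if (list_offsets.empty()) return out;
  const std::size_t n = list_offsets.size() - 1;  // number of lists
  out.offsets.reserve(n);
  out.sizes.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(list_offsets[i] <= list_offsets[i + 1]);
    out.offsets.push_back(list_offsets[i]);
    out.sizes.push_back(list_offsets[i + 1] - list_offsets[i]);
  }
  return out;
}
```

The reverse direction is not always this simple, because list-view offsets may be out of order or overlap, so the child values may have to be rearranged or copied.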
2 changes: 2 additions & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -228,6 +228,7 @@ set(ARROW_SRCS
  util/hashing.cc
  util/int_util.cc
  util/io_util.cc
+ util/list_util.cc
  util/logging.cc
  util/key_value_metadata.cc
  util/memory.cc
@@ -789,6 +790,7 @@ add_arrow_test(array_test
  array/array_binary_test.cc
  array/array_dict_test.cc
  array/array_list_test.cc
+ array/array_list_view_test.cc
  array/array_run_end_test.cc
  array/array_struct_test.cc
  array/array_union_test.cc
2 changes: 1 addition & 1 deletion cpp/src/arrow/array/array_base.cc
@@ -95,7 +95,7 @@ struct ScalarFromArraySlotImpl {
   Status Visit(const MonthDayNanoIntervalArray& a) { return Finish(a.Value(index_)); }

   template <typename T>
-  Status Visit(const BaseListArray<T>& a) {
+  Status Visit(const VarLengthListLikeArray<T>& a) {
     return Finish(a.value_slice(index_));
   }

446 changes: 393 additions & 53 deletions cpp/src/arrow/array/array_list_test.cc

Large diffs are not rendered by default.

84 changes: 84 additions & 0 deletions cpp/src/arrow/array/array_list_view_test.cc
@@ -0,0 +1,84 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include <gtest/gtest.h>

#include "arrow/array/array_nested.h"
#include "arrow/array/util.h"
#include "arrow/pretty_print.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/type_fwd.h"
#include "arrow/util/checked_cast.h"

namespace arrow {

using internal::checked_cast;

// ----------------------------------------------------------------------
// List-view array tests

namespace {

class TestListViewArray : public ::testing::Test {
 public:
  std::shared_ptr<Array> string_values;
  std::shared_ptr<Array> int32_values;
  std::shared_ptr<Array> int16_values;

  void SetUp() override {
    string_values = ArrayFromJSON(utf8(), R"(["Hello", "World", null])");
    int32_values = ArrayFromJSON(int32(), "[1, 20, 3]");
    int16_values = ArrayFromJSON(int16(), "[10, 2, 30]");
  }

  static std::shared_ptr<Array> Offsets(std::string_view json) {
    return ArrayFromJSON(int32(), json);
  }

  static std::shared_ptr<Array> Sizes(std::string_view json) {
    return ArrayFromJSON(int32(), json);
  }
};

} // namespace

TEST_F(TestListViewArray, MakeArray) {
  ASSERT_OK_AND_ASSIGN(auto list_view_array,
                       ListViewArray::FromArrays(*Offsets("[0, 0, 1, 2]"),
                                                 *Sizes("[2, 1, 1, 1]"), *string_values));
  auto array_data = list_view_array->data();
  auto new_array = MakeArray(array_data);
  ASSERT_ARRAYS_EQUAL(*new_array, *list_view_array);
  // Should be the exact same ArrayData object
  ASSERT_EQ(new_array->data(), array_data);
  ASSERT_NE(std::dynamic_pointer_cast<ListViewArray>(new_array), NULLPTR);
}

TEST_F(TestListViewArray, FromOffsetsAndSizes) {
  std::shared_ptr<ListViewArray> list_view_array;

  ASSERT_OK_AND_ASSIGN(list_view_array, ListViewArray::FromArrays(
                                            *Offsets("[0, 0, 1, 1000]"),
                                            *Sizes("[2, 1, 1, null]"), *int32_values));
  ASSERT_EQ(list_view_array->length(), 4);
  ASSERT_ARRAYS_EQUAL(*list_view_array->values(), *int32_values);
  ASSERT_EQ(list_view_array->offset(), 0);
  ASSERT_EQ(list_view_array->data()->GetNullCount(), 1);
  ASSERT_EQ(list_view_array->data()->buffers.size(), 3);
}

} // namespace arrow
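
For readers unfamiliar with the layout exercised by the MakeArray test above, the following standalone sketch shows how an offsets/sizes pair selects ranges of the child values. It uses only the standard library and deliberately avoids the Arrow API. `Value` and `ResolveListViews` are hypothetical names introduced for illustration, and the bounds checks reflect my reading of the list-view invariants rather than the PR's validation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a nullable utf8 child value.
using Value = std::optional<std::string>;

// Sketch: materialize list-view i as a copy of the child values in
// [offsets[i], offsets[i] + sizes[i]). Views may overlap or repeat.
std::vector<std::vector<Value>> ResolveListViews(
    const std::vector<int32_t>& offsets, const std::vector<int32_t>& sizes,
    const std::vector<Value>& values) {
  assert(offsets.size() == sizes.size());
  std::vector<std::vector<Value>> lists;
  lists.reserve(offsets.size());
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    // Each view is assumed to lie entirely inside the child array.
    assert(offsets[i] >= 0 && sizes[i] >= 0);
    assert(static_cast<std::size_t>(offsets[i]) +
               static_cast<std::size_t>(sizes[i]) <=
           values.size());
    lists.emplace_back(values.begin() + offsets[i],
                       values.begin() + offsets[i] + sizes[i]);
  }
  return lists;
}
```

With the test's inputs (offsets `[0, 0, 1, 2]`, sizes `[2, 1, 1, 1]`, values `["Hello", "World", null]`), the first two views both start at index 0 and overlap. A plain list layout cannot express that sharing without duplicating the child values.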