Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666)

The column equivalency checking code was not handling a particular corner case properly. Fundamentally, there is no requirement that the offsets or child data for null rows in two list columns be the same. An example:
```
List<int32_t>:
Length : 7
Offsets : 0, 3, 6, 8, 11, 14, 16, 19
Null count: 5
0010100
   1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7

List<int32_t>:
Length : 7
Offsets : 0, 0, 0, 2, 2, 5, 5, 5
Null count: 5
0010100
   3, 3, 5, 5, 5
```

At first glance, these columns do not seem equivalent. However, the only non-null rows, 2 and 4, are identical in both columns: row 2 is `[3, 3]` and row 4 is `[5, 5, 5]`.

The comparison code assumed that the rows being compared always sit at the same positions in the child columns, but that does not have to be the case. For example, the child rows we care about are `[6, 7, 11, 12, 13]` in the first column, whereas in the second column they are `[0, 1, 2, 3, 4]`.
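To make the index bookkeeping concrete, here is a minimal host-side sketch in C++17. It assumes a simplified model in which a list column is just an offsets vector plus a per-row `std::vector<bool>` validity; `child_rows_to_compare` is a hypothetical helper, not a cudf function. Given the parent rows under consideration, it returns the child rows that hold the data of the non-null rows. For the two columns above it yields `[6, 7, 11, 12, 13]` and `[0, 1, 2, 3, 4]` respectively.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of cudf). Maps the parent rows being compared
// to the child rows that hold their data. Null parent rows are skipped because
// their offsets and child data carry no meaning for equivalency.
std::vector<std::int32_t> child_rows_to_compare(std::vector<std::int32_t> const& offsets,
                                                std::vector<bool> const& valid,
                                                std::vector<std::int32_t> const& parent_rows)
{
  assert(offsets.size() == valid.size() + 1);  // one more offset than rows
  std::vector<std::int32_t> child_rows;
  for (std::int32_t row : parent_rows) {
    if (!valid[row]) { continue; }
    // Row `row` owns child rows [offsets[row], offsets[row + 1]).
    for (std::int32_t c = offsets[row]; c < offsets[row + 1]; ++c) {
      child_rows.push_back(c);
    }
  }
  return child_rows;
}
```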

The fix is to fundamentally change how the comparison code works: instead of simply iterating from `0` to `size` for each column, we provide an explicit list of row indices to compare. The various compare functors now take additional `lhs_row_indices` and `rhs_row_indices` columns to reflect this.

For flat hierarchies, this input is always just `[0, 1, 2, ..., size - 1]`. However, every time we encounter a list column in the hierarchy, the rows that need to be considered can change completely and arbitrarily, and they can differ between the two columns.
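The scheme can be sketched on the host as a recursive comparison. This is an illustrative model under stated assumptions, not the PR's device-side implementation: the `column` struct (either `int32_t` leaf values, or offsets plus one child) and both `equivalent` overloads are hypothetical. The top level compares rows `[0, size)`; at each list level, the non-null rows on each side contribute their own child ranges, so the two child index lists may point at completely different positions.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical host-side model of a possibly nested column (not cudf's API).
// A leaf (child == nullptr) holds int32 values; a list holds offsets and one child.
struct column {
  std::vector<bool> valid;            // per-row validity
  std::vector<std::int32_t> values;   // leaf data
  std::vector<std::int32_t> offsets;  // list offsets, size() + 1 entries
  std::shared_ptr<column> child;      // list child, or nullptr for a leaf
  std::size_t size() const { return valid.size(); }
};

// Compares lhs row lhs_rows[i] with rhs row rhs_rows[i] for every i.
bool equivalent(column const& lhs,
                column const& rhs,
                std::vector<std::int32_t> const& lhs_rows,
                std::vector<std::int32_t> const& rhs_rows)
{
  if (lhs_rows.size() != rhs_rows.size()) { return false; }
  bool const is_leaf = lhs.child == nullptr;
  if (is_leaf != (rhs.child == nullptr)) { return false; }  // leaf vs. list

  std::vector<std::int32_t> lhs_child_rows, rhs_child_rows;
  for (std::size_t i = 0; i < lhs_rows.size(); ++i) {
    auto const l = lhs_rows[i];
    auto const r = rhs_rows[i];
    if (lhs.valid[l] != rhs.valid[r]) { return false; }  // null vs. non-null
    if (!lhs.valid[l]) { continue; }  // both null: offsets and child data are ignored
    if (is_leaf) {
      if (lhs.values[l] != rhs.values[r]) { return false; }
      continue;
    }
    // Non-null list rows must have equal lengths; their child ranges are then
    // queued for comparison wherever they happen to sit in each child column.
    auto const len = lhs.offsets[l + 1] - lhs.offsets[l];
    if (len != rhs.offsets[r + 1] - rhs.offsets[r]) { return false; }
    for (std::int32_t k = 0; k < len; ++k) {
      lhs_child_rows.push_back(lhs.offsets[l] + k);
      rhs_child_rows.push_back(rhs.offsets[r] + k);
    }
  }
  return is_leaf || equivalent(*lhs.child, *rhs.child, lhs_child_rows, rhs_child_rows);
}

// Entry point: at the top level the row indices are simply [0, size).
bool equivalent(column const& lhs, column const& rhs)
{
  if (lhs.size() != rhs.size()) { return false; }
  std::vector<std::int32_t> rows(lhs.size());
  std::iota(rows.begin(), rows.end(), 0);
  return equivalent(lhs, rhs, rows, rows);
}
```

With the two example columns, this sketch pairs child rows `[6, 7, 11, 12, 13]` with `[0, 1, 2, 3, 4]` and reports them equivalent; the 19-row and 5-row leaf columns are never compared by total size.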

I'm leaving this as a draft because there is a discussion point in the column _property_ comparisons that I think is worth having. As with the data values, the column property comparison wanted to simply compare `lhs.size()` to `rhs.size()`. But as the leaf columns in the example above show, the sizes can be totally different (19 vs. 5). When we are only checking for _equivalency_, what matters is that the number of rows we are going to compare is the same. Similarly, the null counts cannot be compared directly; only the null counts of the rows we are explicitly comparing matter. As far as I can tell, this is the only way to do it, but I'm not sure it's 100% in the semantic spirit of what the column _properties_ are, since we are really checking the properties of a subset of the overall column.
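One way to express the subset-based property check is sketched below, again on a hypothetical host-side model: the `subset_properties` struct, `properties_of`, and `subset_properties_equivalent` are illustrative names, not cudf APIs. Instead of comparing each column's total size and null count, it compares the number of rows selected for comparison and the number of nulls among those rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not cudf's API): the "properties" of a column restricted
// to the rows selected for comparison.
struct subset_properties {
  std::size_t num_rows;
  std::size_t null_count;
};

subset_properties properties_of(std::vector<bool> const& valid,
                                std::vector<std::int32_t> const& row_indices)
{
  std::size_t nulls = 0;
  for (std::int32_t r : row_indices) {
    if (!valid[r]) { ++nulls; }
  }
  return {row_indices.size(), nulls};
}

// Equivalence only requires the compared subsets to agree; the columns' total
// sizes and total null counts are never consulted.
bool subset_properties_equivalent(std::vector<bool> const& lhs_valid,
                                  std::vector<std::int32_t> const& lhs_rows,
                                  std::vector<bool> const& rhs_valid,
                                  std::vector<std::int32_t> const& rhs_rows)
{
  auto const l = properties_of(lhs_valid, lhs_rows);
  auto const r = properties_of(rhs_valid, rhs_rows);
  return l.num_rows == r.num_rows && l.null_count == r.null_count;
}
```

Assuming the leaf values in the example are all valid, both compared subsets have 5 rows and 0 nulls, so the check passes even though the leaf columns themselves have 19 and 5 rows.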

I left a couple of comments in the property comparator code labelled `// DISCUSSION`.

Note: I haven't added tests yet.

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - MithunR (https://github.com/mythrocks)
  - Jake Hemstad (https://github.com/jrhemstad)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8666
nvdbaranec authored Jul 20, 2021
1 parent 0e59d05 commit f1fa694
Showing 26 changed files with 1,126 additions and 533 deletions.
52 changes: 38 additions & 14 deletions cpp/include/cudf_test/column_utilities.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -27,13 +27,28 @@

namespace cudf {
namespace test {

/**
* @brief Verbosity level of output from column and table comparison functions.
*/
enum class debug_output_level {
FIRST_ERROR = 0, // print first error only
ALL_ERRORS, // print all errors
QUIET // no debug output
};

/**
* @brief Verifies the property equality of two columns.
*
* @param lhs The first column
* @param rhs The second column
* @param verbosity Level of debug output verbosity
*
* @returns True if the column properties are equal, false otherwise
*/
void expect_column_properties_equal(cudf::column_view const& lhs, cudf::column_view const& rhs);
bool expect_column_properties_equal(cudf::column_view const& lhs,
cudf::column_view const& rhs,
debug_output_level verbosity = debug_output_level::FIRST_ERROR);

/**
* @brief Verifies the property equivalence of two columns.
@@ -44,36 +59,45 @@ void expect_column_properties_equal(cudf::column_view const& lhs, cudf::column_v
*
* @param lhs The first column
* @param rhs The second column
* @param verbosity Level of debug output verbosity
*
* @returns True if the column properties are equivalent, false otherwise
*/
void expect_column_properties_equivalent(cudf::column_view const& lhs,
cudf::column_view const& rhs);
bool expect_column_properties_equivalent(
cudf::column_view const& lhs,
cudf::column_view const& rhs,
debug_output_level verbosity = debug_output_level::FIRST_ERROR);

/**
* @brief Verifies the element-wise equality of two columns.
*
* Treats null elements as equivalent.
*
* @param lhs The first column
* @param rhs The second column
* @param print_all_differences If true display all differences
* @param lhs The first column
* @param rhs The second column
* @param verbosity Level of debug output verbosity
*
* @returns True if the columns (and their properties) are equal, false otherwise
*/
void expect_columns_equal(cudf::column_view const& lhs,
bool expect_columns_equal(cudf::column_view const& lhs,
cudf::column_view const& rhs,
bool print_all_differences = false);
debug_output_level verbosity = debug_output_level::FIRST_ERROR);

/**
* @brief Verifies the element-wise equivalence of two columns.
*
* Uses machine epsilon to compare floating point types.
* Treats null elements as equivalent.
*
* @param lhs The first column
* @param rhs The second column
* @param print_all_differences If true display all differences
* @param lhs The first column
* @param rhs The second column
* @param verbosity Level of debug output verbosity
*
* @returns True if the columns (and their properties) are equivalent, false otherwise
*/
void expect_columns_equivalent(cudf::column_view const& lhs,
bool expect_columns_equivalent(cudf::column_view const& lhs,
cudf::column_view const& rhs,
bool print_all_differences = false);
debug_output_level verbosity = debug_output_level::FIRST_ERROR);

/**
* @brief Verifies the bitwise equality of two device memory buffers.
46 changes: 24 additions & 22 deletions cpp/tests/ast/transform_tests.cpp
@@ -42,6 +42,8 @@
template <typename T>
using column_wrapper = cudf::test::fixed_width_column_wrapper<T>;

constexpr cudf::test::debug_output_level verbosity{cudf::test::debug_output_level::ALL_ERRORS};

struct TransformTest : public cudf::test::BaseFixture {
};

@@ -58,7 +60,7 @@ TEST_F(TransformTest, BasicAddition)
auto expected = column_wrapper<int32_t>{13, 27, 21, 50};
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, BasicAdditionLarge)
@@ -74,7 +76,7 @@ TEST_F(TransformTest, BasicAdditionLarge)
auto expected = column_wrapper<int32_t>(b, b + 2000);
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, LessComparator)
@@ -90,7 +92,7 @@ TEST_F(TransformTest, LessComparator)
auto expected = column_wrapper<bool>{true, false, true, false};
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, LessComparatorLarge)
@@ -109,7 +111,7 @@ TEST_F(TransformTest, LessComparatorLarge)
auto expected = column_wrapper<bool>(c, c + 2000);
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, MultiLevelTreeArithmetic)
@@ -135,7 +137,7 @@ TEST_F(TransformTest, MultiLevelTreeArithmetic)
auto result = cudf::ast::compute_column(table, expression_tree);
auto expected = column_wrapper<int32_t>{7, 73, 22, -99};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, MultiLevelTreeArithmeticLarge)
@@ -163,7 +165,7 @@ TEST_F(TransformTest, MultiLevelTreeArithmeticLarge)
auto d = cudf::detail::make_counting_transform_iterator(0, [&](auto i) { return calc(i); });
auto expected = column_wrapper<int32_t>(d, d + 2000);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, ImbalancedTreeArithmetic)
@@ -187,7 +189,7 @@ TEST_F(TransformTest, ImbalancedTreeArithmetic)
auto expected =
column_wrapper<double>{0.6, std::numeric_limits<double>::infinity(), -3.201, -2099.18};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, MultiLevelTreeComparator)
@@ -213,7 +215,7 @@ TEST_F(TransformTest, MultiLevelTreeComparator)
auto result = cudf::ast::compute_column(table, expression_tree);
auto expected = column_wrapper<bool>{false, true, false, false};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, MultiTypeOperationFailure)
@@ -249,7 +251,7 @@ TEST_F(TransformTest, LiteralComparison)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<bool>{false, false, false, true};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, UnaryNot)
@@ -264,7 +266,7 @@ TEST_F(TransformTest, UnaryNot)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<bool>{false, true, false, false};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, UnaryTrigonometry)
@@ -277,17 +279,17 @@ TEST_F(TransformTest, UnaryTrigonometry)
auto expected_sin = column_wrapper<double>{0.0, std::sqrt(2) / 2, std::sqrt(3.0) / 2.0};
auto expression_sin = cudf::ast::expression(cudf::ast::ast_operator::SIN, col_ref_0);
auto result_sin = cudf::ast::compute_column(table, expression_sin);
cudf::test::expect_columns_equivalent(expected_sin, result_sin->view(), true);
cudf::test::expect_columns_equivalent(expected_sin, result_sin->view(), verbosity);

auto expected_cos = column_wrapper<double>{1.0, std::sqrt(2) / 2, 0.5};
auto expression_cos = cudf::ast::expression(cudf::ast::ast_operator::COS, col_ref_0);
auto result_cos = cudf::ast::compute_column(table, expression_cos);
cudf::test::expect_columns_equivalent(expected_cos, result_cos->view(), true);
cudf::test::expect_columns_equivalent(expected_cos, result_cos->view(), verbosity);

auto expected_tan = column_wrapper<double>{0.0, 1.0, std::sqrt(3.0)};
auto expression_tan = cudf::ast::expression(cudf::ast::ast_operator::TAN, col_ref_0);
auto result_tan = cudf::ast::compute_column(table, expression_tan);
cudf::test::expect_columns_equivalent(expected_tan, result_tan->view(), true);
cudf::test::expect_columns_equivalent(expected_tan, result_tan->view(), verbosity);
}

TEST_F(TransformTest, ArityCheckFailure)
@@ -311,7 +313,7 @@ TEST_F(TransformTest, StringComparison)
auto expected = column_wrapper<bool>{true, false, true, false};
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, CopyColumn)
@@ -325,7 +327,7 @@ TEST_F(TransformTest, CopyColumn)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<int32_t>{3, 0, 1, 50};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, CopyLiteral)
@@ -341,7 +343,7 @@ TEST_F(TransformTest, CopyLiteral)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<int32_t>{-123, -123, -123, -123};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, TrueDiv)
@@ -358,7 +360,7 @@ TEST_F(TransformTest, TrueDiv)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<double>{1.5, 0.0, 0.5, 25.0};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, FloorDiv)
@@ -375,7 +377,7 @@ TEST_F(TransformTest, FloorDiv)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<double>{1.0, 0.0, 0.0, 25.0};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, Mod)
@@ -392,7 +394,7 @@ TEST_F(TransformTest, Mod)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<double>{1.0, 0.0, -1.0, 0.0};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, PyMod)
@@ -409,7 +411,7 @@ TEST_F(TransformTest, PyMod)
auto result = cudf::ast::compute_column(table, expression);
auto expected = column_wrapper<double>{1.0, 0.0, 1.0, 0.0};

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, BasicAdditionNulls)
@@ -425,7 +427,7 @@ TEST_F(TransformTest, BasicAdditionNulls)
auto expected = column_wrapper<int32_t>{{0, 0, 0, 50}, {0, 0, 0, 1}};
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

TEST_F(TransformTest, BasicAdditionLargeNulls)
@@ -451,7 +453,7 @@ TEST_F(TransformTest, BasicAdditionLargeNulls)
auto expected = column_wrapper<int32_t>(b, b + N, validities.begin());
auto result = cudf::ast::compute_column(table, expression);

cudf::test::expect_columns_equal(expected, result->view(), true);
cudf::test::expect_columns_equal(expected, result->view(), verbosity);
}

CUDF_TEST_PROGRAM_MAIN()