Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] "case_when" test failure on random union inputs #15192

Closed
pitrou opened this issue Jan 4, 2023 · 9 comments · Fixed by #39308
Closed

[C++] "case_when" test failure on random union inputs #15192

pitrou opened this issue Jan 4, 2023 · 9 comments · Fixed by #39308
Assignees
Milestone

Comments

@pitrou
Copy link
Member

pitrou commented Jan 4, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Enabling this commented out test in scalar_if_else_test_.cc produces test failures:

// TEST(TestCaseWhen, UnionBoolStringRandom) {
//   for (const auto& type : std::vector<std::shared_ptr<DataType>>{
//            sparse_union({field("a", boolean()), field("b", utf8())}, {2, 7}),
//            dense_union({field("a", boolean()), field("b", utf8())}, {2, 7})}) {
//     ARROW_SCOPED_TRACE(type->ToString());
//     TestCaseWhenRandom(type);
//   }
// }

Component(s)

C++

@pitrou
Copy link
Member Author

pitrou commented Jan 4, 2023

cc @wjones127 @lidavidm

@pitrou
Copy link
Member Author

pitrou commented Jan 4, 2023

(#15131 is needed to reproduce)

@zanmato1984
Copy link
Contributor

zanmato1984 commented Dec 19, 2023

Looked a bit. The error message is

'::arrow::internal::ValidateArrayFull(output)' failed with Invalid: Union value at position 0 has invalid type id 0

, assuming this is the error that this issue encountered.

The reason of this failure is:

  1. The random array generation for union types doesn't take into account the type code specified in the UnionDataType object:
    // Trivial type codes map
  2. The function case_when executes fine because the type id generation is also trivial
    // Generate array of type ids
  3. Finally a full validation is conducted on the function output
    ValidateOutputImpl(*output.array());

    , which checks if type ids in the buffer match the ones specified in the data type (and they don't)
    return Status::Invalid("Union value at position ", i, " has invalid type id ",

I think I can take this and make random array generation respect the type ids specified in the data type, instead of generating trivial type ids.

@zanmato1984
Copy link
Contributor

I found that we can bring the test back alive by simply removing passing customized type codes, drafted here: #39308
And I think it's acceptable for this specific case because the case is about testing case_when rather than the "type code mapping" thing for union types.
Of course this will leave the potential issue of mismatched type codes between in data type and actual data unchanged. So I'm also open to do a thorough change, i.e., making random union array generation respect the type ids specified in the data type.
Any comments will be helpful. Thanks.
cc @pitrou

@pitrou
Copy link
Member Author

pitrou commented Dec 20, 2023

@zanmato1984 Thanks for the investigation! I think it would be better to make random union array generation conformant, indeed.

@zanmato1984
Copy link
Contributor

@zanmato1984 Thanks for the investigation! I think it would be better to make random union array generation conformant, indeed.

Sure, will do. Thanks @pitrou !

@zanmato1984
Copy link
Contributor

One quick question:
In order to generate conformant type ids for random union array, I'll have to pass either UnionDataType or type codes into the following functions:

std::shared_ptr<Array> SparseUnion(const ArrayVector& fields, int64_t size,
int64_t alignment = kDefaultBufferAlignment,
MemoryPool* memory_pool = default_memory_pool());

std::shared_ptr<Array> DenseUnion(const ArrayVector& fields, int64_t size,
int64_t alignment = kDefaultBufferAlignment,
MemoryPool* memory_pool = default_memory_pool());

However they seem to be public APIs, even though they belong to arrow-testing library. Do I have to make overloads of them or am I free to change the signatures?

I appreciate if you can enlighten me about this? Thanks @pitrou !

@pitrou
Copy link
Member Author

pitrou commented Dec 20, 2023

arrow-testing is not really for public usage, so you can change those APIs.

@zanmato1984
Copy link
Contributor

zanmato1984 commented Dec 21, 2023

Just found that #36018 already addressed the aforementioned problem. I made a mistake, my local branch was too old.

Now this test case can be brought back alive without other changes, updated in #39308.

So sad. And my apology for the unnecessary trouble :-(

pitrou pushed a commit that referenced this issue Dec 21, 2023
### Rationale for this change

Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However #36018 already addressed the issue.

More information about how it used to fail, please refer to #15192 (comment).

### What changes are included in this PR?

Bring back the test code.

### Are these changes tested?

Yes, the change is the test.

### Are there any user-facing changes?

No.

* Closes: #15192

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 15.0.0 milestone Dec 21, 2023
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…pache#39308)

### Rationale for this change

Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However apache#36018 already addressed the issue.

More information about how it used to fail, please refer to apache#15192 (comment).

### What changes are included in this PR?

Bring back the test code.

### Are these changes tested?

Yes, the change is the test.

### Are there any user-facing changes?

No.

* Closes: apache#15192

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…pache#39308)

### Rationale for this change

Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However apache#36018 already addressed the issue.

More information about how it used to fail, please refer to apache#15192 (comment).

### What changes are included in this PR?

Bring back the test code.

### Are these changes tested?

Yes, the change is the test.

### Are there any user-facing changes?

No.

* Closes: apache#15192

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants