
[C++] DictionaryBuilder::InsertMemoValues should not deduplicate values #47151

@kdkavanagh

Description

Describe the bug, including details regarding any error messages, version, and platform.

Per #47134, duplicate values are tolerable in dictionaries; however, the C++ API currently deduplicates memo values even when the user inserts them explicitly.
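To illustrate the difference, here is a stdlib-only model of a memo table. It is not Arrow's implementation. MemoModel, InsertDeduplicated and InsertVerbatim are hypothetical names. The first helper stands in for the current behavior and the second for the behavior requested here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a dictionary memo table (not Arrow's MemoTable).
struct MemoModel {
  std::vector<std::string> dictionary;                  // index -> value
  std::unordered_map<std::string, std::size_t> lookup;  // value -> first index
};

// Models the current behavior: a value already present in the memo is
// skipped, so the dictionary only grows for values not seen before.
void InsertDeduplicated(MemoModel& memo, const std::vector<std::string>& values) {
  for (const auto& v : values) {
    if (memo.lookup.emplace(v, memo.dictionary.size()).second) {
      memo.dictionary.push_back(v);
    }
  }
}

// Models the requested behavior: every inserted value gets its own slot,
// so dictionary index i always refers to the i-th inserted value.
void InsertVerbatim(MemoModel& memo, const std::vector<std::string>& values) {
  for (const auto& v : values) {
    // Lookups by value still resolve to the first occurrence.
    memo.lookup.emplace(v, memo.dictionary.size());
    memo.dictionary.push_back(v);
  }
}
```

Inserting {"a", "b", "a", "c"} yields three dictionary entries with the deduplicating model and four with the verbatim one.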

I believe this is the offending line: https://github.com/apache/arrow/blob/main/cpp/src/arrow/array/builder_dict.cc#L85

This becomes a problem when the user believes they have inserted N values but the dictionary ends up containing far fewer than N entries. Because the user called InsertMemoValues explicitly, they may go on to call AppendIndices directly, unaware that the dictionary contents no longer map one-to-one onto the indices they are appending.
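A stdlib-only sketch of that mismatch follows. It assumes the user inserts {"a", "b", "a", "c"} and then appends indices that count positions in that input. DeduplicatedDictionary and Resolve are hypothetical helpers, not Arrow APIs. They only model how positional indices resolve against a deduplicated dictionary.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model (not Arrow code): the dictionary that results when
// duplicates are dropped during memo insertion.
std::vector<std::string> DeduplicatedDictionary(const std::vector<std::string>& inserted) {
  std::vector<std::string> dict;
  std::unordered_set<std::string> seen;
  for (const auto& v : inserted) {
    if (seen.insert(v).second) dict.push_back(v);  // keep first occurrence only
  }
  return dict;
}

// Resolves an index by position in the dictionary, the way a consumer of a
// dictionary-encoded array would. Returns std::nullopt when out of range.
std::optional<std::string> Resolve(const std::vector<std::string>& dict, int64_t index) {
  if (index < 0 || index >= static_cast<int64_t>(dict.size())) return std::nullopt;
  return dict[static_cast<std::size_t>(index)];
}
```

With those inputs the dictionary holds three values, so index 2 resolves to "c" rather than the intended "a", and index 3 points past the end of the dictionary.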

Component(s)

C++
