Description
There are a couple of inter-related issues:
- Cases where a system might send the schema without the dictionaries, and the user wishes to reason about the schema and its types without knowing the dictionary values
- Dictionaries that are changing, e.g. using delta dictionary messages
arrow::DictionaryType has no "linkage" to any external object. I propose adding a "LinkedDictionaryType" or something similar (purely a C++ construct), functionally a subclass of DictionaryType, which would allow a type to be created that obtains its dictionary later through some kind of "Dictionary provider" interface. There is something similar in Java already. This would allow a dictionary to evolve via delta dictionaries, or to be retrieved later, e.g. through an RPC or IPC layer.
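To make the proposal concrete, here is a minimal standalone sketch of how such a linkage could look. It does not use the Arrow headers and is not the actual Arrow API: Dictionary, SimpleDictionaryType, DictionaryProvider, InMemoryDictionaryProvider, and the members of LinkedDictionaryType are all hypothetical names chosen for illustration, and a vector of strings stands in for a real dictionary array.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for dictionary values; a real implementation would hold an array.
using Dictionary = std::vector<std::string>;

// Hypothetical simplified dictionary type that owns a fixed dictionary.
class SimpleDictionaryType {
 public:
  SimpleDictionaryType(std::string value_type,
                       std::shared_ptr<const Dictionary> dictionary)
      : value_type_(std::move(value_type)), dictionary_(std::move(dictionary)) {}
  virtual ~SimpleDictionaryType() = default;

  // The value type is known from the schema alone, without any dictionary.
  const std::string& value_type() const { return value_type_; }
  virtual std::shared_ptr<const Dictionary> dictionary() const {
    return dictionary_;
  }

 private:
  std::string value_type_;
  std::shared_ptr<const Dictionary> dictionary_;
};

// Hypothetical provider interface: maps a dictionary id to its current values.
class DictionaryProvider {
 public:
  virtual ~DictionaryProvider() = default;
  // Returns nullptr while the dictionary has not been received yet.
  virtual std::shared_ptr<const Dictionary> GetDictionary(int64_t id) const = 0;
};

// In-memory provider supporting full replacement and delta (append) updates.
class InMemoryDictionaryProvider : public DictionaryProvider {
 public:
  void Replace(int64_t id, Dictionary values) {
    dictionaries_[id] = std::make_shared<Dictionary>(std::move(values));
  }

  void AppendDelta(int64_t id, const Dictionary& delta) {
    auto it = dictionaries_.find(id);
    if (it == dictionaries_.end()) {
      Replace(id, delta);
      return;
    }
    // Copy before appending so previously returned dictionaries stay unchanged.
    auto merged = std::make_shared<Dictionary>(*it->second);
    merged->insert(merged->end(), delta.begin(), delta.end());
    it->second = std::move(merged);
  }

  std::shared_ptr<const Dictionary> GetDictionary(int64_t id) const override {
    auto it = dictionaries_.find(id);
    if (it == dictionaries_.end()) return nullptr;
    return it->second;
  }

 private:
  std::map<int64_t, std::shared_ptr<Dictionary>> dictionaries_;
};

// Subclass that resolves its dictionary through the provider on each access.
class LinkedDictionaryType : public SimpleDictionaryType {
 public:
  LinkedDictionaryType(std::string value_type, int64_t id,
                       std::shared_ptr<const DictionaryProvider> provider)
      : SimpleDictionaryType(std::move(value_type), nullptr),
        id_(id),
        provider_(std::move(provider)) {}

  std::shared_ptr<const Dictionary> dictionary() const override {
    return provider_->GetDictionary(id_);
  }

 private:
  int64_t id_;
  std::shared_ptr<const DictionaryProvider> provider_;
};
```

In this sketch, a LinkedDictionaryType can be constructed and inspected (for example via value_type()) before any dictionary has arrived, and dictionary() reflects whatever the provider currently holds, including values appended later through AppendDelta. Resolving on every access is one possible design; caching the resolved dictionary on the type would be another.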
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Related issues:
- Integration tests for Fixed Size List type (blocks)
- [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144 (causes)
- [Format][Integration] Define how to test for delta dictionary support in the JSON integration test data format (is related to)
- [C++] Support reading delta dictionaries in IPC streams (is depended upon by)
- [Python] Support reading Parquet binary/string columns directly as DictionaryArray (is depended upon by)
- [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries (is depended upon by)
Note: This issue was originally created as ARROW-3144. Please see the migration documentation for further details.