Description
A checklist of items that I plan to tackle. Please feel free to edit it and to open issues related to a specific topic.
UDFs to support

- `variant_to_json` - returns a JSON string from a `VariantArray`
- `json_to_variant` - returns a `VariantArray` from a JSON string
- `cast_to_variant` - returns a `VariantArray` from a column
- `variant_get(VariantArray, path)` - returns the value extracted from the `VariantArray` at `path`, with its type dictated by `path`
- `is_variant_null(VariantArray)` - tests whether elements in `VariantArray` are `Variant::Null`
- `variant_pretty(VariantArray)` - returns a human-readable version of `VariantArray`
- `variant_schema(VariantArray)` - returns the schema of a `VariantArray`
- `variant_object_construct(key1, value1, [keyN, valueN])`
- `variant_object_delete(VariantArray, VariantPath)`
- `variant_object_insert(key, value)`
- `variant_list_construct(value1, [valueN])`
- `variant_list_insert(VariantArray, value)`
- `variant_explode`
- `variant_explode_outer`
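To make the intended per-row semantics of `variant_get` and `is_variant_null` concrete, here is a minimal sketch over a hypothetical, simplified model. Assumptions: `Variant` is a small owned enum (not the parquet-variant type), an "array" is a `Vec<Option<Variant>>` where `None` is a SQL NULL, and paths are pre-parsed into hypothetical `PathElement` steps. In this sketch a path miss yields `None`, and only an encoded `Variant::Null` counts as a variant null (a SQL NULL row reports `false`).

```rust
/// Hypothetical, simplified stand-in for a variant value (not the
/// parquet-variant `Variant` type, which has many more primitive types).
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int(i64),
    List(Vec<Variant>),
    Object(Vec<(String, Variant)>),
}

/// One step of a hypothetical pre-parsed path, e.g. `a[0]` becomes
/// `[Field("a"), Index(0)]`.
enum PathElement<'a> {
    Field(&'a str),
    Index(usize),
}

/// Walks `path` into one value; returns `None` as soon as a step does not match.
fn get_path<'v>(value: &'v Variant, path: &[PathElement]) -> Option<&'v Variant> {
    let mut current = value;
    for element in path {
        current = match (element, current) {
            (PathElement::Field(name), Variant::Object(fields)) => {
                &fields.iter().find(|(key, _)| key.as_str() == *name)?.1
            }
            (PathElement::Index(i), Variant::List(items)) => items.get(*i)?,
            _ => return None,
        };
    }
    Some(current)
}

/// Element-wise `variant_get`: SQL NULL rows and path misses both yield `None`.
fn variant_get(rows: &[Option<Variant>], path: &[PathElement]) -> Vec<Option<Variant>> {
    rows.iter()
        .map(|row| row.as_ref().and_then(|v| get_path(v, path)).cloned())
        .collect()
}

/// Element-wise `is_variant_null`: only an encoded `Variant::Null` is `true`;
/// a SQL NULL row (`None`) is reported as `false` in this sketch.
fn is_variant_null(rows: &[Option<Variant>]) -> Vec<bool> {
    rows.iter().map(|row| matches!(row, Some(Variant::Null))).collect()
}

fn main() {
    // Row 0: {"a": [1, null]}, row 1: SQL NULL, row 2: a top-level variant null.
    let rows = vec![
        Some(Variant::Object(vec![(
            "a".to_string(),
            Variant::List(vec![Variant::Int(1), Variant::Null]),
        )])),
        None,
        Some(Variant::Null),
    ];

    let first = variant_get(&rows, &[PathElement::Field("a"), PathElement::Index(0)]);
    assert_eq!(first, vec![Some(Variant::Int(1)), None, None]);

    // A variant null found at a path is distinct from "path not found".
    let second = variant_get(&rows, &[PathElement::Field("a"), PathElement::Index(1)]);
    assert_eq!(second, vec![Some(Variant::Null), None, None]);

    assert_eq!(is_variant_null(&rows), vec![false, false, true]);
    assert_eq!(is_variant_null(&second), vec![true, false, false]);
}
```

The main design point is keeping SQL NULL and `Variant::Null` apart: `variant_get` can legitimately return a `Variant::Null` that means something different from a missing path.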
Databricks supports the following variant-related UDFs: https://docs.databricks.com/gcp/en/sql/language-manual/sql-ref-functions-builtin#variant-functions. I think it would be very cool to achieve 1:1 parity with their functionality.
misc
- Add an `examples/` directory with examples for every UDF we support. We can use a sample JSONL dataset like: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Have integration tests, preferably using slt (sqllogictest) tests
- Add more unit tests
- Write documentation
- Write the README
- Have better error messages, especially when instructing users which arguments to pass (Improve error messaging #14)
- Pick a license (Pick licensing #15)
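One way to make argument errors more instructive is to validate the argument count up front and echo the expected call shape back to the user. The sketch below is an assumption about how that could look: `Arity`, `ArgCountError`, and `check_arg_count` are hypothetical names, and a real UDF would convert the error into whatever error type the query engine expects.

```rust
use std::fmt;

/// How many arguments a UDF accepts (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
enum Arity {
    Exact(usize),
    AtLeast(usize),
    /// A non-empty list of key/value pairs.
    Pairs,
}

/// Hypothetical error that names the UDF and shows the expected call shape.
#[derive(Debug)]
struct ArgCountError {
    udf: &'static str,
    usage: &'static str,
    arity: Arity,
    got: usize,
}

impl fmt::Display for ArgCountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let expected = match self.arity {
            Arity::Exact(n) => format!("exactly {n} argument(s)"),
            Arity::AtLeast(n) => format!("at least {n} argument(s)"),
            Arity::Pairs => "a non-zero, even number of arguments (key/value pairs)".to_string(),
        };
        write!(
            f,
            "{} expects {}, but got {}. Usage: {}",
            self.udf, expected, self.got, self.usage
        )
    }
}

impl std::error::Error for ArgCountError {}

/// Checks `got` against `arity` and returns a descriptive error on mismatch.
fn check_arg_count(
    udf: &'static str,
    usage: &'static str,
    arity: Arity,
    got: usize,
) -> Result<(), ArgCountError> {
    let ok = match arity {
        Arity::Exact(n) => got == n,
        Arity::AtLeast(n) => got >= n,
        Arity::Pairs => got > 0 && got % 2 == 0,
    };
    if ok {
        Ok(())
    } else {
        Err(ArgCountError { udf, usage, arity, got })
    }
}

fn main() {
    // `variant_get(VariantArray, path)` called with a single argument.
    let err = check_arg_count("variant_get", "variant_get(VariantArray, path)", Arity::Exact(2), 1)
        .unwrap_err();
    println!("{err}");

    // Three arguments cannot form key/value pairs.
    assert!(check_arg_count(
        "variant_object_construct",
        "variant_object_construct(key1, value1, [keyN, valueN])",
        Arity::Pairs,
        3,
    )
    .is_err());

    assert!(check_arg_count(
        "variant_list_construct",
        "variant_list_construct(value1, [valueN])",
        Arity::AtLeast(1),
        2,
    )
    .is_ok());
}
```

For the single-argument `variant_get` call, the printed message is `variant_get expects exactly 2 argument(s), but got 1. Usage: variant_get(VariantArray, path)`.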
questions
It would also be good to compile a list of questions/ideas/limitations that arise from interfacing with arrow's parquet-variant libraries:
- Why does `VariantArray` not implement `IntoIterator`?
- `variant_to_json` should accept an optional format configuration. Currently the library dictates how specialized `Variant` types are mapped to JSON (e.g. timestamps are always formatted as strings).
- There is a lot of ceremony to go from a `Variant` to a `VariantArray`. Maybe `VariantArray` should implement `FromIterator<Variant>`, so it can be built with `.collect()` from any `IntoIterator<Item = Variant>`?
- When checking whether two `VariantArray`s are equal, it is a bit odd that calling `.value(i)` panics when the i-th position has empty metadata and value.
- Why doesn't `VariantArray` implement `PartialEq`?
- Is there a reason why `VariantArray` doesn't implement `Array`? This would be super nice.
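The `FromIterator`, `IntoIterator`, and `PartialEq` ideas above can be sketched on a hypothetical, simplified `VariantArray` that stores owned values in a `Vec<Option<Variant>>`. This is not the arrow-rs `parquet-variant` type (which stores binary metadata and value buffers and whose `Variant` borrows from them); it only shows how the trait impls would feel to a caller. The non-panicking `get` accessor is likewise a hypothetical alternative to `.value(i)`.

```rust
/// Hypothetical, owned stand-in for a variant value.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    Null,
    Int(i64),
    Str(String),
}

/// Hypothetical, simplified array: `None` marks an empty (SQL NULL) slot.
#[derive(Debug, PartialEq)]
struct VariantArray {
    slots: Vec<Option<Variant>>,
}

impl VariantArray {
    fn len(&self) -> usize {
        self.slots.len()
    }

    /// Non-panicking accessor: out-of-range and empty slots both yield `None`.
    fn get(&self, i: usize) -> Option<&Variant> {
        self.slots.get(i)?.as_ref()
    }
}

/// Allows `variants.into_iter().collect::<VariantArray>()`.
impl FromIterator<Variant> for VariantArray {
    fn from_iter<I: IntoIterator<Item = Variant>>(iter: I) -> Self {
        VariantArray { slots: iter.into_iter().map(Some).collect() }
    }
}

/// Same, but with explicit empty slots.
impl FromIterator<Option<Variant>> for VariantArray {
    fn from_iter<I: IntoIterator<Item = Option<Variant>>>(iter: I) -> Self {
        VariantArray { slots: iter.into_iter().collect() }
    }
}

/// Borrowing iteration, so `for slot in &array` works.
impl<'a> IntoIterator for &'a VariantArray {
    type Item = &'a Option<Variant>;
    type IntoIter = std::slice::Iter<'a, Option<Variant>>;

    fn into_iter(self) -> Self::IntoIter {
        self.slots.iter()
    }
}

fn main() {
    let a: VariantArray = vec![Variant::Int(1), Variant::Str("x".into()), Variant::Null]
        .into_iter()
        .collect();
    let b: VariantArray = [Some(Variant::Int(1)), None].into_iter().collect();

    // Derived `PartialEq` compares slot by slot; empty slots compare equal
    // to each other instead of panicking.
    assert_ne!(a, b);
    assert_eq!(b, [Some(Variant::Int(1)), None].into_iter().collect::<VariantArray>());
    assert_eq!(b.get(1), None);
    assert_eq!(a.len(), 3);

    // Count non-empty slots via the borrowing `IntoIterator` impl.
    let mut non_empty = 0;
    for slot in &b {
        if slot.is_some() {
            non_empty += 1;
        }
    }
    assert_eq!(non_empty, 1);
}
```

Deriving `PartialEq` on the slot vector sidesteps the panic question entirely, because equality never has to decode an empty slot.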