You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
We have a predicate like col LIKE <...> where col is a DictionaryArray. The filter ends up looking like this:
FilterExec: CAST(col@3 AS Utf8) LIKE '%CONST%' elapsed_compute=38.62343607s]
This is non ideal (in our case this requires many seconds of time) because it will copy each dictionary value column into a StringArray before it applies LIKE
However, the plan shows it is casting to Utf8 which is quite expensive:
> explain select column1 from dict_table where column1 LIKE'%oo%';
+---------------+------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------+
| logical_plan | Filter: CAST(dict_table.column1AS Utf8) LIKE Utf8("%oo%") |
| | TableScan: dict_table projection=[column1] |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: CAST(column1@0AS Utf8) LIKE %oo% | <--- NOTE THIS HAS A CAST
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+------------------------------------------------------------+2 row(s) fetched.
Elapsed 0.004 seconds.
Describe the solution you'd like
I would like LIKE to operate much faster on DictionaryArrays (without a cast)
Describe alternatives you've considered
What would likely be much faster is done is to evaluate LIKE on the dictionary values first to find any matching keys, and then look up the values
There is code in the arrow like kernels that appears to handle dictionary values, so this "just may work" if we update the coercion rules to avoid the cast.
However, we may also have to implement something like
// find all keys in the dictionarylet matching_values = like(dictionary_array.values(), scalar);// figure out which values in the dictionary match by callking takelet dict_indicies = dictionary_array.keys();// now, leave the overall result here let like_result = take(matching_values, dict_indices);
Additional context
Note potentially related to LIKE for StringViewArray -- see #11024
Is your feature request related to a problem or challenge?
We have a predicate like
col LIKE <...>
wherecol
is aDictionaryArray
. The filter ends up looking like this:This is non ideal (in our case this requires many seconds of time) because it will copy each dictionary value column into a StringArray before it applies
LIKE
Here is an self contained reproducer:
The query gets the correct anser:
However, the plan shows it is casting to Utf8 which is quite expensive:
Describe the solution you'd like
I would like LIKE to operate much faster on DictionaryArrays (without a cast)
Describe alternatives you've considered
What would likely be much faster is done is to evaluate
LIKE
on the dictionary values first to find any matching keys, and then look up the valuesThere is code in the arrow
like
kernels that appears to handle dictionary values, so this "just may work" if we update the coercion rules to avoid the cast.https://docs.rs/arrow-string/52.0.0/src/arrow_string/like.rs.html#122-126
However, we may also have to implement something like
Additional context
Note potentially related to
LIKE
for StringViewArray -- see #11024Also potentially related to #5222
The text was updated successfully, but these errors were encountered: