add equality, null, and range filter #14542
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
@@ -223,10 +224,13 @@
  )
  {
    final int dimIndex = desc.getIndex();
    if (fieldIndexers.size() == 0 && isConstant && !hasNestedData) {
      return DimensionSelector.constant(null, spec.getExtractionFn());
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
I've looked through the updates. While there are still things that I would change in here, the only thing that I feel strongly about, and that I'd rather not be committed if it doesn't have to be, is the inclusion of ColumnCapabilities where the TypeStrategy is enough.
Specifically, there were various places where the code has a good type strategy but has to create a ColumnCapabilities object in order to pass it through. The column capabilities that it creates don't indicate that the column has any indexes or anything, which might be correct and might not, but the API is asking the code to do more than is actually needed, and I worry about either:
* The current implementation continuing, us coming up with a reason why we want to use the other methods, and then being sad that we cannot use them because various code paths just create fake objects.
* We never need more than just the type and end up with a proliferation of `ColumnCapabilitiesImpl.createDefault().setType()` existing where all that was needed was the type.
I'd really rather that we fix/minimize that API to begin with.
I see that you removed the dummy instances, I overlooked that...
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes
* fix missing string utf8 index supplier
* expression array comparator stuff
… array elements (#15855) Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
… array elements (#15855) (#15865) Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements). Co-authored-by: Clint Wylie <cwylie@apache.org>
Description
This PR introduces some new filter types which aim to be both more SQL compliant and to work better with various types by accepting match values in types other than String. The new filters are `EqualityFilter`, `NullFilter`, and `RangeFilter`, which serve as replacements for `SelectorDimFilter` and `BoundDimFilter`. Effectively, `SelectorDimFilter` is being split into `NullFilter` and `EqualityFilter`, and `BoundDimFilter` is superseded by `RangeFilter`.

Probably the most notable change in behavior is that the `EqualityFilter` and `RangeFilter` will never match null values.

Compare the current behavior of the bound filter:
to the behavior of the new range filter:
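The null-handling difference can be illustrated with a small Java sketch. This is a toy model and not Druid's implementation: the class and method names are hypothetical, and the "classic" check simply assumes a comparison that orders null before every value, which is one way an upper-bound-only range can end up matching null.

```java
import java.util.Comparator;

public class NullRangeSketch {
  // Hypothetical "classic" style check (assumption): null is ordered before
  // every value, so an upper-bound-only range also matches null.
  static boolean classicUpperBound(String value, String upper) {
    Comparator<String> nullsFirst = Comparator.nullsFirst(Comparator.<String>naturalOrder());
    return nullsFirst.compare(value, upper) < 0;
  }

  // Range-filter style check as described in this PR: null never matches.
  static boolean rangeUpperBound(String value, String upper) {
    if (value == null) {
      return false;
    }
    return value.compareTo(upper) < 0;
  }

  public static void main(String[] args) {
    System.out.println(classicUpperBound(null, "m")); // null sorts first, so it matches
    System.out.println(rangeUpperBound(null, "m"));   // null is rejected up front
    System.out.println(rangeUpperBound("a", "m"));    // ordinary value below the bound
  }
}
```

Non-null values behave the same in both checks; only the treatment of null differs.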
Use of these new filters is currently tied by default to the value of SQL compatibility mode, `druid.generic.useDefaultValueForNull=false`, but is controlled separately through a planner/query context parameter `sqlUseBoundAndSelectors`, which when `true` will use the classic filters, but when `false` will plan into the new filters.

Because the new filters accept all types rather than just strings, it also means that we can now support filtering equality and ranges on ARRAY columns.
The null filter also supports proper object predicate matching, so `IS NULL` / `IS NOT NULL` work correctly on `COMPLEX<json>` columns (and any other complex type).

Equality filter
A lot like the `SelectorDimFilter`, but it will never match null. It also implements array matching, so it can be used for array equality as well.

* `type`: `"equals"`
* `column`
* `matchValue`
* `matchValueType`: `"STRING"`, `"LONG"`, `"FLOAT"`, `"DOUBLE"`, `"ARRAY<STRING>"`, `"ARRAY<LONG>"`, etc.
* `filterTuning`
I did add special handling for `COMPLEX<json>`, because it is more of an unofficial built-in type than a standard complex. I can imagine that in the future we allow complex types to opt into providing a customized equality check so that they could be supported here as well, but currently at least anything that can be checked with `Objects.equals` should work correctly.

I haven't really added any tests for json matching since I just snuck it in on the side, so it is totally possible there might be cases where `Objects.equals` doesn't match things that are otherwise equivalent.
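To make the `Objects.equals` caveat concrete, here is a minimal Java sketch of a matcher that never matches null and otherwise defers to `Objects.equals`. The class and method are hypothetical and are not the filter's actual code; the point is only that boxed numbers of different types compare as unequal even when they are numerically the same.

```java
import java.util.Arrays;
import java.util.Objects;

public class EqualitySketch {
  // Toy matcher (assumption, not Druid code): null column values never match,
  // everything else is compared with Objects.equals.
  static boolean matches(Object columnValue, Object matchValue) {
    if (columnValue == null) {
      return false;
    }
    return Objects.equals(columnValue, matchValue);
  }

  public static void main(String[] args) {
    System.out.println(matches("a", "a"));                                   // same string
    System.out.println(matches(null, "a"));                                  // null never matches
    System.out.println(matches(Arrays.asList(1L, 2L), Arrays.asList(1L, 2L))); // lists compare element-wise
    System.out.println(matches(1L, 1));                                      // Long vs Integer
    System.out.println(matches(1L, 1.0));                                    // Long vs Double
  }
}
```

The last two calls return false because `Long.equals` requires the other object to also be a `Long`, which is the kind of mismatch that could show up when comparing loosely typed JSON values.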
Null filter
The `null` filter is a dedicated filter for matching `NULL`, and when enabled in the planner will be used for all `IS NULL` and (combined with a `NotDimFilter`) `IS NOT NULL` filtering, as well as any implicit null/not null. It is separated from the `EqualityFilter` for future work of making it easier to fix some additional SQL compliance bugs involving filter negation, and is more consistent with SQL behavior.

* `type`: `"null"`
* `column`
* `filterTuning`
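As a rough sketch of how `IS NULL` and `IS NOT NULL` could map onto native filter JSON, the following Java helper builds the two specs as strings. The helper class is hypothetical, it does no escaping of the column name, and it assumes the existing `not` filter wraps its child under a `field` property.

```java
public class NullFilterSpecSketch {
  // IS NULL on a column: a plain null filter.
  static String isNull(String column) {
    return String.format("{\"type\":\"null\",\"column\":\"%s\"}", column);
  }

  // IS NOT NULL on a column: the null filter wrapped in a not filter
  // (assumes the not filter's child property is named "field").
  static String isNotNull(String column) {
    return String.format("{\"type\":\"not\",\"field\":%s}", isNull(column));
  }

  public static void main(String[] args) {
    System.out.println(isNull("x"));
    System.out.println(isNotNull("x"));
  }
}
```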
Range filter
A lot like the bound filter, but also never matches null. Instead of accepting a comparator argument, it takes a `matchValueType`, which implies a certain comparator. For range filters on string columns, if the match value type is numeric we use the numeric string comparator, otherwise the lexicographic one. For range filters on numeric and array columns, the match value is cast to the column's numeric processor type.

* `type`: `"range"`
* `column`
* `matchValueType`: `"STRING"`, `"LONG"`, `"FLOAT"`, `"DOUBLE"`, etc.
* `lower`
* `upper`
* `lowerOpen`
* `upperOpen`
* `filterTuning`
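The comparator choice for string columns can be sketched in Java as follows. This is an illustration under stated assumptions, not the filter's implementation: the class and method names are hypothetical, `BigDecimal` stands in for the numeric string comparator, and `lowerOpen` / `upperOpen` are assumed to mean that the corresponding bound is exclusive.

```java
import java.math.BigDecimal;
import java.util.Comparator;

public class RangeSketch {
  // Numeric match value types imply a numeric comparison of the strings,
  // anything else falls back to lexicographic ordering.
  static Comparator<String> comparatorFor(String matchValueType) {
    switch (matchValueType) {
      case "LONG":
      case "FLOAT":
      case "DOUBLE":
        return (a, b) -> new BigDecimal(a).compareTo(new BigDecimal(b));
      default:
        return Comparator.naturalOrder();
    }
  }

  // A null bound means "unbounded" on that side; null values never match.
  static boolean inRange(String value, String matchValueType, String lower, String upper,
                         boolean lowerOpen, boolean upperOpen) {
    if (value == null) {
      return false;
    }
    Comparator<String> cmp = comparatorFor(matchValueType);
    if (lower != null) {
      int c = cmp.compare(value, lower);
      if (c < 0 || (lowerOpen && c == 0)) {
        return false;
      }
    }
    if (upper != null) {
      int c = cmp.compare(value, upper);
      if (c > 0 || (upperOpen && c == 0)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // "10" >= "9" numerically, but "10" sorts before "9" lexicographically.
    System.out.println(inRange("10", "LONG", "9", null, false, false));
    System.out.println(inRange("10", "STRING", "9", null, false, false));
  }
}
```

The same string value therefore matches or fails to match a lower bound of `"9"` depending only on the declared match value type.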
Other changes
* Removed `isFilterable` from `ColumnCapabilities`. All types of columns should have the chance to participate in filtering without having to set this flag. This mechanism was wired into a short-circuit that would always use a 'nil' column selector when filtering on columns whose `isFilterable` was set to false.

Release note
This PR has: