Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add equality, null, and range filter #14542

Merged
merged 47 commits into from
Jul 18, 2023

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Jul 7, 2023

Description

This PR introduces some new filter types which aim to be both more SQL compliant and also work better with various types by accepting match values in types other than String. The new filters are EqualityFilter, NullFilter, and RangeFilter, which serve as replacements for SelectorDimFilter and BoundDimFilter. Effectively SelectorDimFilter is being split into NullFilter and EqualityFilter, and BoundDimFilter is superceded by RangeFilter.

Probably the most notable change in behavior is that the EqualityFilter and RangeFilter will never match null values.

Compare the current behavior of the bound filter:
Screenshot 2023-07-05 at 6 13 14 PM

to the behavior of the new range filter:

Screenshot 2023-07-05 at 6 14 20 PM

Use of these new filters is currently tied by default to the value of SQL compatibility mode, druid.generic.useDefaultValueForNull=false, but is controlled separately through a planner/query context parameter sqlUseBoundAndSelectors, which when true will use the classic filters, but when false will plan into the new filters.

Because the new filters accept all types rather than just strings, it also means that we can now support filtering equality and ranges on ARRAY columns
Screenshot 2023-07-05 at 5 49 32 PM

Screenshot 2023-07-11 at 1 59 21 PM

The null filter also supports proper object predicate matching, so IS NULL/IS NOT NULL work correctly on COMPLEX<json> columns (and any other complex type):
Screenshot 2023-07-11 at 1 44 33 PM

Screenshot 2023-07-11 at 1 44 40 PM

Equality filter

A lot like the SelectorDimFilter, but it will never match null. It also implements array matching, so it can be used for array equality as well.

parameters description required
type must be "equals" yes
column column (or virtual column) name to filter for null values yes
matchValue Value to match, must not be null yes
matchValueType type of the match value, e.g. "STRING", "LONG", "FLOAT", "DOUBLE", "ARRAY<STRING>", "ARRAY<LONG>", etc. yes
filterTuning optional filter tuning to control if filtering will use bitmap indexes no

I did add special handling for COMPLEX<json>, because it is more of an unofficial built-in type than a standard complex, but I can imagine in the future we allow complex types to opt into providing a customized equality check so that they could be supported here as well, but currently at least anything that can be checked with Objects.equals should work correctly.

Screenshot 2023-07-11 at 1 51 11 PM
Screenshot 2023-07-11 at 2 22 32 PM

I haven't really added any tests for json matching since i just snuck it in on the side, so it is totally possible there might be cases where Objects.equals doesn't match things that are equivalent otherwise.

Null filter

The null filter is a dedicated filter for matching NULL, and when enabled in the planner will be used for all IS NULL and (combined with a NotDimFilter) IS NOT NULL filtering, as well as any implicit null/not null. It is separated from the EqualityFilter for future work of making it easier to fix some additional SQL compliance bugs involving filter negation, and is more consistent with SQL behavior.

parameters description required
type must be "null" yes
column column (or virtual column) name to filter for null values yes
filterTuning optional filter tuning to control if filtering will use bitmap indexes no

Range filter

A lot like the bound filter, but also never matches null. Instead of accepting a comparator argument, it instead takes a matchValueType, which implies a certain comparator. For range filters on string columns, if the match value type is numeric we use the numeric string comparator, otherwise the lexicographic. For range filters on numeric and array columns, the match value is cast to the columns numeric processor type

parameters description required
type must be "range" yes
column column (or virtual column) name to filter for null values yes
matchValueType type of the match value, e.g. "STRING", "LONG", "FLOAT", "DOUBLE", etc. yes
lower lower bound value, or null if there is no lower bound no (however, at least one of lower or upper must be specified)
upper upper bound value, or null if there is no upper bound no (however, at least one of lower or upper must be specified)
lowerOpen if the value must be strictly greater than the lower bound, or if it may also be equal no, defaults to false
upperOpen if the value must be strictly less than the upper bound, or it may also be equal no, defaults to false
filterTuning optional filter tuning to control if filtering will use bitmap indexes no

Other changes

  • Removed isFilterable from ColumnCapabilities. All types of columns should have the chance to participate in filtering without having to set this flag. This mechanism was wired into a short-circuit that would always use a 'nil' column selector when filtering on columns whose isFilterable was set to false.
  • Fixed compaction for ARRAY typed columns (by side effect sort of from making column processor factory interface have array handling), added test

Release note


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
@cryptoe cryptoe added this to the 27.0 milestone Jul 7, 2023
@@ -223,10 +224,13 @@
)
{
final int dimIndex = desc.getIndex();
if (fieldIndexers.size() == 0 && isConstant && !hasNestedData) {
return DimensionSelector.constant(null, spec.getExtractionFn());

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [DimensionSpec.getExtractionFn](1) should be avoided because it has been deprecated.
Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked through the updates. While there are still things that I would change in here, the only thing that I feel strongly about that I'd kinda rather not be committed if it doesn't have to be is the inclusion of ColumnCapabilities where the TypeStrategy is enough.

Specifically, there were various places where it has a good type strategy, but has to create a ColumnCapabilities object in order to pass it through, the column capabilities that it creates doesn't indicate that it has any indexes or anything, which might be correct and might not, but the API is asking the code to do extra stuff than what is actually needed and I worry about either

  1. The current implementation continuing, us coming up with a reason why we want to use the other methods and then be sad that you cannot use the methods because various code paths just create fake objects
  2. We never need more than just the type and end up with a proliferation of ColumnCapabilitiesImpl.createDefault().setType() existing where all that was needed was the type.

I'd really rather that we fix/minimize that API to begin with.

I see that you removed the dummy instances, I overlooked that...

@clintropolis clintropolis merged commit 913416c into apache:master Jul 18, 2023
@clintropolis clintropolis deleted the nicer-filters branch July 18, 2023 19:15
clintropolis added a commit to clintropolis/druid that referenced this pull request Jul 18, 2023
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes 
* fix missing string utf8 index supplier
* expression array comparator stuff
abhishekagarwal87 pushed a commit that referenced this pull request Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes 
* fix missing string utf8 index supplier
* expression array comparator stuff
abhishekagarwal87 pushed a commit that referenced this pull request Feb 7, 2024
… array elements (#15855)

Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Feb 8, 2024
… array elements (apache#15855)

Fixes an oversight after apache#14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
abhishekagarwal87 pushed a commit that referenced this pull request Feb 8, 2024
… array elements (#15855) (#15865)

Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).

Co-authored-by: Clint Wylie <cwylie@apache.org>
@clintropolis clintropolis mentioned this pull request Mar 12, 2024
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants