Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grouping on arrays as arrays #12078

Merged
merged 42 commits into from
Jan 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
2265991
init multiValue column group by
cryptoe Dec 7, 2021
fe3c9d9
Changing sorting to Lexicographic as default
cryptoe Dec 7, 2021
d50cdfb
Adding initial tests
cryptoe Dec 15, 2021
ba28913
1.Fixing test cases adding
cryptoe Dec 17, 2021
6cd8b65
Linking SQL layer to native layer
cryptoe Dec 20, 2021
23a44a4
Adding multiDimension support to group by column strategy
cryptoe Dec 21, 2021
e44258b
Merge branch 'master' into group_by_arrays
cryptoe Dec 21, 2021
5b6fa68
1. Removing array coercion in Calcite layer
cryptoe Jan 5, 2022
3ca3aa0
1. Supporting all primitive array types
cryptoe Jan 6, 2022
80ff204
1. Supporting all primitive array types
cryptoe Jan 6, 2022
aeba7c5
1. Checkstyle things
cryptoe Jan 6, 2022
7bb3df0
Minor naming things
cryptoe Jan 6, 2022
a4b7954
Merge branch 'master' of github.com:apache/druid into group_by_arrays
cryptoe Jan 7, 2022
6e3f806
CheckStyle Things
cryptoe Jan 7, 2022
c6db303
Fixing test case
cryptoe Jan 7, 2022
b8fe780
Fixing hashing
cryptoe Jan 7, 2022
985fd38
1. Adding the MV function
cryptoe Jan 7, 2022
2d4f01d
1. Adding MV function test cases
cryptoe Jan 8, 2022
1e3129f
Adding Selector strategy function test cases
cryptoe Jan 9, 2022
6dc2961
Fixing ClientQuerySegmentWalkerTest
cryptoe Jan 9, 2022
df5210e
Adding GroupByQueryRunnerTest test cases
cryptoe Jan 10, 2022
607fd54
Fixing test cases
cryptoe Jan 10, 2022
79246e0
Adding few more test cases
cryptoe Jan 10, 2022
2eace6a
Fixing Exception asset statement and intellij inspection
cryptoe Jan 10, 2022
b8711ca
Adding null compatibility tests
cryptoe Jan 10, 2022
1ba5549
Review comments
cryptoe Jan 11, 2022
2d676ed
Fixing few failing tests
cryptoe Jan 11, 2022
ce87461
Merge branch 'master' of github.com:apache/druid into group_by_arrays
cryptoe Jan 12, 2022
40b877e
Fixing few failing tests
cryptoe Jan 13, 2022
57dcbe1
Do no convert to topN Q incase of group by on array
cryptoe Jan 13, 2022
86fb48b
Fixing checkstyle
cryptoe Jan 13, 2022
8619964
Fixing differences between jdk's class cast exception message
cryptoe Jan 14, 2022
63081d3
1. Fixing ordering if the grouping key is an array
cryptoe Jan 14, 2022
97350a5
Fixing DefaultLimitSpec
cryptoe Jan 14, 2022
9b21150
Fixing CalciteArraysQueryTest
cryptoe Jan 15, 2022
ea5ba15
Merge branch 'master' of github.com:apache/druid into group_by_arrays
cryptoe Jan 17, 2022
1a6927f
Dummy commit for LGTM
cryptoe Jan 17, 2022
ec07ee7
changes:
clintropolis Jan 20, 2022
1156180
Review comments
cryptoe Jan 25, 2022
d00e0b2
Fixing test cases
cryptoe Jan 25, 2022
169f6e2
Fixing spot bugs
cryptoe Jan 25, 2022
e4e8d13
Fixing strict compile
cryptoe Jan 25, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 18 additions & 3 deletions core/src/main/java/org/apache/druid/math/expr/ExprEval.java
Original file line number Diff line number Diff line change
Expand Up @@ -405,11 +405,26 @@ public static ExprEval ofType(@Nullable ExpressionType type, @Nullable Object va
if (value instanceof List) {
return bestEffortOf(value);
}
return of((String) value);
if (value == null) {
return of(null);
}
return of(String.valueOf(value));
case LONG:
return ofLong((Number) value);
if (value instanceof Number) {
return ofLong((Number) value);
}
if (value instanceof String) {
return ofLong(ExprEval.computeNumber((String) value));
}
return ofLong(null);
case DOUBLE:
return ofDouble((Number) value);
if (value instanceof Number) {
return ofDouble((Number) value);
}
if (value instanceof String) {
return ofDouble(ExprEval.computeNumber((String) value));
}
return ofDouble(null);
case COMPLEX:
byte[] bytes = null;
if (value instanceof String) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,13 +48,13 @@ public class ExpressionProcessing
@VisibleForTesting
public static void initializeForTests(@Nullable Boolean allowNestedArrays)
{
INSTANCE = new ExpressionProcessingConfig(allowNestedArrays, null);
INSTANCE = new ExpressionProcessingConfig(allowNestedArrays, null, null);
}

@VisibleForTesting
public static void initializeForStrictBooleansTests(boolean useStrict)
{
INSTANCE = new ExpressionProcessingConfig(null, useStrict);
INSTANCE = new ExpressionProcessingConfig(null, useStrict, null);
}

/**
Expand All @@ -81,4 +81,16 @@ public static boolean useStrictBooleans()
}
return INSTANCE.isUseStrictBooleans();
}


public static boolean processArraysAsMultiValueStrings()
{
// this should only be null in a unit test context, in production this will be injected by the null handling module
if (INSTANCE == null) {
throw new IllegalStateException(
"ExpressionProcessing module not initialized, call ExpressionProcessing.initializeForTests()"
);
}
return INSTANCE.processArraysAsMultiValueStrings();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -28,17 +28,24 @@ public class ExpressionProcessingConfig
{
public static final String NESTED_ARRAYS_CONFIG_STRING = "druid.expressions.allowNestedArrays";
public static final String NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING = "druid.expressions.useStrictBooleans";
// Coerce arrays to multi value strings
public static final String
PROCESS_ARRAYS_AS_MULTIVALUE_STRINGS_CONFIG_STRING = "druid.expressions.processArraysAsMultiValueStrings";

@JsonProperty("allowNestedArrays")
private final boolean allowNestedArrays;

@JsonProperty("useStrictBooleans")
private final boolean useStrictBooleans;

@JsonProperty("processArraysAsMultiValueStrings")
private final boolean processArraysAsMultiValueStrings;

@JsonCreator
public ExpressionProcessingConfig(
@JsonProperty("allowNestedArrays") @Nullable Boolean allowNestedArrays,
@JsonProperty("useStrictBooleans") @Nullable Boolean useStrictBooleans
@JsonProperty("useStrictBooleans") @Nullable Boolean useStrictBooleans,
@JsonProperty("processArraysAsMultiValueStrings") @Nullable Boolean processArraysAsMultiValueStrings
)
{
this.allowNestedArrays = allowNestedArrays == null
Expand All @@ -51,6 +58,10 @@ public ExpressionProcessingConfig(
} else {
this.useStrictBooleans = useStrictBooleans;
}
this.processArraysAsMultiValueStrings
= processArraysAsMultiValueStrings == null
? Boolean.valueOf(System.getProperty(PROCESS_ARRAYS_AS_MULTIVALUE_STRINGS_CONFIG_STRING, "false"))
: processArraysAsMultiValueStrings;
}

public boolean allowNestedArrays()
Expand All @@ -62,4 +73,9 @@ public boolean isUseStrictBooleans()
{
return useStrictBooleans;
}

public boolean processArraysAsMultiValueStrings()
{
return processArraysAsMultiValueStrings;
}
}
62 changes: 62 additions & 0 deletions core/src/main/java/org/apache/druid/math/expr/Function.java
Original file line number Diff line number Diff line change
Expand Up @@ -2976,6 +2976,67 @@ public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<E
}
}

class MVToArrayFunction implements Function
{
@Override
public String name()
{
return "mv_to_array";
}

@Override
public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
{
return args.get(0).eval(bindings).castTo(ExpressionType.STRING_ARRAY);
}

@Override
public void validateArguments(List<Expr> args)
{
if (args.size() != 1) {
throw new IAE("Function[%s] needs exactly 1 argument of type String", name());
}
IdentifierExpr expr = args.get(0).getIdentifierExprIfIdentifierExpr();

if (expr == null) {
throw new IAE(
"Arg %s should be an identifier expression ie refer to columns directaly. Use array() instead",
args.get(0).toString()
);
}
}

@Nullable
@Override
public ExpressionType getOutputType(Expr.InputBindingInspector inspector, List<Expr> args)
{
return ExpressionType.STRING_ARRAY;
}

@Override
public boolean hasArrayInputs()
{
return true;
}

@Override
public boolean hasArrayOutput()
{
return true;
}

@Override
public Set<Expr> getScalarInputs(List<Expr> args)
{
return Collections.emptySet();
}

@Override
public Set<Expr> getArrayInputs(List<Expr> args)
{
return ImmutableSet.copyOf(args);
}
}
class ArrayConstructorFunction implements Function
{
@Override
Expand All @@ -2993,6 +3054,7 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
Object[] out = new Object[length];

ExpressionType arrayType = null;

for (int i = 0; i < length; i++) {
ExprEval<?> evaluated = args.get(i).eval(bindings);
arrayType = setArrayOutput(arrayType, out, i, evaluated);
Expand Down
39 changes: 39 additions & 0 deletions core/src/test/java/org/apache/druid/math/expr/FunctionTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -934,6 +934,45 @@ public void testComplexDecodeBaseArg0Unknown()
);
}

@Test
public void testMVToArrayWithValidInputs()
{
assertArrayExpr("mv_to_array(x)", new String[]{"foo"});
assertArrayExpr("mv_to_array(a)", new String[]{"foo", "bar", "baz", "foobar"});
}

@Test
public void testMVToArrayWithConstantLiteral()
{
expectedException.expect(IAE.class);
expectedException.expectMessage("should be an identifier expression");
assertArrayExpr("mv_to_array('1')", null);
}

@Test
public void testMVToArrayWithFunction()
{
expectedException.expect(IAE.class);
expectedException.expectMessage("should be an identifier expression");
assertArrayExpr("mv_to_array(repeat('hello', 2))", null);
}

@Test
public void testMVToArrayWithMoreArgs()
{
expectedException.expect(IAE.class);
expectedException.expectMessage("needs exactly 1 argument of type String");
assertArrayExpr("mv_to_array(x,y)", null);
}

@Test
public void testMVToArrayWithNoArgs()
{
expectedException.expect(IAE.class);
expectedException.expectMessage("needs exactly 1 argument of type String");
assertArrayExpr("mv_to_array()", null);
}

private void assertExpr(final String expression, @Nullable final Object expectedResult)
{
final Expr expr = Parser.parse(expression, ExprMacroTable.nil());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -486,22 +486,18 @@ private class FactorizePlan

FactorizePlan(ColumnSelectorFactory metricFactory)
{
final List<String> columns;

if (fields != null) {
// if fields are set, we are accumulating from raw inputs, use fold expression
plan = ExpressionPlanner.plan(inspectorWithAccumulator(metricFactory), foldExpression.get());
seed = initialValue.get();
columns = plan.getAnalysis().getRequiredBindingsList();
} else {
// else we are merging intermediary results, use combine expression
plan = ExpressionPlanner.plan(inspectorWithAccumulator(metricFactory), combineExpression.get());
seed = initialCombineValue.get();
columns = plan.getAnalysis().getRequiredBindingsList();
}

bindings = new ExpressionLambdaAggregatorInputBindings(
ExpressionSelectors.createBindings(metricFactory, columns),
ExpressionSelectors.createBindings(metricFactory, plan),
accumulatorId,
seed
);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,8 @@
import org.apache.druid.segment.column.ColumnHolder;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.column.RowSignature;
import org.apache.druid.segment.data.ComparableList;
import org.apache.druid.segment.data.ComparableStringArray;
import org.joda.time.DateTime;
import org.joda.time.Interval;

Expand Down Expand Up @@ -775,6 +777,15 @@ private static int compareDimsForLimitPushDown(
} else {
dimCompare = comparator.compare(String.valueOf(lhsObj), String.valueOf(rhsObj));
}
} else if (dimensionType.equals(ColumnType.STRING_ARRAY)) {
final ComparableStringArray lhsArr = DimensionHandlerUtils.convertToComparableStringArray(lhsObj);
final ComparableStringArray rhsArr = DimensionHandlerUtils.convertToComparableStringArray(rhsObj);
dimCompare = Comparators.<Comparable>naturalNullsFirst().compare(lhsArr, rhsArr);
} else if (dimensionType.equals(ColumnType.LONG_ARRAY)
|| dimensionType.equals(ColumnType.DOUBLE_ARRAY)) {
final ComparableList lhsArr = DimensionHandlerUtils.convertToList(lhsObj);
final ComparableList rhsArr = DimensionHandlerUtils.convertToList(rhsObj);
dimCompare = Comparators.<Comparable>naturalNullsFirst().compare(lhsArr, rhsArr);
} else {
dimCompare = comparator.compare((String) lhsObj, (String) rhsObj);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@
import org.apache.druid.query.groupby.GroupByQuery;
import org.apache.druid.query.groupby.GroupByQueryConfig;
import org.apache.druid.query.groupby.ResultRow;
import org.apache.druid.query.groupby.epinephelinae.column.ArrayDoubleGroupByColumnSelectorStrategy;
import org.apache.druid.query.groupby.epinephelinae.column.ArrayLongGroupByColumnSelectorStrategy;
import org.apache.druid.query.groupby.epinephelinae.column.ArrayStringGroupByColumnSelectorStrategy;
import org.apache.druid.query.groupby.epinephelinae.column.DictionaryBuildingStringGroupByColumnSelectorStrategy;
import org.apache.druid.query.groupby.epinephelinae.column.DoubleGroupByColumnSelectorStrategy;
import org.apache.druid.query.groupby.epinephelinae.column.FloatGroupByColumnSelectorStrategy;
Expand Down Expand Up @@ -233,7 +236,7 @@ public GroupByEngineIterator make()
processingBuffer,
fudgeTimestamp,
dims,
isAllSingleValueDims(columnSelectorFactory, query.getDimensions()),
hasNoExplodingDimensions(columnSelectorFactory, query.getDimensions()),
cardinalityForArrayAggregation
);
} else {
Expand All @@ -244,7 +247,7 @@ public GroupByEngineIterator make()
processingBuffer,
fudgeTimestamp,
dims,
isAllSingleValueDims(columnSelectorFactory, query.getDimensions())
hasNoExplodingDimensions(columnSelectorFactory, query.getDimensions())
);
}
}
Expand Down Expand Up @@ -290,6 +293,11 @@ public static int getCardinalityForArrayAggregation(
if (query.getVirtualColumns().exists(Iterables.getOnlyElement(dimensions).getDimension())) {
return -1;
}
// We cannot support array-based aggregation on array based grouping as we we donot have all the indexes up front
// to allocate appropriate values
if (dimensions.get(0).getOutputType().equals(ColumnType.STRING_ARRAY)) {
return -1;
}

final String columnName = Iterables.getOnlyElement(dimensions).getDimension();
columnCapabilities = storageAdapter.getColumnCapabilities(columnName);
Expand Down Expand Up @@ -319,11 +327,12 @@ public static int getCardinalityForArrayAggregation(
}

/**
* Checks whether all "dimensions" are either single-valued, or if allowed, nonexistent. Since non-existent column
* selectors will show up as full of nulls they are effectively single valued, however they can also be null during
* broker merge, for example with an 'inline' datasource subquery.
* Checks whether all "dimensions" are either single-valued,
* or STRING_ARRAY, in case we don't want to explode the underline multi value column,
* or if allowed, nonexistent. Since non-existent columnselectors will show up as full of nulls they are effectively
* single valued, however they can also be null during broker merge, for example with an 'inline' datasource subquery.
*/
public static boolean isAllSingleValueDims(
public static boolean hasNoExplodingDimensions(
final ColumnInspector inspector,
final List<DimensionSpec> dimensions
)
Expand All @@ -340,7 +349,8 @@ public static boolean isAllSingleValueDims(

// Now check column capabilities, which must be present and explicitly not multi-valued
final ColumnCapabilities columnCapabilities = inspector.getColumnCapabilities(dimension.getDimension());
return columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse();
return dimension.getOutputType().equals(ColumnType.STRING_ARRAY)
cryptoe marked this conversation as resolved.
Show resolved Hide resolved
|| (columnCapabilities != null && columnCapabilities.hasMultipleValues().isFalse());
});
}

Expand Down Expand Up @@ -403,6 +413,20 @@ public GroupByColumnSelectorStrategy makeColumnSelectorStrategy(
return makeNullableNumericStrategy(new FloatGroupByColumnSelectorStrategy());
case DOUBLE:
return makeNullableNumericStrategy(new DoubleGroupByColumnSelectorStrategy());
case ARRAY:
switch (capabilities.getElementType().getType()) {
case LONG:
return new ArrayLongGroupByColumnSelectorStrategy();
case STRING:
return new ArrayStringGroupByColumnSelectorStrategy();
case DOUBLE:
return new ArrayDoubleGroupByColumnSelectorStrategy();
case FLOAT:
// Array<Float> not supported in expressions, ingestion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self, double check this is actually true at this layer (if it is not, it might be possibly handled with the Double strategy). While definitely true that FLOAT doesn't exist in expressions, and so within expressions there exists no float array, this type might still be specified by the SQL planner, whenever some float column is added into an array for example. I'm unsure if the expression selectors column capabilities would report ARRAY or ARRAY as the type of the virtual column, i know it coerces DOUBLE back to FLOAT when the planner requests FLOAT types, but don't think it does the same thing for ARRAY so, this is probably true.

default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());

}
default:
throw new IAE("Cannot create query type helper from invalid type [%s]", capabilities.asTypeString());
}
Expand Down
Loading