Enable ORC stripe pruning based on DECIMAL predicates #5001

arhimondr · 2016-04-12T15:43:58Z

No description provided.

sopel39 · 2016-04-12T15:47:51Z

presto-orc/src/test/java/com/facebook/presto/orc/TestTupleDomainOrcPredicate.java

+                create(ValueSet.ofRanges(greaterThanOrEqual(LONG_DECIMAL, longDecimal("-1234567890.0987654321"))), true));
+    }
+
+    private static ColumnStatistics decimalColumnStats(Long numberOfValues, String minimum, String maximum)


move below dataColumnStats

sopel39 · 2016-04-12T15:52:09Z

lgtm

dain · 2016-04-23T20:18:42Z

presto-main/src/main/java/com/facebook/presto/type/DecimalSaturatedFloorCasts.java

+    public static long shortDecimalToBigint(long value, int sourceScale)
+    {
+        BigDecimal bigDecimal = new BigDecimal(BigInteger.valueOf(value), sourceScale);
+        return bigDecimal.setScale(0, FLOOR).longValueExact();


Is this really the most efficient way to do this?

The saturated floor cast is going to be used only in DomainTranslator to translate user-entered predicates. I don't think performance really matters here.
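For comparison, a floor cast that avoids allocating a `BigDecimal` could divide the unscaled value by 10^scale with `Math.floorDiv`, which rounds toward negative infinity just like `RoundingMode.FLOOR`. This is a minimal sketch, assuming (as for short decimals) that the unscaled value fits in a `long` and the scale is between 0 and 18; the class and method names are hypothetical and this is not the code under review.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

final class ShortDecimalFloorSketch
{
    // Powers of ten up to 10^18, the largest one that fits in a long.
    private static final long[] POWERS_OF_TEN = new long[19];

    static {
        POWERS_OF_TEN[0] = 1;
        for (int i = 1; i < POWERS_OF_TEN.length; i++) {
            POWERS_OF_TEN[i] = POWERS_OF_TEN[i - 1] * 10;
        }
    }

    private ShortDecimalFloorSketch() {}

    // Hypothetical allocation-free variant: floor(unscaledValue / 10^scale).
    static long shortDecimalToBigintFloor(long unscaledValue, int scale)
    {
        if (scale < 0 || scale >= POWERS_OF_TEN.length) {
            throw new IllegalArgumentException("scale must be between 0 and 18: " + scale);
        }
        return Math.floorDiv(unscaledValue, POWERS_OF_TEN[scale]);
    }

    // BigDecimal-based variant, mirroring the approach in the review excerpt.
    static long shortDecimalToBigintBigDecimal(long unscaledValue, int scale)
    {
        BigDecimal bigDecimal = new BigDecimal(BigInteger.valueOf(unscaledValue), scale);
        return bigDecimal.setScale(0, RoundingMode.FLOOR).longValueExact();
    }

    public static void main(String[] args)
    {
        // -12.5 (unscaled -125, scale 1) floors to -13 with both variants.
        System.out.println(shortDecimalToBigintFloor(-125, 1));
        System.out.println(shortDecimalToBigintBigDecimal(-125, 1));
    }
}
```

As noted in the reply, the cast only runs while translating predicates, so the simpler `BigDecimal` form may well be good enough.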

dain · 2016-04-23T20:28:19Z

The ORC bits look good, but @erichwang should review the changes to TupleDomain

arhimondr · 2016-04-24T07:38:45Z

Actually, I have also implemented floor casts for DOUBLE to DECIMAL, DOUBLE to INTEGER, and BIGINT to INTEGER. I'm going to push them tomorrow.
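To illustrate what "saturated floor" means for the integer targets mentioned above: the value is rounded toward negative infinity and then clamped to the target range instead of overflowing. This is a minimal sketch, assuming INTEGER maps to a Java `int` and BIGINT to a `long`; the class and method names are hypothetical.

```java
final class SaturatedFloorCastSketch
{
    private SaturatedFloorCastSketch() {}

    // BIGINT -> INTEGER: no rounding is needed, only clamping to the int range.
    static int bigintToInteger(long value)
    {
        if (value > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        if (value < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) value;
    }

    // DOUBLE -> INTEGER: floor first, then clamp. NaN is rejected because it has no floor.
    static int doubleToInteger(double value)
    {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("NaN cannot be floor-cast");
        }
        double floor = Math.floor(value);
        if (floor >= Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;
        }
        if (floor <= Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;
        }
        return (int) floor;
    }

    public static void main(String[] args)
    {
        System.out.println(bigintToInteger(10_000_000_000L)); // 2147483647
        System.out.println(doubleToInteger(-1.5));             // -2
        System.out.println(doubleToInteger(1e20));             // 2147483647
    }
}
```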

All variables involved in a calculation must be explicitly declared. After the signature binding refactor, there will be no cases where we need to handle undeclared variables in TypeCalculation.
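A minimal sketch of what "explicitly declared" buys: the calculation reads only bound variables and fails fast on a missing one, instead of substituting a default. The formula is the one commonly used for the result precision of DECIMAL addition; `TypeCalculationSketch` and its methods are hypothetical and are not Presto's `TypeCalculation` API.

```java
import java.util.Map;

final class TypeCalculationSketch
{
    private TypeCalculationSketch() {}

    // Every variable must be bound explicitly; a missing one is an error, not a default.
    private static long variable(Map<String, Long> bindings, String name)
    {
        Long value = bindings.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Undeclared variable: " + name);
        }
        return value;
    }

    // Result precision of DECIMAL(p1, s1) + DECIMAL(p2, s2), capped at 38 digits.
    static long addResultPrecision(Map<String, Long> bindings)
    {
        long p1 = variable(bindings, "p1");
        long s1 = variable(bindings, "s1");
        long p2 = variable(bindings, "p2");
        long s2 = variable(bindings, "s2");
        return Math.min(38, Math.max(p1 - s1, p2 - s2) + Math.max(s1, s2) + 1);
    }

    public static void main(String[] args)
    {
        // DECIMAL(3,2) + DECIMAL(5,1): max(1, 4) + max(2, 1) + 1 = 7
        System.out.println(addResultPrecision(Map.of("p1", 3L, "s1", 2L, "p2", 5L, "s2", 1L)));
    }
}
```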

To avoid an extra cast of both decimal arguments to the same type, the ADD, SUBTRACT and DIVIDE operators must accept different decimal types, e.g. (DECIMAL(3,2), DECIMAL(5,1)). We are going to remove the coercion of arguments to a common type, because not all operators require it. For instance, the MULTIPLY operator does not require casting, because its semantics don't require rescaling. Compulsory casting of decimal arguments to the same type will be removed in the very next commit, which includes the "matchAndBind" algorithm refactoring.
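The difference between ADD and MULTIPLY is visible on unscaled values: addition has to bring both operands to a common scale, whereas multiplication simply adds the scales. A minimal sketch (Java 16+), assuming a decimal is represented as an unscaled `BigInteger` plus a scale; the class, record and method names are hypothetical.

```java
import java.math.BigInteger;

final class DecimalArithmeticSketch
{
    private DecimalArithmeticSketch() {}

    // A decimal value: unscaledValue * 10^(-scale).
    record Decimal(BigInteger unscaledValue, int scale) {}

    // ADD: rescale the operand with the smaller scale, then add; result scale = max(s1, s2).
    static Decimal add(Decimal a, Decimal b)
    {
        int resultScale = Math.max(a.scale(), b.scale());
        BigInteger left = a.unscaledValue().multiply(BigInteger.TEN.pow(resultScale - a.scale()));
        BigInteger right = b.unscaledValue().multiply(BigInteger.TEN.pow(resultScale - b.scale()));
        return new Decimal(left.add(right), resultScale);
    }

    // MULTIPLY: no rescaling of the inputs is needed; result scale = s1 + s2.
    static Decimal multiply(Decimal a, Decimal b)
    {
        return new Decimal(a.unscaledValue().multiply(b.unscaledValue()), a.scale() + b.scale());
    }

    public static void main(String[] args)
    {
        Decimal a = new Decimal(BigInteger.valueOf(123), 2);   // 1.23, a DECIMAL(3,2) value
        Decimal b = new Decimal(BigInteger.valueOf(45678), 1); // 4567.8, a DECIMAL(5,1) value
        System.out.println(add(a, b));      // Decimal[unscaledValue=456903, scale=2], i.e. 4569.03
        System.out.println(multiply(a, b)); // Decimal[unscaledValue=5618394, scale=3], i.e. 5618.394
    }
}
```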

Bind both type and literal variables in a single traversal. Simplify the function signature parameter binding process. Remove the cast of decimal parameters to the same type. Move all signature binding related code to a single class. Before this patch, signature variables were bound in two separate traversals. First we called `matchAndBind(declaredSignature, actualParameters)` to bind type parameters. Then we called `calculateLiteralParameters(declaredSignature, actualParameters)` to bind literal parameters. Similarly, two traversals were made to bind calculated variables to the declared signatures. This commit simplifies the binding process by introducing an algorithm that requires only a single recursive traversal to bind both kinds of variables (type and literal). All code related to type variable matching and binding has been moved to a single `SignatureBinder` class.
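The single-traversal idea can be sketched as one recursive walk over the declared and actual signatures that records type-variable and literal-variable bindings in the same pass. This is a minimal illustration (Java 16+), not the `SignatureBinder` implementation: the `TypeSig`/`Param` model and all names here are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class SignatureBinderSketch
{
    private SignatureBinderSketch() {}

    // A parameter is a nested type, a literal variable (declared side) or a literal value (actual side).
    interface Param {}

    record TypeParam(TypeSig type) implements Param {}

    record LiteralVariable(String name) implements Param {}

    record LiteralValue(long value) implements Param {}

    record TypeSig(String base, List<Param> params) {}

    // Type and literal variable bindings, collected during the same traversal.
    record Bindings(Map<String, TypeSig> types, Map<String, Long> literals) {}

    static Optional<Bindings> bind(TypeSig declared, TypeSig actual, Set<String> typeVariables)
    {
        Bindings bindings = new Bindings(new HashMap<>(), new HashMap<>());
        return bind(declared, actual, typeVariables, bindings) ? Optional.of(bindings) : Optional.empty();
    }

    private static boolean bind(TypeSig declared, TypeSig actual, Set<String> typeVariables, Bindings bindings)
    {
        if (typeVariables.contains(declared.base())) {
            // Type variable: bind it, or check it against an earlier binding.
            TypeSig previous = bindings.types().putIfAbsent(declared.base(), actual);
            return previous == null || previous.equals(actual);
        }
        if (!declared.base().equals(actual.base()) || declared.params().size() != actual.params().size()) {
            return false;
        }
        for (int i = 0; i < declared.params().size(); i++) {
            Param declaredParam = declared.params().get(i);
            Param actualParam = actual.params().get(i);
            if (declaredParam instanceof TypeParam dt && actualParam instanceof TypeParam at) {
                if (!bind(dt.type(), at.type(), typeVariables, bindings)) {
                    return false;
                }
            }
            else if (declaredParam instanceof LiteralVariable variable && actualParam instanceof LiteralValue literal) {
                // Literal variable: bind it, or check it against an earlier binding.
                Long previous = bindings.literals().putIfAbsent(variable.name(), literal.value());
                if (previous != null && previous != literal.value()) {
                    return false;
                }
            }
            else {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        // Declared: map(K, decimal(p, s)) with type variable K and literal variables p, s.
        TypeSig declared = new TypeSig("map", List.of(
                new TypeParam(new TypeSig("K", List.of())),
                new TypeParam(new TypeSig("decimal", List.of(new LiteralVariable("p"), new LiteralVariable("s"))))));
        // Actual: map(varchar, decimal(10, 2)).
        TypeSig actual = new TypeSig("map", List.of(
                new TypeParam(new TypeSig("varchar", List.of())),
                new TypeParam(new TypeSig("decimal", List.of(new LiteralValue(10), new LiteralValue(2))))));
        Bindings bindings = bind(declared, actual, Set.of("K")).orElseThrow();
        System.out.println(bindings.types().get("K").base());                                   // varchar
        System.out.println(bindings.literals().get("p") + ", " + bindings.literals().get("s")); // 10, 2
    }
}
```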

As the signature parameter binding code has been moved to SignatureBinder, all parameter binding tests have been moved to the TestSignatureBinder class. Assertions in TestSignatureBinder have been refactored to "AssertJ style". This resolves prestodb#4405.

If more than one function is applicable to a particular set of parameters, the last one in the candidate list is selected for execution. The candidate list is built with reflection, and `Class#getMethods` doesn't guarantee any particular order, so the order of the candidate list is runtime specific. This commit introduces a most specific function selection algorithm. It is supposed to ensure that the function selected for a particular set of parameters is always the same, and that it is always the most specific one. The main idea is inspired by the Java Language Specification: https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.12.2.5 However, the implemented algorithm has one substantial difference: Presto always tries to select an applicable function that doesn't require any coercions for invocation, no matter what kind of function it is (parametrized, variable arity or generic). The implemented algorithm is as follows:

1. Try to select a function with explicitly declared parameters by direct match.
2. If more than one function is selected, throw an "ambiguous method invocation" exception.
3. Try to select a generic function without coercion.
4. If more than one function is selected, throw an "ambiguous method invocation" exception.
5. Select all applicable functions with coercions.
6. Select the most specific function. One function is more specific than another if any invocation handled by the first function could be passed on to the other one.
7. If the most specific function can't be selected, throw an "ambiguous method invocation" exception.
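The non-generic steps of this algorithm can be sketched over a toy type system. This is a minimal illustration (Java 16+), not Presto's `FunctionRegistry` code: the `Candidate` record, the coercion table and the method names are hypothetical, candidates are assumed to be pre-filtered by name, and steps 3-4 (generic functions) are omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MostSpecificFunctionSketch
{
    private MostSpecificFunctionSketch() {}

    // Hypothetical candidate: a function name plus its declared parameter types.
    record Candidate(String name, List<String> parameterTypes) {}

    // Hypothetical implicit-coercion table, already transitively closed.
    private static final Map<String, Set<String>> COERCIONS = Map.of(
            "integer", Set.of("bigint", "decimal", "double"),
            "bigint", Set.of("decimal", "double"),
            "decimal", Set.of("double"));

    static boolean canCoerce(String from, String to)
    {
        return from.equals(to) || COERCIONS.getOrDefault(from, Set.of()).contains(to);
    }

    // Every argument can be passed to the matching parameter, possibly after a coercion.
    static boolean isApplicable(Candidate candidate, List<String> argumentTypes)
    {
        List<String> parameters = candidate.parameterTypes();
        if (parameters.size() != argumentTypes.size()) {
            return false;
        }
        for (int i = 0; i < parameters.size(); i++) {
            if (!canCoerce(argumentTypes.get(i), parameters.get(i))) {
                return false;
            }
        }
        return true;
    }

    // f is more specific than g if any invocation handled by f could be passed on to g.
    static boolean isMoreSpecific(Candidate f, Candidate g)
    {
        return isApplicable(g, f.parameterTypes());
    }

    static Candidate resolve(List<Candidate> candidates, List<String> argumentTypes)
    {
        // Steps 1-2: direct match, no coercion needed.
        List<Candidate> exact = candidates.stream()
                .filter(c -> c.parameterTypes().equals(argumentTypes))
                .toList();
        if (exact.size() > 1) {
            throw new IllegalStateException("Ambiguous function call: " + exact);
        }
        if (exact.size() == 1) {
            return exact.get(0);
        }
        // Step 5: every function applicable with coercions.
        List<Candidate> applicable = candidates.stream()
                .filter(c -> isApplicable(c, argumentTypes))
                .toList();
        if (applicable.isEmpty()) {
            throw new IllegalArgumentException("No applicable function for " + argumentTypes);
        }
        // Step 6: keep the candidates that are more specific than all other applicable ones.
        List<Candidate> mostSpecific = applicable.stream()
                .filter(f -> applicable.stream().allMatch(g -> g == f || isMoreSpecific(f, g)))
                .toList();
        // Step 7: no single most specific function means the call is ambiguous.
        if (mostSpecific.size() != 1) {
            throw new IllegalStateException("Ambiguous function call: " + applicable);
        }
        return mostSpecific.get(0);
    }

    public static void main(String[] args)
    {
        List<Candidate> adds = List.of(
                new Candidate("add", List.of("bigint", "bigint")),
                new Candidate("add", List.of("double", "double")));
        // No exact match for (integer, bigint); add(bigint, bigint) is the most specific.
        System.out.println(resolve(adds, List.of("integer", "bigint")));
    }
}
```

Because the winner is defined by the coercion relation rather than by list position, the result no longer depends on the order returned by `Class#getMethods`.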

Since we have introduced the most specific function selection algorithm, which throws an error when the most specific function can't be selected, we must resolve the conflicts for the UNKNOWN type operators explicitly. For instance, for the invocation `BIGINT + UNKNOWN` two operators are applicable:

- `ADD(BIGINT, BIGINT)`
- `ADD(BIGINT, INTERVAL)`

Before the most specific function algorithm was introduced, an arbitrary one of these two operators was chosen for invocation. The most specific function selection algorithm can't resolve this conflict, because the `BIGINT` and `INTERVAL` types are not linked by an implicit coercion; for the algorithm, they are just two completely different types. But the default semantics for UNKNOWN should be simple: `OPERATOR(T, UNKNOWN)` must return `T`. To keep these semantics and resolve the ambiguity, we decided to introduce explicit arithmetic and inequality operators for the `UNKNOWN` type.
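The conflict can be reproduced in a toy model. This minimal sketch (Java 16+) assumes, for illustration only, that UNKNOWN coerces to every type and that no other implicit coercions exist; `Candidate` and the method names are hypothetical. It shows that neither `ADD(BIGINT, BIGINT)` nor `ADD(BIGINT, INTERVAL)` is more specific, and that an explicit `ADD(BIGINT, UNKNOWN)` operator resolves the tie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class UnknownOperatorSketch
{
    private UnknownOperatorSketch() {}

    record Candidate(String name, List<String> parameterTypes) {}

    // Simplified rule for this example: UNKNOWN (the type of a NULL literal) coerces to any type,
    // and no other implicit coercions exist.
    static boolean canCoerce(String from, String to)
    {
        return from.equals(to) || from.equals("unknown");
    }

    static boolean isApplicable(Candidate candidate, List<String> argumentTypes)
    {
        List<String> parameters = candidate.parameterTypes();
        if (parameters.size() != argumentTypes.size()) {
            return false;
        }
        for (int i = 0; i < parameters.size(); i++) {
            if (!canCoerce(argumentTypes.get(i), parameters.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns the single most specific applicable candidate, or empty if there is no single winner.
    static Optional<Candidate> mostSpecific(List<Candidate> candidates, List<String> argumentTypes)
    {
        List<Candidate> applicable = candidates.stream()
                .filter(c -> isApplicable(c, argumentTypes))
                .toList();
        List<Candidate> winners = applicable.stream()
                .filter(f -> applicable.stream().allMatch(g -> g == f || isApplicable(g, f.parameterTypes())))
                .toList();
        return winners.size() == 1 ? Optional.of(winners.get(0)) : Optional.empty();
    }

    public static void main(String[] args)
    {
        List<String> call = List.of("bigint", "unknown");
        List<Candidate> operators = new ArrayList<>(List.of(
                new Candidate("add", List.of("bigint", "bigint")),
                new Candidate("add", List.of("bigint", "interval"))));
        // BIGINT and INTERVAL are unrelated, so neither operator is more specific.
        System.out.println(mostSpecific(operators, call));

        // An explicit ADD(BIGINT, UNKNOWN) operator is more specific than both and wins.
        operators.add(new Candidate("add", List.of("bigint", "unknown")));
        System.out.println(mostSpecific(operators, call));
    }
}
```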

Resolve `length(varchar), length(varbinary)` ambiguous call for unknown type

Resolve `approx_set(varchar), approx_set(bigint)` ambiguous call for unknown type

+ Added `StandardErrorCode.AMBIGUOUS_FUNCTION_CALL`.
+ The ambiguity exception is thrown as a `PrestoException` instead of an `IllegalStateException`.
+ The ambiguity exception has a consistent message for the same input.
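One way to get a consistent message for the same input is to sort the candidate signatures before formatting them, so the text does not depend on reflection order. A minimal sketch: the message format and method name are hypothetical, and the `PrestoException`/`AMBIGUOUS_FUNCTION_CALL` wiring is not reproduced here.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AmbiguousCallMessageSketch
{
    private AmbiguousCallMessageSketch() {}

    // Sorting the candidates makes the message independent of the order in which
    // Class#getMethods happened to return them.
    static String ambiguousCallMessage(String functionName, List<String> argumentTypes, List<String> candidates)
    {
        return String.format(
                "Ambiguous call to %s(%s). Candidates: %s",
                functionName,
                String.join(", ", argumentTypes),
                candidates.stream().sorted().collect(Collectors.joining("; ")));
    }

    public static void main(String[] args)
    {
        List<String> argumentTypes = List.of("bigint", "unknown");
        // The same candidates in a different order produce the same message.
        System.out.println(ambiguousCallMessage("add", argumentTypes, List.of("add(bigint,interval)", "add(bigint,bigint)")));
        System.out.println(ambiguousCallMessage("add", argumentTypes, List.of("add(bigint,bigint)", "add(bigint,interval)")));
    }
}
```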

Operators that return decimal need to be tested outside the dedicated `DECIMAL` test classes as well.

Changes the algorithm to resolve functions as follows:

1. Look for an exact match.
2. Find the most specific function after coercing arguments.
3. If more than one exists, find the most specific function that only coerces null arguments.
4. If more than one exists, look for one whose parameters are of the same type.

Also, only create UnknownOperators for argument types that would result in ambiguous functions.
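The two tie-breaking steps (3 and 4) can be sketched in isolation. This minimal illustration (Java 16+) assumes that step 2 has already produced a list of equally specific candidates of the right arity, that "null arguments" are arguments of type `unknown`, and that "parameters of the same type" means all parameters share one type; `Candidate`, `breakTie` and the other names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

final class TieBreakSketch
{
    private TieBreakSketch() {}

    record Candidate(String name, List<String> parameterTypes) {}

    // Step 3 filter: every argument either matches its parameter exactly or is a NULL literal (unknown).
    static boolean onlyCoercesNulls(Candidate candidate, List<String> argumentTypes)
    {
        for (int i = 0; i < argumentTypes.size(); i++) {
            String argument = argumentTypes.get(i);
            if (!argument.equals(candidate.parameterTypes().get(i)) && !argument.equals("unknown")) {
                return false;
            }
        }
        return true;
    }

    // Step 4 filter: all parameters share one type, e.g. add(bigint, bigint).
    static boolean hasUniformParameters(Candidate candidate)
    {
        return candidate.parameterTypes().stream().distinct().count() <= 1;
    }

    // Applies steps 3 and 4 to the candidates that step 2 could not order; empty means ambiguous.
    static Optional<Candidate> breakTie(List<Candidate> mostSpecific, List<String> argumentTypes)
    {
        if (mostSpecific.size() == 1) {
            return Optional.of(mostSpecific.get(0));
        }
        List<Candidate> nullCoercionsOnly = mostSpecific.stream()
                .filter(c -> onlyCoercesNulls(c, argumentTypes))
                .toList();
        if (nullCoercionsOnly.size() <= 1) {
            return nullCoercionsOnly.stream().findFirst();
        }
        List<Candidate> uniform = nullCoercionsOnly.stream()
                .filter(TieBreakSketch::hasUniformParameters)
                .toList();
        return uniform.size() == 1 ? Optional.of(uniform.get(0)) : Optional.empty();
    }

    public static void main(String[] args)
    {
        // Both operators survive step 2 for BIGINT + NULL; step 4 prefers add(bigint, bigint).
        List<Candidate> tied = List.of(
                new Candidate("add", List.of("bigint", "bigint")),
                new Candidate("add", List.of("bigint", "interval")));
        System.out.println(breakTie(tied, List.of("bigint", "unknown")));
    }
}
```

Under these assumptions, `BIGINT + NULL` resolves to the operator that returns `BIGINT`, which matches the `OPERATOR(T, UNKNOWN)` returns `T` semantics without a dedicated UNKNOWN operator.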

arhimondr · 2016-04-26T17:11:46Z

In order to introduce the DOUBLE to DECIMAL, DECIMAL to BIGINT and DECIMAL to INTEGER saturated casts, I had to rebase this patch onto #4958. This patch can be based on master without those particular commits, though.

erichwang · 2016-04-27T00:37:54Z

presto-main/src/main/java/com/facebook/presto/sql/planner/DomainTranslator.java

            if (value.isNull()) {
                return Optional.of(NullableValue.asNull(targetType));
            }
-            Object coercedValue = new FunctionInvoker(metadata.getFunctionRegistry())
+            if (!TypeRegistry.canCoerce(value.getType(), targetType)) {


I don't think we can move this coercion check after the null check. That would imply that any null type can be coerced into any other type.

erichwang · 2016-04-27T00:51:10Z

presto-main/src/main/java/com/facebook/presto/sql/planner/DomainTranslator.java

+        private Optional<Signature> getSaturatedFloorCastOperator(Type fromType, Type toType)
+        {
+            if (!metadata.getFunctionRegistry().canResolveOperator(SATURATED_FLOOR_CAST, toType, ImmutableList.of(fromType))) {
+                return Optional.empty();


same comment as above

I have used map in every other place you noticed, but it can't be used here, because there is no method in FunctionRegistry that returns an Optional.

erichwang · 2016-04-27T02:17:06Z

@dain, I looked at the DomainTranslator bits and they look good, apart from a few minor comments there

DomainTranslator extracts the TupleDomain for simple comparisons (>=, <=, =, <>, etc.). The TupleDomain is then used by connectors to implement partition pruning based on the query predicates. If the column is compared with a value of a narrower type (column_double >= BIGINT '1'), the value can easily be coerced to the wider type (column_double >= DOUBLE '1.0'), and no workarounds are needed. If the column is compared with a value of a wider type (column_integer >= DOUBLE '1.0'), the value must be rounded to the narrower type. The rounding algorithm for the `column_integer >= const_double` comparison was already implemented. This commit is a generalization of that algorithm. Now, to make a pair of types eligible for rounding and domain translation, the `SATURATED_FLOOR_CAST` operator must be registered for that type pair. Resolves: prestodb#5013
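The rounding idea for the `column_integer <op> const_double` case can be sketched as follows: saturated-floor the constant, compare the floored value with the original to learn whether it was rounded down, exact, or clamped up from below the INTEGER range, and pick the bound accordingly. This is a minimal illustration (Java 16+), not the DomainTranslator code; it assumes INTEGER is a Java `int`, DOUBLE a non-NaN `double`, and expresses the result as an inclusive range. The class, record and method names are hypothetical.

```java
import java.util.Optional;

final class FloorCastDomainSketch
{
    private FloorCastDomainSketch() {}

    // Inclusive range of INTEGER values; long bounds let floor + 1 and floor - 1 step past the int range safely.
    record IntRange(long low, long high) {}

    // Java's double-to-int narrowing already clamps out-of-range values to Integer.MIN_VALUE/MAX_VALUE.
    static int saturatedFloorCast(double value)
    {
        return (int) Math.floor(value);
    }

    // Translates "integer_column <op> value" for a DOUBLE constant into an INTEGER range.
    // An empty result means that no INTEGER value satisfies the predicate.
    static Optional<IntRange> translate(String op, double value)
    {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("NaN is not handled in this sketch");
        }
        long floor = saturatedFloorCast(value);
        double floorAsDouble = floor;
        // cmp < 0: rounded down (or clamped at MAX_VALUE); cmp == 0: exact;
        // cmp > 0: the constant is below Integer.MIN_VALUE and was clamped up.
        int cmp = floorAsDouble < value ? -1 : floorAsDouble == value ? 0 : 1;
        long min = Integer.MIN_VALUE;
        long max = Integer.MAX_VALUE;
        IntRange range = switch (op) {
            case "=" -> cmp == 0 ? new IntRange(floor, floor) : null;
            case ">=" -> new IntRange(cmp < 0 ? floor + 1 : floor, max);
            case ">" -> new IntRange(cmp > 0 ? floor : floor + 1, max);
            case "<=" -> cmp > 0 ? null : new IntRange(min, floor);
            case "<" -> cmp > 0 ? null : new IntRange(min, cmp == 0 ? floor - 1 : floor);
            default -> throw new IllegalArgumentException("Unsupported operator: " + op);
        };
        return Optional.ofNullable(range).filter(r -> r.low() <= r.high());
    }

    public static void main(String[] args)
    {
        System.out.println(translate(">=", 1.5));  // column >= 2
        System.out.println(translate("<", 2.0));   // column <= 1
        System.out.println(translate("=", 1.5));   // no INTEGER value matches
        System.out.println(translate(">=", 1e20)); // above the INTEGER range: no match
    }
}
```

Comparing the floored value with the original constant is what makes saturation safe: a constant below the INTEGER range floors to `Integer.MIN_VALUE`, which is then treated as an upper rather than a lower approximation.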

erichwang · 2016-04-28T22:34:49Z

@martint, dain looked at the ORC parts, and I cleared the TupleDomain stuff

martint · 2016-05-10T00:00:39Z

@arhimondr, can we close this? I've merged the ORC-related bits (#5190), and the other commits are covered by a different PR, no?

arhimondr · 2016-05-10T08:17:38Z

@martint There are still 3 commits left unmerged.

Implement DECIMAL to BIGINT saturated floor cast
Implement DOUBLE to DECIMAL saturated floor cast
Implement DECIMAL to INTEGER saturated floor cast

Those commits depend on decimal coercions.
We can close this PR for now, and once decimal coercions are merged, I can reopen it.

facebook-github-bot added the CLA Signed label Apr 12, 2016

sopel39 reviewed Apr 12, 2016

arhimondr force-pushed the stripe-prunning-decimal-orc branch from 604bfff to dd4d29a April 14, 2016 16:06

dain reviewed Apr 23, 2016

dain assigned erichwang Apr 23, 2016

Andrii Rosa and others added 18 commits April 25, 2016 16:30

Add hashcode operator for DECIMAL

053ef85

Remove unspecified values support from TypeCalculation

8639925

Allow implicit conversions from BIGINT to DECIMAL

9856d36

Allow implicit conversions from DECIMAL to DOUBLE

6891f35

Resolve length(UNKNOWN) ambiguity

e6c1d05

Resolve approx_set(UNKNOWN) ambiguity

1bd776d

Resolve JSON function ambiguities for UNKNOWN

949091b

Fix error message in filterOutLessSpecificFunctions

9fa81f7

Remove AbstractTestDecimalFunctions

1eac827

Implement Decimal to/from Json casts

b84fd03

Allow implicit conversions from INTEGER to DECIMAL

cfcc1b4

Support Decimal predicates in ORC stripe pruning

ea1d727

arhimondr force-pushed the stripe-prunning-decimal-orc branch from 4c795d2 to fb72094 April 26, 2016 17:07

erichwang reviewed Apr 27, 2016

Andrii Rosa added 7 commits April 27, 2016 11:37

Implement DECIMAL to DECIMAL saturated floor cast

b9fb0a3

Implement DECIMAL to BIGINT saturated floor cast

f61c50a

Implement DOUBLE to DECIMAL saturated floor cast

7f78ce1

Implement DOUBLE to INTEGER saturated floor cast

4c9580d

Implement BIGINT to INTEGER saturated floor cast

13f1179

Implement DECIMAL to INTEGER saturated floor cast

f56b415

arhimondr force-pushed the stripe-prunning-decimal-orc branch from fb72094 to f56b415 April 27, 2016 09:38

erichwang assigned martint and unassigned erichwang Apr 28, 2016

martint assigned arhimondr and unassigned martint May 10, 2016

cberner closed this May 10, 2016

arhimondr deleted the stripe-prunning-decimal-orc branch May 10, 2016 16:00

arhimondr restored the stripe-prunning-decimal-orc branch May 10, 2016 16:00
