Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable ORC stripe prunning based on the DECIMAL predicates #5001

Closed
wants to merge 25 commits into from

Conversation

arhimondr
Copy link
Member

No description provided.

create(ValueSet.ofRanges(greaterThanOrEqual(LONG_DECIMAL, longDecimal("-1234567890.0987654321"))), true));
}

private static ColumnStatistics decimalColumnStats(Long numberOfValues, String minimum, String maximum)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move below dataColumnStats

@sopel39
Copy link
Contributor

sopel39 commented Apr 12, 2016

lgtm

@arhimondr arhimondr force-pushed the stripe-prunning-decimal-orc branch from 604bfff to dd4d29a Compare April 14, 2016 16:06
public static long shortDecimalToBigint(long value, int sourceScale)
{
BigDecimal bigDecimal = new BigDecimal(BigInteger.valueOf(value), sourceScale);
return bigDecimal.setScale(0, FLOOR).longValueExact();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the most efficient way to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saturated floor cast is going to be use only in DomainTranslator to translate user-entered predicates. I don't think that good performance really matter here.

@dain
Copy link
Contributor

dain commented Apr 23, 2016

The ORC bits look good, but @erichwang should review the changes to TupleDomain

@arhimondr
Copy link
Member Author

Actually I have floor casts implemented for DOUBLE to DECIMAL, DOUBLE to INTEGER, BIGINT to INTEGER. I'm going to push them tomorrow.

Andrii Rosa and others added 18 commits April 25, 2016 16:30
All the variables which are involved in calculation must be explicitly declared.
After Signature binding refactor there will be no such cases when we need to handle undeclared variables in TypeCalculation.
In order to avoid extra cast of both decimal arguments to the same type
ADD,SUBSTRACT and DIVIDE operators must accept different decimal types,
e.g. (DECIMAL(3,2), DECIMAL(5,1)).

We are going to remove arguments coercions to the same type because
it is not required by the all operators. For instance Multiply operator
does not require casting, because multiply operator has the semantic
that doesn't require re-scaling.

Compulsory casting of decimal arguments to the same type will be removed
in a very next commit that incudes "matchAndBind" algorithm refactoring.
Bind both Types and Literal variables in single traversal. Simplify
function signature parameters binding process. Remove decimal
parameters cast to the same type. Move all the signature binding
related code to the single class.

Before this patch signature variables binding has been made
in 2 separate traversals. First we called
`matchAndBind(declaredSignature, actualParameters)` to bind type parameters.
Than we called `calculateLiteralParameters(declatedSignature, actualParameters)`
to bind literal parameters. Similary 2 traversals has been made
to bind calculated variables to the declared signatures.

This commit simplifies binding process with introducing algorithm
that requires only one single reqursive traversal for binding both
variable types (type, literal). All related code for type variables
matching and binding has been moved to the single `SingatureBinder` class.
As the signature parameters binging code has been moved to SignatureBinder,
all the parameters binding tests moved to the TestSignatureBinder class.

Assertions in TestSignatureBinder has been refactored in "assertj style".

This resolves prestodb#4405
If more that one function is selected as applicable
for some particular parameters, the latest one in the candidates list
is selected for execution. Candidates list is built with reflection.
`Class#getMethods` method doesn't guarrantee any fixed
order, so the candidates list order is runtime specific.

This commit introduces most specific function selection algorithm.
This algorithm is suppose to ensure that selected function for some particular
parameters will always be the same, and it will always be the most specific.

The main algorithm idea is inspired by Java language specifications:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.12.2.5

However the actual implemented algorithm has a single substantial difference.
Implemented algorithm in Presto is always trying to select applicable function
that doesn't require any coercions for invokation, no matter what type of function
it is - parametrized, variable arity or generic.

The actual implemented algorithm is next:

1.	Try to select function with explicitly declared parameters by direct match.
2. 	If more than one function selected than throw "ambigulous method invocation exception"
3. 	Try to select generic function without coercion
4. 	If more than one function selected than throw "ambigulous method invocation exception"
5. 	Select all applicable function with coercions
6. 	Select most specific function
   	One function is more specific than another if the invocation handled by the first method
   	could be passed on to the other one
7. 	If the most specific function can't be selected than throw
   	"ambigulous method invocation exception"
As we have introduced the most specific function selection algorithm,
that throws an error in the situation when the most specific method
can't be selected, we must resolve the conflicts explicitly for the
UNKNOWN type operators.

For instance for the invokation `BIGINT + UKNNOWN` two operators are
selected as applicable:

- `ADD(BIGINT, BIGINT)`
- `ADD(BIGINT, INTERVAL)`

Before the `most specific function` algorithm was introduced the
random operators from those two was considered for invokation.

Most specific function selection algorithm can't resolve this conflict
because the `BIGINT` and the `INTERVAL` types are not linked with the
implicit coercion. For the algorithm theese two types are just 2 completely
different types. But bassicaly the default sematic for UNKNOWN should be
simple. `OPERATOR(T, UNKNOWN)` must return `T`. To keep this semantic
and resolve the ambigulity it was decided to introduce explicit
arithmetic and inequallity operators for the `UNKNOWN` type.
Resolve `length(varchar), length(varbinary)` ambiguous call for unknown type
Resolve `approx_set(varchar), approx_set(bigint)` ambigulous call for unknown type
 + Added StandardErrorCode.AMBIGUOUS_FUNCTION_CALL.
 + Ambiguous exception is thrown as PrestoException instead of IllegalStateException.
 + Ambiguous exception has consistent message for same input.
We need to test operators that return decimal not only in dedicated `DECIMAL` test classes.
Changes the algorithm to resolve functions as follows:
1. look for an exact match
2. find the most specific function after coercing arguments
3. If more than one exists, find the most specific function that only
coerces null arguments.
4. If more than one exists, look for one whose parameters are of the
same type.

Also, only create UnknownOperators for argument types that would
result in ambiguous functions.
@arhimondr arhimondr force-pushed the stripe-prunning-decimal-orc branch from 4c795d2 to fb72094 Compare April 26, 2016 17:07
@arhimondr
Copy link
Member Author

In order to introduce DOUBLE to DECIMAL, DECIMAL to BIGINT and DECIMAL to INTEGER saturated casts i had to re-base this patch onto #4958. This patch can be based on master without those particular commits though.

if (value.isNull()) {
return Optional.of(NullableValue.asNull(targetType));
}
Object coercedValue = new FunctionInvoker(metadata.getFunctionRegistry())
if (!TypeRegistry.canCoerce(value.getType(), targetType)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can move this coercion check after the null check. That would imply that the any null type can be coerced into any other type.

private Optional<Signature> getSaturatedFloorCastOperator(Type fromType, Type toType)
{
if (!metadata.getFunctionRegistry().canResolveOperator(SATURATED_FLOOR_CAST, toType, ImmutableList.of(fromType))) {
return Optional.empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used map in every other places that you noticed, but it can't be used here, cause there are no method in FunctionRegistry that return Optional.

@erichwang
Copy link
Contributor

@dain, I looked at the DomainTranslator bits and it looks good other than just a few minor comments there

Andrii Rosa added 7 commits April 27, 2016 11:37
DomainTranslator extracts the TupleDomain for the simple
comparisons (>=, <=, =, <>, etc.). TupleDomain is further
used by connectors to implement the partition pruning
based on the query predicates.

If the column is compared with the value of narrower type
(column_double >= BIGINT '1'), the value can be easily
coerced to the wider type (column_double >= DOUBLE '1.0'),
and no workarounds are needed.

If the column is compared with the value of wider type
(column_integer >= DOUBLE '1.0'), the value must be
rounded to the narrower type.

Rounding algorithm for `column_integer >= const_double`
comparation was already implemented. This commit is a
generalization of that algoritm. Now in order to make some
types pair elligable for rounding and domain translation
the `SATURATED_FLOOR_CAST` operator must be registered for that
types pair.

Resolves: prestodb#5013
@arhimondr arhimondr force-pushed the stripe-prunning-decimal-orc branch from fb72094 to f56b415 Compare April 27, 2016 09:38
@erichwang erichwang assigned martint and unassigned erichwang Apr 28, 2016
@erichwang
Copy link
Contributor

@martint, dain looked at the ORC parts, and I cleared the TupleDomain stuffs

@martint
Copy link
Contributor

martint commented May 10, 2016

@arhimondr, can we close this? I've merge the ORC-related bits (#5190), and the other commits are covered by a different PR, no?

@martint martint assigned arhimondr and unassigned martint May 10, 2016
@arhimondr
Copy link
Member Author

@martint There are still 3 commits left unmerged.

Implement DECIMAL to BIGINT saturated floor cast
Implement DOUBLE to DECIMAL saturated floor cast
Implement DECIMAL to INTEGER saturated floor cast

Those commits depend on decimal coercions.
We can close this PR for now, and once decimal coercions merged i can reopen it.

@cberner cberner closed this May 10, 2016
@arhimondr arhimondr deleted the stripe-prunning-decimal-orc branch May 10, 2016 16:00
@arhimondr arhimondr restored the stripe-prunning-decimal-orc branch May 10, 2016 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants