
Composite aggregates #3266


Open: normen662 wants to merge 23 commits into main from composite-aggregates

Conversation

normen662 (Contributor) commented Mar 21, 2025

This PR adds the ability to plan arbitrary aggregation queries using aggregation indexes as well as intersections of aggregation indexes. It also corrects/adds the infrastructure to deal with complicated order-by requirements.

  • refactor data access rules into a data access rule for value indexes and one for aggregation indexes
  • have both of these data access rule implementations share as much code as possible (only intersection planning differs)
  • intersection planning for aggregation indexes
    • needed for the same reasons as regular index intersections, and also because these indexes carry data that is not part of the underlying table (the aggregation), so they may be required to answer the query at all
    • roll-ups of finer-granularity index scans if such a roll-up can be useful in answering the query
  • order-by requirements are treated in the same framework as all other kinds of expressions (using M3 maps, translations, etc.)
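
The roll-up bullet can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in for a SUM aggregation index grouped by (category, region) and re-aggregates its entries to answer a query grouped by category only. None of the names below come from this PR, and the planner's actual rules and plan operators are not shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these types exist in the PR.
final class RollUpSketch {
    // One entry of a made-up SUM index grouped by (category, region).
    static final class IndexEntry {
        final String category;
        final String region;
        final long sum;

        IndexEntry(String category, String region, long sum) {
            this.category = category;
            this.region = region;
            this.sum = sum;
        }
    }

    // Rolls the finer-grained (category, region) sums up to one sum per category.
    // This is valid for SUM because partial sums can simply be added together.
    static Map<String, Long> rollUpByCategory(List<IndexEntry> entries) {
        Map<String, Long> sumsByCategory = new LinkedHashMap<>();
        for (IndexEntry entry : entries) {
            sumsByCategory.merge(entry.category, entry.sum, Long::sum);
        }
        return sumsByCategory;
    }
}
```

The same idea would not carry over unchanged to aggregates that cannot be recombined from partial results of the same kind, which is presumably why the description limits roll-ups to cases where they are useful for the query.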

normen662 force-pushed the composite-aggregates branch from a25ae95 to 4c9ec49 (Mar 24, 2025 08:24)
normen662 force-pushed the composite-aggregates branch 2 times, most recently from 72f7a46 to 2f5e9b8 (Apr 2, 2025 19:26)
normen662 force-pushed the composite-aggregates branch from 8f97639 to 55cd005 (Apr 10, 2025 12:47)
normen662 requested review from hatyo, alecgrieser and MMcM (Apr 15, 2025 08:20)
normen662 added the enhancement (New feature or request) label (Apr 15, 2025)
normen662 force-pushed the composite-aggregates branch 2 times, most recently from adec3d2 to 91ed1cf (May 27, 2025 11:02)
normen662 force-pushed the composite-aggregates branch from 91ed1cf to 6fa2c04 (Jul 9, 2025 16:48)
normen662 force-pushed the composite-aggregates branch from 6fa2c04 to 0aeab9c (Jul 9, 2025 16:52)
hatyo self-requested a review (Jul 31, 2025 13:16)
@@ -44,6 +44,10 @@ static <T> EnumeratingIterable<T> singleIterable(@Nonnull final T singleElement)
return new SingleIterable<>(singleElement);
}

static <T> EnumeratingIterable<T> emptyOnEmptyIterable() {
return new SingleIterable<>();

It looks like the reason for using SingleIterator (with a null argument) instead of EmptyIterator is that it returns an empty list, whereas EmptyIterator returns null from computeNext. Is that so? If it is, why not just use EmptyIterator and let the customer deal with the null?
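
To make the distinction in this comment concrete, here is a minimal sketch using only java.util. It does not use the PR's EnumeratingIterable classes, and it assumes the question is about the difference between an iterable that yields nothing and one that yields exactly one empty list. All names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: not the PR's EnumeratingIterable implementations.
final class EmptyVersusSingleEmptySketch {
    // Yields nothing at all: iteration ends immediately.
    static <T> Iterable<List<T>> noElements() {
        return () -> Collections.<List<T>>emptyIterator();
    }

    // Yields exactly one element, and that element is itself an empty list.
    static <T> Iterable<List<T>> oneEmptyList() {
        final List<List<T>> single = Collections.singletonList(Collections.<T>emptyList());
        return single::iterator;
    }

    // Counts how many lists the iterable produces.
    static <T> int count(Iterable<List<T>> iterable) {
        int produced = 0;
        for (List<T> ignored : iterable) {
            produced++;
        }
        return produced;
    }
}
```

A consumer that loops over the results runs its body once for `oneEmptyList()` and never for `noElements()`, which may matter when the empty combination is itself a valid result.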

* @return {@code Optional.empty()} if the match could not be adjusted, Optional.of(matchInfo) for a new adjusted
* match, otherwise.
*/
@Nonnull
-default Optional<MatchInfo> adjustMatch(@Nonnull final PartialMatch partialMatch) {
+default Optional<MatchInfo> adjustMatch(@Nonnull final PartialMatch partialMatch,
+        @Nonnull final Quantifier candidateQuantifier) {

Isn't this quantifier the child's quantifier? Since this expression should have a single child, can we not always infer it?

@@ -212,7 +212,8 @@ public MatchableSortExpression translateCorrelations(@Nonnull final TranslationM

@Nonnull
@Override
-public Optional<MatchInfo> adjustMatch(@Nonnull final PartialMatch partialMatch) {
+public Optional<MatchInfo> adjustMatch(@Nonnull final PartialMatch partialMatch,
+        @Nonnull final Quantifier candidateQuantifier) {
final var childMatchInfo = partialMatch.getMatchInfo();
final var maxMatchMap = childMatchInfo.getMaxMatchMap();
final var innerQuantifier = Iterables.getOnlyElement(getQuantifiers());

innerQuantifier and candidateQuantifier are always the same I think.

@@ -70,7 +70,7 @@ public class YamlTestExtension implements TestTemplateInvocationContextProvider,
private final boolean includeMethodInDescriptions;

public YamlTestExtension() {
-this(null, false);
+this(null, true);

is this necessary?

- unorderedResult: [{BITMAP: xStartsWith_1250'0200004', 'CATEGORY': 'hello', 'OFFSET':0},
{BITMAP: xStartsWith_1250'02', 'CATEGORY': 'hello', 'OFFSET':10000},
{BITMAP: xStartsWith_1250'0400008', 'CATEGORY': 'world', 'OFFSET':0}]
#-

why are these tests commented out?

return IntersectionResult.noViableIntersection();
}

final var compensation =

I believe we can immediately return IntersectionResult.noViableIntersection() if the resulting intersection compensation is impossible, correct?
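
A minimal sketch of the early return suggested here, assuming hypothetical stand-ins for the PR's types: `Compensation` is reduced to a single `isImpossible()` check, and an empty `Optional` plays the role of `IntersectionResult.noViableIntersection()`. The real planner code carries much more state.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins; the PR's Compensation and IntersectionResult are richer.
final class EarlyReturnSketch {
    interface Compensation {
        boolean isImpossible();
    }

    // Returns an empty Optional (standing in for noViableIntersection()) as soon as
    // the combined compensation is known to be impossible, before building any plans.
    static Optional<List<String>> intersect(Compensation compensation, List<String> candidatePlans) {
        if (compensation.isImpossible()) {
            return Optional.empty();
        }
        return Optional.of(List.copyOf(candidatePlans));
    }
}
```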

return RecordConstructorValue.ofColumns(columnBuilder.build());
}

private static TranslationMap computeTranslationMap(@Nonnull final CorrelationIdentifier intersectionAlias,

Can be tagged with @Nonnull

}
isCompensationImpossible |= resultCompensationFunction.isImpossible();

groupByMappings =

It would be nice to add some comments explaining the handling of matched group by aggregations and groups here.

.map(childPlan -> (Function<byte[], RecordCursor<QueryResult>>)
((byte[] childContinuation) -> childPlan
.executePlan(store, context, childContinuation, childExecuteProperties)))
.collect(Collectors.toList()),

Can we use ImmutableList here instead?
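
One way this could look, sketched with the JDK only so the example stays dependency-free: `Collectors.toUnmodifiableList()` is used here as a stand-in, and with Guava the analogous collector would be `ImmutableList.toImmutableList()`. The plan names and the `String` result type are hypothetical placeholders for the child plans and cursors in the PR.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration: strings stand in for child plans and record cursors.
final class ImmutableCollectSketch {
    // Maps each child plan to a function from continuation bytes to a result and
    // collects the functions into a list that cannot be modified afterwards.
    static List<Function<byte[], String>> cursorFunctions(List<String> childPlanNames) {
        return childPlanNames.stream()
                .map(name -> (Function<byte[], String>)
                        ((byte[] continuation) ->
                                name + ":" + (continuation == null ? 0 : continuation.length)))
                .collect(Collectors.toUnmodifiableList());
    }
}
```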

}
}

return IntersectionResult.of(hasCommonOrdering ? intersectionOrdering : null, compensation,

Examining the code flow above, I don't think it is possible to get an IntersectionResult with a non-empty list of RelationalExpressions and a null common ordering.

Having said that, we could probably clean up the code so that the hasCommonOrdering flag can be removed.
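
If that invariant does hold, one possible cleanup is to check it in the factory so that callers need no separate flag. The sketch below is an assumption about how that could look, not the PR's `IntersectionResult`: orderings and expressions are reduced to strings, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical simplification of an intersection result; not the PR's class.
final class IntersectionResultSketch {
    private final String commonOrdering;      // null means no common ordering
    private final List<String> expressions;

    private IntersectionResultSketch(String commonOrdering, List<String> expressions) {
        this.commonOrdering = commonOrdering;
        this.expressions = List.copyOf(expressions);
    }

    static IntersectionResultSketch noViableIntersection() {
        return new IntersectionResultSketch(null, List.of());
    }

    // Rejects the combination the comment argues cannot occur:
    // expressions present but no common ordering.
    static IntersectionResultSketch of(String commonOrdering, List<String> expressions) {
        if (commonOrdering == null && !expressions.isEmpty()) {
            throw new IllegalArgumentException("expressions require a common ordering");
        }
        return new IntersectionResultSketch(commonOrdering, expressions);
    }

    boolean hasCommonOrdering() {
        return commonOrdering != null;
    }

    List<String> getExpressions() {
        return expressions;
    }
}
```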

Labels
enhancement New feature or request
2 participants