Skip to content

Conversation

@expani
Copy link
Collaborator

@expani expani commented Dec 23, 2025

Description

we are missing this check in SQL plugin causing agg_for_doc_count to be not added as an aggregate when Distinct Cardinality Aggregate is part of the query.

Adding this the plan's look like follows

[INFO ][o.o.s.o.r.OpenSearchQueryRequest] [runTask-0] Calcite Logical Plan after Conversion
 LogicalSystemLimit(sort0=[$0], dir0=[DESC-nulls-last], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[10])
    LogicalProject(u=[$1], RegionID=[$0], agg_for_doc_count=[$2])
      LogicalAggregate(group=[{0}], u=[APPROX_COUNT_DISTINCT($1)], agg_for_doc_count=[COUNT()])
        LogicalProject(RegionID=[$68], UserID=[$84])
          CalciteLogicalIndexScan(table=[[OpenSearch, hits]])

doc_count is expected for all scenarios.

This is required for merging at Coordinator to work properly. See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/InternalTerms.java#L395

There are multiple places that rely on doc count for instance https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/InternalTerms.java#L474 relies on the same to prune buckets lower that a certain doc count.

More context in opensearch-project/OpenSearch#20263

Signed-off-by: expani <anijainc@amazon.com>
@expani expani merged commit a0b649f into vinaykpud:feature/substrait-plan Dec 23, 2025
5 of 33 checks passed
abhita added a commit to abhita/sql that referenced this pull request Jan 5, 2026
Merge pull request vinaykpud#29 from expani/feature/substrait-plan

Adding agg_for_doc_count even if distinct count is present in query
Adding agg_for_doc_count even if distinct count is present in query

Signed-off-by: expani <anijainc@amazon.com>

Merge pull request vinaykpud#28 from expani/df_lucene_path_sql

Skip generating and setting the Substrait plan based on Index Setting
Updated to test to use new index setting

Signed-off-by: expani <anijainc@amazon.com>

Using Optimized index setting to not generate substrait for existing path

Signed-off-by: expani <anijainc@amazon.com>

Resolved Sort Pushdown and updated integration tests queries and results (vinaykpud#27)

* Added sort in pushdown context

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Removed unwanted sort pushdown in pushDownSortAggregateMeasure

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Accessed the fullReNodeTree in CalciteEnumerableIndexScan removed the redundancy in OpenSearchRequest

Also i did this since earlier way of accessing RelNode from ThreadLocal from OpenSearchRequest class
is not working as it looks like the thread which sets the variable and accesses it now are different

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Updated some integ test queries and results

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Added substrait type conversion support for IP field (vinaykpud#26)

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Use Full RelNode tree for execution

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

Fixed the sort condition for queries with multiple aggregation (vinaykpud#24)

without this change if we have a PPL query with multiple aggregations
and sort condition, it will sort only based on the first aggregated field.
Also found similar changes done in latest mainline:
https://tinyurl.com/u32kpyk9

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Added sort after aggregarte relNode for pushdown context (vinaykpud#23)

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Added logs for reconstructPushedDownRelNodeTree (vinaykpud#22)

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Added min function for string field and fixed query 29 in tests (vinaykpud#20)

* Added min function for string field

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Fixed the query 29 PPL

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
All changes related to convert only pushed down operations as RelNodeTree and Substrait (vinaykpud#19)

* Changes to convert only pushed down operations RelNode and Substrait

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Adding the missing Project operator after the filter

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Small refactors in PushDownContext and logs

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Added full relNode log

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Removed unwanted comments

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Use Direct RelNode.copy() Instead of RelBuilder

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

* Disabled CoreRules.FILTER_REDUCE_EXPRESSIONS

this is done to disable the calcite optimization on timerange queries
ie calcite optimizes `>=($0, TIMESTAMP('2013-07-01 00:00:00')) AND <=($0, TIMESTAMP('2013-07-31 00:00:00'))`
to `SEARCH($0, Sarg[['2013-07-01 00:00:00'..'2013-07-31 00:00:00']])`

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

---------

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant