Deprecate span collector #990

Collaborator

@dai-chen dai-chen commented Oct 28, 2022

Signed-off-by: Chen Dai <daichen@amazon.com>

Description

Background

SpanCollector is an implementation of the Collector interface. In AggregationOperator, it generates the span start point for an ExprValue (in collector.bucketKey(), a responsibility similar to that of WindowAssigner). Eventually, AggregationOperator fetches all results (in collector.results()).
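
As an illustration of what "span start point" means here, the following is a minimal sketch that rounds a timestamp down to the start of its fixed-width span. The class `SpanStartSketch` and the method `spanStart` are hypothetical names for this example only; this is not the plugin's Rounding or SpanCollector code, and it assumes spans are aligned to the Unix epoch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not the plugin's actual Rounding class.
final class SpanStartSketch {

    /** Rounds a timestamp down to the start of its epoch-aligned, fixed-width span. */
    static Instant spanStart(Instant value, Duration span) {
        long spanMillis = span.toMillis();
        // floorDiv keeps the rounding direction correct for pre-1970 timestamps too.
        long bucketIndex = Math.floorDiv(value.toEpochMilli(), spanMillis);
        return Instant.ofEpochMilli(bucketIndex * spanMillis);
    }

    public static void main(String[] args) {
        Duration fiveMinutes = Duration.ofMinutes(5);
        System.out.println(spanStart(Instant.parse("2022-11-03T00:01:00Z"), fiveMinutes));
        System.out.println(spanStart(Instant.parse("2022-11-03T00:22:00Z"), fiveMinutes));
    }
}
```

Every value that falls in the same span maps to the same key, which is what lets an aggregation group rows by span.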

Problems

For some reason, SpanCollector doesn't simply convert the aggregate result map to ExprValue and return it to AggregationOperator. Instead, it calculates the result array size and locates each map key again. This is probably to support filling gaps between windows, as in the example below:

stats count() by span(timestamp, 5 min):

exprValue1: 00:01:00  // window=[00:00:00, 00:05:00)
exprValue2: 00:22:00  // window=[00:20:00, 00:25:00)

// Without special handling:
window1=[00:00:00, 00:05:00), 1
window2=[00:20:00, 00:25:00), 1

// With empty window filled in-between:
window1=[00:00:00, 00:05:00), 1
window2=[00:05:00, 00:10:00), null
window3=[00:10:00, 00:15:00), null
window4=[00:15:00, 00:20:00), null
window5=[00:20:00, 00:25:00), 1
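
The gap-filling behavior in the example can be sketched roughly as follows. This is an assumption about the intent, not the removed SpanCollector code: `GapFillSketch` and `fillGaps` are hypothetical names, and window starts are plain seconds since midnight to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of gap filling; not the plugin's implementation.
final class GapFillSketch {

    /**
     * Walks every span from the first to the last non-empty window and
     * emits a null count for spans that received no rows.
     */
    static Map<Long, Long> fillGaps(TreeMap<Long, Long> countsBySpanStart, long spanSeconds) {
        Map<Long, Long> filled = new LinkedHashMap<>();
        if (countsBySpanStart.isEmpty()) {
            return filled;
        }
        long first = countsBySpanStart.firstKey();
        long last = countsBySpanStart.lastKey();
        for (long start = first; start <= last; start += spanSeconds) {
            // get() returns null for a missing key, which marks an empty window.
            filled.put(start, countsBySpanStart.get(start));
        }
        return filled;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> sparse = new TreeMap<>();
        sparse.put(0L, 1L);     // window starting at 00:00:00
        sparse.put(1200L, 1L);  // window starting at 00:20:00
        fillGaps(sparse, 300L).forEach((start, count) ->
            System.out.println("windowStart=" + start + "s, count=" + count));
    }
}
```

The extra pass over the key range is the cost that skipping gap filling avoids.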

Solution in the PR

However, after testing both the current span implementation and the OpenSearch histogram DSL query, this complicated logic is no longer needed. This PR deprecates it and removes the corresponding unused code in the Rounding class. This prepares AggregationOperator and Rounding for the upcoming stream processing changes.
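
For contrast, here is a minimal sketch of the simpler behavior, in which the aggregate map is returned as-is and only windows that received data appear in the result. `SparseBucketSketch` and `countBySpan` are hypothetical names, and the inputs are seconds since midnight; this is not the code changed in this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of returning only non-empty windows.
final class SparseBucketSketch {

    /** Counts values per span start and returns the map directly, ordered by span start. */
    static Map<Long, Long> countBySpan(List<Long> values, long spanSeconds) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long value : values) {
            long spanStart = Math.floorDiv(value, spanSeconds) * spanSeconds;
            counts.merge(spanStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 00:01:00 and 00:22:00 expressed as seconds since midnight.
        System.out.println(countBySpan(List.of(60L, 1320L), 300L));
    }
}
```

No result size needs to be computed up front and no key is looked up a second time.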

Testing

PUT span-test
{
  "mappings": {
    "properties": {
      "eventTime": {
        "type": "date"
      }
    }
  }
}

POST span-test/_bulk
{ "index" : { "_id" : "1" } }
{ "eventTime" : "2022-11-03T00:01:00Z" }
{ "index" : { "_id" : "2" } }
{ "eventTime" : "2022-11-03T00:10:00Z" }

POST _plugins/_ppl
{
  "query": """
    source = span-test
    | stats count(1) by span(eventTime, 5m) AS windowStartTime
  """
}

# Explain the PPL query above to get the DSL query below
POST span-test/_search
{"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"span(eventTime,5m)":{"date_histogram":{"field":"eventTime","missing_bucket":true,"missing_order":"first","order":"asc","fixed_interval":"5m"}}}]},"aggregations":{"count(1)":{"value_count":{"field":"_index"}}}}}}

Both the PPL and DSL queries return only the buckets that have data:

{
  "schema": [
    {
      "name": "count(1)",
      "type": "integer"
    },
    {
      "name": "windowStartTime",
      "type": "timestamp"
    }
  ],
  "datarows": [
    [
      1,
      "2022-11-03T00:00:00.000+0000"
    ],
    [
      1,
      "2022-11-03T00:10:00.000+0000"
    ]
  ],
  "total": 3,
  "size": 3
}

Issues Resolved

#954

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the maintenance Improves code quality, but not the product label Oct 28, 2022
@dai-chen dai-chen self-assigned this Oct 28, 2022

codecov-commenter commented Oct 28, 2022

Codecov Report

Merging #990 (6511f61) into feature/maximus-m1 (eea2689) will decrease coverage by 35.51%.
The diff coverage is n/a.

@@                    Coverage Diff                    @@
##             feature/maximus-m1     #990       +/-   ##
=========================================================
- Coverage                 98.27%   62.76%   -35.52%     
=========================================================
  Files                       339       10      -329     
  Lines                      8545      658     -7887     
  Branches                    561      119      -442     
=========================================================
- Hits                       8398      413     -7985     
- Misses                      142      192       +50     
- Partials                      5       53       +48     
Flag Coverage Δ
query-workbench 62.76% <ø> (?)
sql-engine ?


Impacted Files Coverage Δ
...opensearch/sql/expression/span/SpanExpression.java
...arch/sql/planner/physical/AggregationOperator.java
...ql/planner/physical/collector/BucketCollector.java
...arch/sql/planner/physical/collector/Collector.java
...earch/sql/planner/physical/collector/Rounding.java
...main/java/org/opensearch/sql/executor/Explain.java
...ql/opensearch/request/OpenSearchScrollRequest.java
...watermark/BoundedOutOfOrderWatermarkGenerator.java
...ript/aggregation/dsl/BucketAggregationBuilder.java
...ql/planner/physical/collector/MetricCollector.java
... and 338 more


Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added this to the Maximus M1 - Phase 1 milestone Oct 31, 2022
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen changed the title Refactor span collector to support windowing operation Deprecate span collector Nov 3, 2022
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen marked this pull request as ready for review November 3, 2022 23:16
@dai-chen dai-chen requested a review from a team as a code owner November 3, 2022 23:16
Collaborator

@penghuo penghuo left a comment


Thanks!

@dai-chen dai-chen merged commit 48eeb0e into opensearch-project:feature/maximus-m1 Nov 8, 2022
Labels
maintenance Improves code quality, but not the product

4 participants