Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply fast date histogram optimization at the segment level #12073

Merged
merged 19 commits into from
Feb 7, 2024

Conversation

bowenlan-amzn
Copy link
Member

@bowenlan-amzn bowenlan-amzn commented Jan 29, 2024

Description

This PR mainly aims to handle the segment level functionally match all scenario. And is a follow up of #11083 and #11505
Context: match all is one of the pre-condition before doing this filter rewrite and weight count optimization. Previously, we check this by seeing if the query is of type MatchAllQuery, FieldExists or PointRangeQuery on the same field on the shard level. All 3 scenarios mean it's safe to rewrite the date histogram configuration into set of range filter query, and later use the query weight to count how many documents are in each filter or bucket.
This PR checks if it's functionally match all at segment level if not able to create filter at shard level. This can happen if the query part of the aggregation request is not of above 3 types, while it doesn't have real effect on the current processed segment. So we can still do the filter rewrite for this segment. To make sure the results of this segment conform to the shard level bucket results. We build filter by using the shard level min max value bounds and considering the PointRangeQuery on the same field.

This PR refactors the logic that is bind to data from Aggregator into a inner class of Aggregator so those instance fields are easily accessed. While still keeps the common logic in the Helper as methods in the abstract base class or static methods.

This PR add logic to handle the existence of doc count field.

Optimization logic illustration

flowchart

    subgraph ShardLevelInit
        AggregatorInit --> C1{{"IsRewritteable (cached)"}}
        C1 -->|Yes| C7{{"CheckQueryType"}}
        C7 -->|match all, range| Build(BuildFilters):::hl
    end

	subgraph BuildFiltersDetail
        direction TB
		GetBounds --> |shard level| C5("shard bounds + PointRangeQuery")
        C5 --> GetRoundingInterval
        GetBounds --> |segment level| C6("segment bounds")
        C6 --> GetRoundingInterval
		GetRoundingInterval --> GetPreparedRounding
		GetPreparedRounding --> CreateFilters

		subgraph CreateFilters
			CountBucket --> C2{{CheckMaxNumOfBuckets}}
			C2 --> |Not Exceed| E1("CreateFilter (cached)")
		end
	end

	subgraph SegmentLevelCompute
		direction TB
		TryFilters --> C3{{CheckFilterBuildAtShardLevel}}
		C3 --> |No| C4{{CheckSegmentMatchAll}}
		C4 --> |Yes| Build2(BuildFilters):::hl
		Build2 --> BuildBuckets
		C3 --> |Yes| BuildBuckets
	end

    ShardLevelInit ~~~ SegmentLevelCompute
    SegmentLevelCompute ~~~ BuildFiltersDetail

    classDef hl fill:#f73

Loading

Related Issues

Resolves

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn added the backport 2.x Backport to 2.x branch label Jan 29, 2024
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Copy link
Contributor

github-actions bot commented Jan 29, 2024

Compatibility status:

Checks if related components are compatible with change e1dbcea

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer.git]

Copy link
Contributor

❌ Gradle check result for e963981: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 85ef307: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Feb 7, 2024

❌ Gradle check result for 0a78297: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Feb 7, 2024

❕ Gradle check result for e1dbcea: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@bowenlan-amzn
Copy link
Member Author

bowenlan-amzn commented Feb 7, 2024

❕ Gradle check result for e1dbcea: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testRestartPrimary

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

#11993

Copy link
Contributor

github-actions bot commented Feb 7, 2024

✅ Gradle check result for e1dbcea: SUCCESS

@msfroh msfroh merged commit 9a0a69f into opensearch-project:main Feb 7, 2024
58 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-12073-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9a0a69fb530732e49834a2f2ddc3011cca82b26c
# Push it to GitHub
git push --set-upstream origin backport/backport-12073-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-12073-to-2.x.

@msfroh msfroh added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch backport-failed labels Feb 7, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 7, 2024
---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 9a0a69f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@reta
Copy link
Collaborator

reta commented Feb 7, 2024

Apparently related to this change:

[org.opensearch.search.aggregations.bucket.MinDocCountIT.testDateHistogramCountDesc {p0={"search.concurrent_segment_search.enabled":"false"}}](https://build.ci.opensearch.org/job/gradle-check/33522/testReport/junit/org.opensearch.search.aggregations.bucket/MinDocCountIT/testDateHistogramCountDesc__p0___search_concurrent_segment_search_enabled___false___/)
[org.opensearch.search.aggregations.bucket.MinDocCountIT.testDateHistogramKeyDesc {p0={"search.concurrent_segment_search.enabled":"false"}}](https://build.ci.opensearch.org/job/gradle-check/33522/testReport/junit/org.opensearch.search.aggregations.bucket/MinDocCountIT/testDateHistogramKeyDesc__p0___search_concurrent_segment_search_enabled___false___/)
[org.opensearch.search.aggregations.bucket.MinDocCountIT.testDateHistogramKeyAsc {p0={"search.concurrent_segment_search.enabled":"false"}}](https://build.ci.opensearch.org/job/gradle-check/33522/testReport/junit/org.opensearch.search.aggregations.bucket/MinDocCountIT/testDateHistogramKeyAsc__p0___search_concurrent_segment_search_enabled___false___/)
[org.opensearch.search.aggregations.bucket.MinDocCountIT.testDateHistogramCountAsc {p0={"search.concurrent_segment_search.enabled":"false"}}](https://build.ci.opensearch.org/job/gradle-check/33522/testReport/junit/org.opensearch.search.aggregations.bucket/MinDocCountIT/testDateHistogramCountAsc__p0___search_concurrent_segment_search_enabled___false___/)

Reference https://build.ci.opensearch.org/job/gradle-check/33522/
@bowenlan-amzn could you please take a look?

msfroh pushed a commit that referenced this pull request Feb 9, 2024
…level (#12279)

* Apply fast date histogram optimization at the segment level (#12073)

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 9a0a69f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: reset the filter built at segment level for date histogram optimization (#12267)


---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 10, 2024
…level (#12279)

* Apply fast date histogram optimization at the segment level (#12073)

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
(cherry picked from commit 9a0a69f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: reset the filter built at segment level for date histogram optimization (#12267)

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 25c2fde)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
msfroh pushed a commit that referenced this pull request Feb 10, 2024
…level (#12279) (#12284)

* Apply fast date histogram optimization at the segment level (#12073)

---------


(cherry picked from commit 9a0a69f)


* Fix: reset the filter built at segment level for date histogram optimization (#12267)

---------



---------




(cherry picked from commit 25c2fde)

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
…ch-project#12073)


---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…ch-project#12073)


---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ch-project#12073)

---------

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
@bowenlan-amzn bowenlan-amzn deleted the 11654-segment-matchall branch May 3, 2024 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants