Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request tracing framework #7648

Merged
merged 43 commits into from
Jun 28, 2023

Conversation

suranjay
Copy link
Contributor

@suranjay suranjay commented May 22, 2023

Description

This change adds request tracing framework in OpenSearch. It also integrates Opentelemetry with OpenSearch as it uses the framework for implementing tracing capabilities in OpenSearch.

The tracing framework automatically handles context propagation between different threads, tasks and nodes.

Note: The integrations tests will be covered in the subsequent PR to limit the changes in this PR.

Related Issues

Resolves #7543, #7544

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@suranjay suranjay changed the title Add request tracing framework [WIP] Add request tracing framework May 22, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Member

@shwetathareja shwetathareja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @suranjay picking this up. What is the overhead of Tracer framework disabled by default? Do we have some nos.?

Copy link
Collaborator

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider re-wiring dependencies. It seems too tightly coupled with Otel. Please also consider moving this to a module

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu

@codecov
Copy link

codecov bot commented May 28, 2023

Codecov Report

Merging #7648 (7591c1f) into main (4c02dd1) will increase coverage by 0.07%.
The diff coverage is 77.25%.

@@             Coverage Diff              @@
##               main    #7648      +/-   ##
============================================
+ Coverage     70.90%   70.97%   +0.07%     
- Complexity    56710    56866     +156     
============================================
  Files          4733     4752      +19     
  Lines        268333   268599     +266     
  Branches      39322    39336      +14     
============================================
+ Hits         190254   190645     +391     
+ Misses        61996    61888     -108     
+ Partials      16083    16066      -17     
Impacted Files Coverage Δ
...rg/opensearch/telemetry/tracing/SpanReference.java 0.00% <0.00%> (ø)
...pensearch/common/settings/FeatureFlagSettings.java 50.00% <ø> (ø)
...racing/ThreadContextBasedTracerContextStorage.java 16.66% <16.66%> (ø)
.../opensearch/telemetry/tracing/noop/NoopTracer.java 25.00% <25.00%> (ø)
server/src/main/java/org/opensearch/node/Node.java 84.95% <27.27%> (-0.89%) ⬇️
...pensearch/telemetry/tracing/NoopTracerFactory.java 50.00% <50.00%> (ø)
...elemetry/tracing/OTelTracingContextPropagator.java 75.00% <75.00%> (ø)
...va/org/opensearch/telemetry/TelemetrySettings.java 75.00% <75.00%> (ø)
...search/telemetry/tracing/OTelTracingTelemetry.java 82.35% <82.35%> (ø)
...rg/opensearch/telemetry/tracing/OTelTelemetry.java 83.33% <83.33%> (ø)
... and 14 more

... and 457 files with indirect coverage changes

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 764862b
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@suranjay suranjay force-pushed the main-opentelemetry branch from 764862b to 60c2a47 Compare May 29, 2023 09:19
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 60c2a47
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@suranjay suranjay closed this May 29, 2023
@suranjay suranjay force-pushed the main-opentelemetry branch from 60c2a47 to 7f5a378 Compare May 29, 2023 09:29
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 7f5a378
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

suranjay added 2 commits May 29, 2023 15:02
Signed-off-by: suranjay <surajkumar.tu@gmail.com>
Signed-off-by: suranjay <surajkumar.tu@gmail.com>
@suranjay suranjay reopened this May 29, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: f275db8
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

1 similar comment
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: f275db8
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 0d67058
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@suranjay
Copy link
Contributor Author

Thanks @suranjay picking this up. What is the overhead of Tracer framework disabled by default? Do we have some nos.?

@shwetathareja There is no overhead with of Tracer framework in disabled state. We have verified this by doing performance benchmarking using OSB

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: a2dd91d
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

Copy link
Member

@shwetathareja shwetathareja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @suranjay . I am still going over the changes, published some feedback after first pass. Btw, you can change the status to actual PR from DRAFT.

buildSrc/version.properties Outdated Show resolved Hide resolved
server/build.gradle Outdated Show resolved Hide resolved
server/build.gradle Outdated Show resolved Hide resolved
Copy link
Contributor

@gashutos gashutos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few files left from TracerManager.....

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@rishabhmaurya
Copy link
Contributor

Thanks @suranjay, LGTM!

@reta
Copy link
Collaborator

reta commented Jun 27, 2023

Thanks @suranjay !

@reta
Copy link
Collaborator

reta commented Jun 27, 2023

(PS: the build should be fixed by #8290)

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication
      1 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithoutRemoteTranslog
      1 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithRemoteTranslog
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
@suranjay suranjay force-pushed the main-opentelemetry branch from 3e1e031 to 7637b37 Compare June 28, 2023 05:26
Signed-off-by: suranjay <surajkumar.tu@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithoutRemoteTranslog
      1 org.opensearch.snapshots.RestoreSnapshotIT.testRestoreRemoteStoreIndicesWithRemoteTranslog
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication

@shwetathareja shwetathareja merged commit 03bc192 into opensearch-project:main Jun 28, 2023
@shwetathareja shwetathareja added the backport 2.x Backport to 2.x branch label Jun 28, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7648-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 03bc19220a711853660652a4761c05285f597884
# Push it to GitHub
git push --set-upstream origin backport/backport-7648-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-7648-to-2.x.

sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Jun 28, 2023
* Add request tracing framework

Signed-off-by: suranjay <surajkumar.tu@gmail.com>

* Introduce ThreadContextStatePropagator to propagate request and transient headers within ThreadContext

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
suranjay added a commit to suranjay/OpenSearch that referenced this pull request Jun 29, 2023
* Add request tracing framework

Signed-off-by: suranjay <surajkumar.tu@gmail.com>

* Introduce ThreadContextStatePropagator to propagate request and transient headers within ThreadContext

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
* Add request tracing framework

Signed-off-by: suranjay <surajkumar.tu@gmail.com>

* Introduce ThreadContextStatePropagator to propagate request and transient headers within ThreadContext

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
@rishabhmaurya rishabhmaurya mentioned this pull request Aug 1, 2023
9 tasks
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
* Add request tracing framework

Signed-off-by: suranjay <surajkumar.tu@gmail.com>

* Introduce ThreadContextStatePropagator to propagate request and transient headers within ThreadContext

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

---------

Signed-off-by: suranjay <surajkumar.tu@gmail.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tracing Framework]Tracing framework implementation
7 participants