Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds instrumentation for search path #8408

Conversation

Gaganjuneja
Copy link
Contributor

Description

Adds the instrumentation on Search path. Making it a Draft PR to discuss the instrumentation on SearchService.java whether this is the right way to do explicit tracing or should we do inside SearchOperationsListener, which will require the state management.

Related Issues

Resolves #[7546]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
@Gaganjuneja
Copy link
Contributor Author

@reta @shwetathareja Please provide your thoughts on this.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 3, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jul 5, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jul 5, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation

@reta
Copy link
Collaborator

reta commented Jul 6, 2023

By flag you mean levels or some component level settings?

Nope, see please [1]

Trace Flags, a binary encoding containing information about the trace

[1] https://opentelemetry.io/docs/concepts/signals/traces/#span-context

@Gaganjuneja
Copy link
Contributor Author

@reta , I am working on documentation task for tracing framework features for release 2.9. Which includes how somebody can enable/disable the feature and other settings. While I fully appreciate and support the holistic approach you've outlined for adding instrumentation, I believe it would be beneficial to include a basic search request instrumentation as a stepping stone that users can easily relate to. (SearchTask -> QueryPhase -> FetchPhase)

I have created a meta task for the overall tracing implementation and will continue working on that. I tried doing the changes for TransportAction but it's a lot of work and requires testing as there are hundreds of implementations of TransportAction and we need to pass the TracingFactory instance from all concrete classes.

@reta
Copy link
Collaborator

reta commented Jul 9, 2023

While I fully appreciate and support the holistic approach you've outlined for adding instrumentation, I believe it would be beneficial to include a basic search request instrumentation as a stepping stone that users can easily relate to. (SearchTask -> QueryPhase -> FetchPhase)

As I mentioned, I don't see the point in rushing the instrumentation in the current form (it could be replaced by a simple log statements if there is an urgent need to do so). If others see the value in it - I have no objections.

@Bukhtawar
Copy link
Collaborator

Bukhtawar commented Jul 9, 2023

Thanks @Gaganjuneja for this, wanted to understand more around the changes. As I see, the tracing is being done mostly around the shard level search execution SearchServices as long as changes are pertaining to this class or specific search operations, without touching the distributed transport layers and task coordinations I would still be fine with it since highly likely they won't change. But the moment you change the transport layers TransportSearchAction it makes me concerned if we are writing it generic enough and thinking through how the framework interactions would evolve.
If you can isolate core search module instrumentations and still use this PR to build search shard level interactions more generically, I am good.

The benefit I see with doing this way is, if there are core deficiencies in the tracing framework we have built so far, it would help us understand those gaps upfront and course correct without having to invest into into a framework first and then thinking how best these core modules would fit in

@reta what do you think?

@reta
Copy link
Collaborator

reta commented Jul 9, 2023

Thanks @Bukhtawar

@reta what do you think?

We already found one with Tracing API [1], this is good (in my opinion) and we could address that now.

But the moment you change the transport layers TransportSearchAction it makes me concerned if we are writing it generic enough and thinking through how the framework interactions would evolve.

That the most complicated and arguably the most important part of the work. It opens up the way for context transfer and actually getting real benefits of distributed nature of the tracing (not only for core but beyond that). What we get with this (isolated) instrumentation - out of the blue uncorrelated traces. This is what logs are for (my opinion).

@shwetathareja
Copy link
Member

@Gaganjuneja @Bukhtawar
I completely agree to that point that we should have more sample instrumentation across different code paths to see if there are gaps with the tracing framework we have built so far. But agree with @reta here that thinking on how we are going to introduce the tracing at the transport level generically is important as without that others might not be able to use it holistically.
This PR does start the discussion on one other aspect whether we should simply do inline instrumentation vs use the current listeners SearchOperationListener and what changes we need for the same.

Gagan Juneja added 3 commits July 10, 2023 15:33
Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
@Gaganjuneja
Copy link
Contributor Author

@Gaganjuneja @Bukhtawar I completely agree to that point that we should have more sample instrumentation across different code paths to see if there are gaps with the tracing framework we have built so far. But agree with @reta here that thinking on how we are going to introduce the tracing at the transport level generically is important as without that others might not be able to use it holistically. This PR does start the discussion on one other aspect whether we should simply do inline instrumentation vs use the current listeners SearchOperationListener and what changes we need for the same.

Absolutely, @shwetathareja, We need to decide on that as well, explicit inline instrumentation looks good to me.

@Gaganjuneja
Copy link
Contributor Author

Thanks @Bukhtawar

@reta what do you think?

We already found one with Tracing API [1], this is good (in my opinion) and we could address that now.

But the moment you change the transport layers TransportSearchAction it makes me concerned if we are writing it generic enough and thinking through how the framework interactions would evolve.

That the most complicated and arguably the most important part of the work. It opens up the way for context transfer and actually getting real benefits of distributed nature of the tracing (not only for core but beyond that). What we get with this (isolated) instrumentation - out of the blue uncorrelated traces. This is what logs are for (my opinion).

@reta @Bukhtawar Please take a look once. I have added the instrumentation in TransportAction directly and removed from the TransportSearchAction.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@suranjay suranjay mentioned this pull request Jul 10, 2023
6 tasks
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Aug 11, 2023
@opensearch-trigger-bot
Copy link
Contributor

This PR was closed because it has been stalled for 7 days with no activity.

@kotwanikunal
Copy link
Member

Apologies. This PR was auto closed without reaching a resolution from the maintainers.
Re-opening to move it forward.
Thanks for your contributions to OpenSearch!

@kotwanikunal kotwanikunal reopened this Sep 14, 2023
@kotwanikunal kotwanikunal removed the stalled Issues that have stalled label Sep 14, 2023
@github-actions
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 479a937

Incompatible components

Skipped components

Compatible components

Compatible components: []

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 13, 2024
@kotwanikunal
Copy link
Member

There has been no movement on this PR. @Gaganjuneja Please reopen if you still plan to work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants