Skip to content
This repository was archived by the owner on Oct 28, 2025. It is now read-only.

Conversation

@liukatkat
Copy link
Collaborator

@liukatkat liukatkat commented Aug 1, 2025

What:

This PR lets us nest spans from the semgrep rpc process under spans for this mcp server.

How:

We pass in the span id and the trace id to the process through environment variables.

Test plan:

You can see the traces nesting properly on Datadog!
image
(Edit: sometimes the the top-level trace disappears...)

This was referenced Aug 1, 2025
Copy link
Collaborator Author

liukatkat commented Aug 1, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@liukatkat liukatkat changed the title link telemetry to ocaml server feat: connect traces to semgrep rpc process Aug 1, 2025
@liukatkat liukatkat marked this pull request as ready for review August 1, 2025 17:33
@liukatkat liukatkat force-pushed the katrina/mcp-telemetry branch from b3df907 to 68e079e Compare August 1, 2025 20:07
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch 2 times, most recently from 022ad07 to 560bd7b Compare August 1, 2025 20:20
@liukatkat liukatkat force-pushed the katrina/mcp-telemetry branch from 68e079e to a292fcd Compare August 1, 2025 20:20
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 560bd7b to aa70e5b Compare August 1, 2025 20:26
@liukatkat liukatkat force-pushed the katrina/mcp-telemetry branch from a292fcd to f96dba9 Compare August 1, 2025 20:26
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from aa70e5b to dda0f6c Compare August 2, 2025 00:21
@liukatkat liukatkat force-pushed the katrina/mcp-telemetry branch 2 times, most recently from a5b7303 to d1fb1c2 Compare August 4, 2025 21:34
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from dda0f6c to 7282056 Compare August 4, 2025 21:34
Comment on lines 190 to 201
env["SEMGREP_LOG_SRCS"] = "mcp,commons"
if top_level_span:
env["SEMGREP_TRACE_PARENT_SPAN_ID"] = trace.format_span_id(
top_level_span.get_span_context().span_id
)
env["SEMGREP_TRACE_PARENT_TRACE_ID"] = trace.format_trace_id(
top_level_span.get_span_context().trace_id
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions:

  • Does semgrep CLI read these variables?
  • How would this work with the RPC-based implementation? Should we align with the work there first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! semgrep links spans by reading these variables. The variables are defined here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying!

How would this change with the new rpc implementation? So you know?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, i enable tracing only when we start up the ocaml mcp process. then, when the user invoke the semgrep_scan_rpc tool, we perform the rpc call while passing the tracing info through these env vars to the ocaml mcp process so the spans can be linked!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I realize I have not provided enough context.

When using this semgrep rpc, I believe the strategy is to start a single semgrep mcp process thst will run in background and accept rpc calls.

If we use env variables, then the process will use the same trace.id and span.id for every call. I don't think this is what we want.

Evry rpc call should get it's own span.id/trace.id as they belong to a different transaction, right?

Am I missing something on how this is currently implemented in semgrep?

Copy link
Collaborator Author

@liukatkat liukatkat Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that server_lifespan gets triggered every time a user starts a session (e.g. when they open cursor), and we will then create a semgrep mcp process per session. so, we will have one mcp (python) server, which spins up multiple semgrep mcp processes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that server_lifespan gets triggered every time a user starts a session (e.g. when they open cursor), and we will then create a semgrep mcp process per session. so, we will have one mcp (python) server, which spins up multiple semgrep mcp processes.

This is correct. However, for obeservability, you'd normally prefer to have a new transaction, per operation. This allows for analyzing requests independently.

If we use the same transaction and span for the whole session, that means we will see all calls done by the agent under the same view and analysis on the observability cluster.

Again, I may be missing something about how this was implemented in semgrep mcp as I'm too familiar with the CLI:

  • Are you familiar with the implementation?
  • Do you have examples of how the transactions look like once ingested by an observability cluster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on my current implementation, it is true that you will see all the calls to the tool under one top-level span. but you can still see the individual spans separately.

Screenshot 2025-08-06 at 11.42.16 AM.png

i can also look at these individual spans, the top-level span simply serves as a way for us to identify that these spans come from the same session.

image.png

i am not super familiar with tracing in general, is there a reason why separating them would make it better?

i am familiar with the semgrep mcp implementation. and i think it is still doable if we want to separate the spans and not nest them under one top-level span. i would just have to initialize a new trace and pass the env var in every time i make an rpc call (instead of using the same trace info). i can also update the tracing on the semgrep mcp side to start a top-level scan per rpc call instead of per mcp process.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not super familiar with tracing in general, is there a reason why separating them would make it better?

i am familiar with the semgrep mcp implementation. and i think it is still doable if we want to separate the spans and not nest them under one top-level span. i would just have to initialize a new trace and pass the env var in every time i make an rpc call (instead of using the same trace info). i can also update the tracing on the semgrep mcp side to start a top-level scan per rpc call instead of per mcp process.

I think this is what we want. The reason being that every call we make is a separate transaction, since it's a different operation run by the user. Think of this as an HTTP request to a server. Every time the agent users semgrep MCP, it is essentially doing an entire new request, which should have its own trace.id and span group.

@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 7282056 to 954aa0f Compare August 5, 2025 17:54
@liukatkat liukatkat force-pushed the katrina/mcp-telemetry branch from 635f1ae to de12f0e Compare August 12, 2025 20:53
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from bb3b49a to 15cf5dc Compare August 12, 2025 20:53
Base automatically changed from katrina/mcp-telemetry to main August 12, 2025 22:45
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch 2 times, most recently from b40ba32 to 573cd89 Compare August 18, 2025 20:25
@liukatkat liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 573cd89 to 34c2f92 Compare August 18, 2025 20:27
@liukatkat
Copy link
Collaborator Author

Although the code for tracing on the semgrep mcp command side isn't in the latest release yet, I have tested that it is still safe to merge this code in. The traces will be connected once Semgrep is released!

@liukatkat liukatkat merged commit feae361 into main Aug 18, 2025
13 checks passed
@liukatkat liukatkat deleted the katrina/linking-telemetry-to-ocaml branch August 18, 2025 20:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants