feat: connect traces to semgrep rpc process #95

liukatkat · 2025-08-01T17:27:19Z

What:

This PR lets us nest spans from the semgrep rpc process under spans for this mcp server.

How:

We pass in the span id and the trace id to the process through environment variables.

Test plan:

You can see the traces nesting properly on Datadog!

(Edit: sometimes the the top-level trace disappears...)

liukatkat · 2025-08-01T17:27:43Z

feat: connect traces to semgrep rpc process #95 👈 (View in Graphite)
feat: add basic tracing to mcp #93
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

flaper87 · 2025-08-05T08:35:44Z

src/semgrep_mcp/semgrep.py

+    env["SEMGREP_LOG_SRCS"] = "mcp,commons"
+    if top_level_span:
+        env["SEMGREP_TRACE_PARENT_SPAN_ID"] = trace.format_span_id(
+            top_level_span.get_span_context().span_id
+        )
+        env["SEMGREP_TRACE_PARENT_TRACE_ID"] = trace.format_trace_id(
+            top_level_span.get_span_context().trace_id
+        )


Couple of questions:

Does semgrep CLI read these variables?

How would this work with the RPC-based implementation? Should we align with the work there first?

Yes! semgrep links spans by reading these variables. The variables are defined here.

Thanks for clarifying!

How would this change with the new rpc implementation? So you know?

currently, i enable tracing only when we start up the ocaml mcp process. then, when the user invoke the semgrep_scan_rpc tool, we perform the rpc call while passing the tracing info through these env vars to the ocaml mcp process so the spans can be linked!

Sorry, I realize I have not provided enough context.

When using this semgrep rpc, I believe the strategy is to start a single semgrep mcp process thst will run in background and accept rpc calls.

If we use env variables, then the process will use the same trace.id and span.id for every call. I don't think this is what we want.

Evry rpc call should get it's own span.id/trace.id as they belong to a different transaction, right?

Am I missing something on how this is currently implemented in semgrep?

my understanding is that server_lifespan gets triggered every time a user starts a session (e.g. when they open cursor), and we will then create a semgrep mcp process per session. so, we will have one mcp (python) server, which spins up multiple semgrep mcp processes.

my understanding is that server_lifespan gets triggered every time a user starts a session (e.g. when they open cursor), and we will then create a semgrep mcp process per session. so, we will have one mcp (python) server, which spins up multiple semgrep mcp processes.

This is correct. However, for obeservability, you'd normally prefer to have a new transaction, per operation. This allows for analyzing requests independently.

If we use the same transaction and span for the whole session, that means we will see all calls done by the agent under the same view and analysis on the observability cluster.

Again, I may be missing something about how this was implemented in semgrep mcp as I'm too familiar with the CLI:

Are you familiar with the implementation?

Do you have examples of how the transactions look like once ingested by an observability cluster?

based on my current implementation, it is true that you will see all the calls to the tool under one top-level span. but you can still see the individual spans separately.

i can also look at these individual spans, the top-level span simply serves as a way for us to identify that these spans come from the same session.

i am not super familiar with tracing in general, is there a reason why separating them would make it better?

i am familiar with the semgrep mcp implementation. and i think it is still doable if we want to separate the spans and not nest them under one top-level span. i would just have to initialize a new trace and pass the env var in every time i make an rpc call (instead of using the same trace info). i can also update the tracing on the semgrep mcp side to start a top-level scan per rpc call instead of per mcp process.

i am not super familiar with tracing in general, is there a reason why separating them would make it better?

i am familiar with the semgrep mcp implementation. and i think it is still doable if we want to separate the spans and not nest them under one top-level span. i would just have to initialize a new trace and pass the env var in every time i make an rpc call (instead of using the same trace info). i can also update the tracing on the semgrep mcp side to start a top-level scan per rpc call instead of per mcp process.

I think this is what we want. The reason being that every call we make is a separate transaction, since it's a different operation run by the user. Think of this as an HTTP request to a server. Every time the agent users semgrep MCP, it is essentially doing an entire new request, which should have its own trace.id and span group.

liukatkat · 2025-08-18T20:47:53Z

Although the code for tracing on the semgrep mcp command side isn't in the latest release yet, I have tested that it is still safe to merge this code in. The traces will be connected once Semgrep is released!

This was referenced Aug 1, 2025

feat: core RPC scanning #87

Merged

feat: add basic tracing to mcp #93

Merged

liukatkat mentioned this pull request Aug 1, 2025

chore: exclude semgrep interfaces from pyright #94

Closed

liukatkat changed the title ~~link telemetry to ocaml server~~ feat: connect traces to semgrep rpc process Aug 1, 2025

liukatkat marked this pull request as ready for review August 1, 2025 17:33

liukatkat force-pushed the katrina/mcp-telemetry branch from b3df907 to 68e079e Compare August 1, 2025 20:07

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch 2 times, most recently from 022ad07 to 560bd7b Compare August 1, 2025 20:20

liukatkat force-pushed the katrina/mcp-telemetry branch from 68e079e to a292fcd Compare August 1, 2025 20:20

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 560bd7b to aa70e5b Compare August 1, 2025 20:26

liukatkat force-pushed the katrina/mcp-telemetry branch from a292fcd to f96dba9 Compare August 1, 2025 20:26

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from aa70e5b to dda0f6c Compare August 2, 2025 00:21

liukatkat force-pushed the katrina/mcp-telemetry branch 2 times, most recently from a5b7303 to d1fb1c2 Compare August 4, 2025 21:34

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from dda0f6c to 7282056 Compare August 4, 2025 21:34

flaper87 reviewed Aug 5, 2025

View reviewed changes

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 7282056 to 954aa0f Compare August 5, 2025 17:54

liukatkat force-pushed the katrina/mcp-telemetry branch from 635f1ae to de12f0e Compare August 12, 2025 20:53

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from bb3b49a to 15cf5dc Compare August 12, 2025 20:53

Base automatically changed from katrina/mcp-telemetry to main August 12, 2025 22:45

brandonspark approved these changes Aug 18, 2025

View reviewed changes

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch 2 times, most recently from b40ba32 to 573cd89 Compare August 18, 2025 20:25

liukatkat added 4 commits August 18, 2025 13:26

trigger ci

3b1b002

link telemetry to ocaml server

cf0d332

add back accidentally removed comma

eb19809

add --trace

34c2f92

liukatkat force-pushed the katrina/linking-telemetry-to-ocaml branch from 573cd89 to 34c2f92 Compare August 18, 2025 20:27

liukatkat merged commit feae361 into main Aug 18, 2025
13 checks passed

liukatkat deleted the katrina/linking-telemetry-to-ocaml branch August 18, 2025 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: connect traces to semgrep rpc process #95

feat: connect traces to semgrep rpc process #95

Uh oh!

liukatkat commented Aug 1, 2025 •

edited

Loading

Uh oh!

liukatkat commented Aug 1, 2025 •

edited

Loading

Uh oh!

flaper87 Aug 5, 2025

Uh oh!

liukatkat Aug 5, 2025

Uh oh!

flaper87 Aug 5, 2025

Uh oh!

liukatkat Aug 5, 2025

Uh oh!

flaper87 Aug 5, 2025

Uh oh!

liukatkat Aug 6, 2025 •

edited

Loading

Uh oh!

flaper87 Aug 6, 2025

Uh oh!

liukatkat Aug 6, 2025

Uh oh!

flaper87 Aug 7, 2025

Uh oh!

liukatkat commented Aug 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: connect traces to semgrep rpc process #95

feat: connect traces to semgrep rpc process #95

Uh oh!

Conversation

liukatkat commented Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What:

How:

Test plan:

Uh oh!

liukatkat commented Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liukatkat Aug 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liukatkat commented Aug 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

liukatkat commented Aug 1, 2025 •

edited

Loading

liukatkat commented Aug 1, 2025 •

edited

Loading

liukatkat Aug 6, 2025 •

edited

Loading