feat(llmobs): llmobs-specific context manager #10767

Yun-Kim · 2024-09-23T20:52:14Z

Summary

Public facing changes:

Any LLMObs method (export_span(), annotate()) that allows an optional span argument now defaults to the current active LLMObs span rather than the current active APM span.
Adds multithreading (futures.multithreading) support for LLMObs. Previously, multithreaded apps would produce broken traces.

Private changes:

LLMObs has its own context provider, which keeps track of the active LLM-type span (generated by both LLMObs._start_span() and LLM integrations).
HTTPPropagation now adds the LLMObs parent ID as a field on the request headers directly, rather than through the context object.
Adds private helper method LLMObs._instance.current_span(), which returns the current active LLMObs-generated (integration, SDK) span.
Adds private helper method LLMObs._instance._current_trace_context(), which returns the current LLMObs context (which can represent either a span or a distributed span).
Adds a new field to the LLMObs span event struct, _dd, which is a str/str dictionary containing the span/trace IDs of the APM span to correlate with. Currently these are the same span/trace IDs as the LLMObs span/trace ID, but this unlocks future steps of using independent span/trace IDs.
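
To make the new _dd field concrete, here is a minimal sketch of what a span event carrying it might look like. Only the _dd field and its str/str shape come from this description; the helper name build_span_event, the other event fields, and the key names inside _dd ("span_id", "trace_id") are assumptions made for illustration.

```python
from typing import Any, Dict


def build_span_event(span_id: int, trace_id: int, parent_id: str, name: str) -> Dict[str, Any]:
    """Build a simplified, hypothetical LLMObs span event."""
    return {
        "span_id": str(span_id),
        "trace_id": str(trace_id),
        "parent_id": parent_id,
        "name": name,
        # APM correlation IDs as a str/str dictionary. For now they mirror
        # the LLMObs IDs above; key names are assumed for this sketch.
        "_dd": {"span_id": str(span_id), "trace_id": str(trace_id)},
    }


event = build_span_event(span_id=123, trace_id=456, parent_id="undefined", name="workflow")
print(event["_dd"])
```

Keeping the APM IDs in their own field means the top-level span_id and trace_id could later diverge from APM without breaking correlation.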

Previous behavior

LLMObs spans are based on APM spans, except LLMObs spans' parenting involves only other LLMObs spans. So with a potential trace structure containing a mixture of APM-specific and LLMObs spans, like:

Span A (LLMObs span) --> Span B (APM-specific span) --> Span C (LLMObs span)

LLMObs only cares about the LLMObs spans, where span C's parent is the root span (span A), even though in APM it would be span B. Combined with distributed tracing and multithreading, this makes it hard to determine the "correct" (read: LLMObs) parenting tree for traces submitted to LLM Observability.
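
The parenting rule above can be sketched as a walk up the APM parent chain that skips APM-only spans. The Span class and llmobs_parent() helper below are hypothetical stand-ins written for this example, not ddtrace types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical span with an APM parent pointer and an LLMObs flag."""
    name: str
    is_llmobs: bool
    parent: Optional["Span"] = None


def llmobs_parent(span: Span) -> Optional[Span]:
    """Return the nearest LLMObs ancestor, skipping APM-only spans."""
    ancestor = span.parent
    while ancestor is not None and not ancestor.is_llmobs:
        ancestor = ancestor.parent
    return ancestor


span_a = Span("A", is_llmobs=True)
span_b = Span("B", is_llmobs=False, parent=span_a)
span_c = Span("C", is_llmobs=True, parent=span_b)

print(span_c.parent.name)          # APM parent: B
print(llmobs_parent(span_c).name)  # LLMObs parent: A
```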

Problems with previous approach

Previously we worked around this by traversing the span's local parent tree and finding the next LLM-type span on both span start and finish for non-distributed cases. For distributed cases, we would attach the parent ID to the span context's meta field so that it was propagated in distributed request headers. However, attaching things to the span context meta was not suitable long-term due to a couple of factors:

Context objects are not thread-safe: in a multithreading case with n>1 child threads creating their own spans, the parent ID stored in the context object could be overwritten during thread execution, therefore incorrectly propagating parent IDs.
Context objects store trace-specific information, and are not designed for our use case where we skip spans here and there in the trace. This also leads to edge cases that were handled with ugly workaround code:

Example ugly workaround

Any meta fields set on the context object get propagated as span tags on all subsequent spans in the trace at span start time, except in the first service of a trace, where they are propagated at span finish time. Fixing this resulted in overriding these span tags on span start and more checks on span finish.
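
The thread-safety problem described above can be reproduced with a small, deterministic sketch. SharedContext and the "llmobs_parent_id" key are hypothetical stand-ins for a context object shared by all spans in a trace; the barrier forces both threads to write before either reads, which is one possible interleaving in a real application.

```python
import threading


class SharedContext:
    """Hypothetical trace-wide context with a mutable meta dictionary."""

    def __init__(self) -> None:
        self.meta = {}


def child_thread(ctx, own_span_id, barrier, propagated):
    # Each thread records its own span as the parent to propagate...
    ctx.meta["llmobs_parent_id"] = own_span_id
    # ...then both threads wait here, so both writes have already happened.
    barrier.wait()
    # The value read back is what would be injected into request headers.
    propagated[own_span_id] = ctx.meta["llmobs_parent_id"]


ctx = SharedContext()
barrier = threading.Barrier(2)
propagated = {}
threads = [
    threading.Thread(target=child_thread, args=(ctx, span_id, barrier, propagated))
    for span_id in ("span-1", "span-2")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Threads whose propagated parent ID is not their own span ID.
wrong = [span_id for span_id, seen in propagated.items() if seen != span_id]
print(len(wrong))  # 1
```

Whichever thread wrote last wins, so exactly one of the two threads would propagate the other thread's span ID.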

Current approach

Instead of depending on a Context object that doesn't quite fit our use case and trying to force it to fit, we simply keep track of our own active LLMObs span/context:

LLMObsContextProvider keeps track of the current active LLMObs span via active() and activate().
Instead of traversing a span's local ancestor tree to solve for a span's llmobs parent ID, we just use LLMObsContextProvider._activate_llmobs_span() (called by LLMObs._start_span(), BaseLLMIntegration.trace(submit_to_llmobs=True), and the bedrock integration) and set the llmobs parent ID as a tag at span start time.
LLMObs.inject_distributed_headers() now uses the LLMObsContextProvider to inject the active llmobs span's ID into request headers.
LLMObs.activate_distributed_headers() now uses the LLMObsContextProvider to activate the extracted llmobs context to continue the trace in a distributed case.
trace_utils.activate_distributed_headers() now includes automatic llmobs context activation if llmobs is enabled. I've config-gated this so that LLMObs is only imported for llmobs users (same for HTTPPropagator.inject()).
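
As a rough illustration of the active()/activate() pattern in the list above, here is a minimal sketch built on the standard-library contextvars module. SketchContextProvider, start_llmobs_span(), the "llmobs_parent_id" tag name, and the "undefined" root sentinel are all assumptions for this example; the real LLMObsContextProvider may differ in structure and naming.

```python
import contextvars
from dataclasses import dataclass, field
from typing import Dict, Optional

ROOT_PARENT_ID = "undefined"  # assumed sentinel for spans with no LLMObs parent

# Module-level context variable holding the active LLMObs span (sketch only).
_ACTIVE_SPAN = contextvars.ContextVar("sketch_active_llmobs_span", default=None)


@dataclass
class Span:
    """Hypothetical span that only carries an ID and string tags."""
    span_id: int
    tags: Dict[str, str] = field(default_factory=dict)


class SketchContextProvider:
    """Tracks the active LLMObs span independently of APM spans."""

    def active(self) -> Optional[Span]:
        return _ACTIVE_SPAN.get()

    def activate(self, span: Optional[Span]) -> None:
        _ACTIVE_SPAN.set(span)


def start_llmobs_span(provider: SketchContextProvider, span: Span) -> Span:
    """Tag the span with its LLMObs parent at start time, then activate it."""
    parent = provider.active()
    span.tags["llmobs_parent_id"] = str(parent.span_id) if parent else ROOT_PARENT_ID
    provider.activate(span)
    return span


provider = SketchContextProvider()
span_a = start_llmobs_span(provider, Span(1))  # LLMObs span
span_b = Span(2)                               # APM-only span, never activated here
span_c = start_llmobs_span(provider, Span(3))  # LLMObs span

print(span_c.tags["llmobs_parent_id"])  # 1
```

Because the APM-only span never touches the provider, no ancestor traversal is needed to skip it.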

By keeping track of our own active LLMObs spans, spans submitted to LLM Observability have an independent set of span and parent IDs, even if the span and trace IDs are shared with APM spans for now. This is the first step to decoupling from tracer internals.
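
For the distributed case, the inject/extract round trip might look roughly like the sketch below. The header name and both helper functions are hypothetical and chosen only for this example; this is not the header ddtrace actually sends.

```python
from typing import Dict, Optional

# Hypothetical header name, chosen for this sketch only.
PARENT_ID_HEADER = "x-sketch-llmobs-parent-id"


def inject_llmobs_parent(headers: Dict[str, str], active_span_id: Optional[int]) -> None:
    """Client side: copy the active LLMObs span ID into outgoing headers."""
    if active_span_id is not None:
        headers[PARENT_ID_HEADER] = str(active_span_id)


def extract_llmobs_parent(headers: Dict[str, str]) -> Optional[str]:
    """Server side: read the remote LLMObs parent ID, if one was sent."""
    return headers.get(PARENT_ID_HEADER)


outgoing: Dict[str, str] = {}
inject_llmobs_parent(outgoing, active_span_id=42)
remote_parent_id = extract_llmobs_parent(outgoing)
print(remote_parent_id)  # 42
```

On the receiving service, the extracted ID would be activated as the LLMObs context so that the first LLMObs span started there is parented to the remote span.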

Next steps

We can go further by generating LLMObs-specific span/trace IDs which are separate from APM. This will solve some edge cases with traces involving mixed APM/LLMObs spans.

Checklist

PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

github-actions · 2024-09-23T20:52:40Z

CODEOWNERS have been resolved as:

ddtrace/llmobs/_context.py                                              @DataDog/ml-observability
releasenotes/notes/feat-llmobs-context-cf709480b30ed0a5.yaml            @DataDog/apm-python
ddtrace/contrib/internal/futures/threading.py                           @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/trace_utils.py                                 @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_integrations/base.py                                    @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/langgraph.py                               @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
ddtrace/propagation/http.py                                             @DataDog/apm-sdk-api-python
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/test_llmobs.py                                             @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_propagation.py                                        @DataDog/ml-observability
tests/tracer/test_propagation.py                                        @DataDog/apm-sdk-api-python

pr-commenter · 2024-09-23T21:26:40Z

Benchmarks

Benchmark execution time: 2025-01-24 23:20:28

Comparing candidate commit 8f0a797 in PR branch yunkim/llmobs-context with baseline commit 48c6547 in branch main.

Found 2 performance improvements and 0 performance regressions! Performance is the same for 364 metrics, 2 unstable metrics.

scenario:iast_aspects-ospathbasename_aspect

🟩 execution_time [-339.437ns; -279.134ns] or [-9.299%; -7.647%]

scenario:iast_aspects-ospathdirname_aspect

🟩 execution_time [-498.889ns; -433.800ns] or [-12.121%; -10.539%]

erikayasuda

apm-core files LGTM!

datadog-dd-trace-py-rkomorn · 2024-09-24T18:58:46Z

Datadog Report

Branch report: yunkim/llmobs-context
Commit report: 5feb4ba
Test service: dd-trace-py

✅ 0 Failed, 130 Passed, 1378 Skipped, 4m 36.31s Total duration (34m 44.56s time saved)

sabrenner

LGTM! Just a couple of small nits, but feel free to ignore them if they're intentional. Really cool stuff; I walked through the logic a couple of times and it seems OK to me.

datadog-dd-trace-py-rkomorn · 2025-01-23T16:18:32Z

Datadog Report

Branch report: yunkim/llmobs-context
Commit report: 8f0a797
Test service: dd-trace-py

✅ 0 Failed, 170 Passed, 1338 Skipped, 5m 3.79s Total duration (35m 6.26s time saved)

Yun-Kim · 2025-01-28T22:49:35Z

Need to look into whether relying on contextvars means we don't need to actively inject llmobs propagation into the futures threading integration.

Yun-Kim · 2025-01-28T23:05:12Z

ddtrace/propagation/http.py


-            _inject_llmobs_parent_id(span_context)
+            LLMObs._inject_llmobs_context(headers)


Instead of injecting into the headers, let's add it into the context and let the injection mechanisms propagate that into the headers.

Yun-Kim · 2025-01-28T23:05:55Z

ddtrace/contrib/internal/futures/threading.py

+    if ctx[1] is not None and config._llmobs_enabled:
+        from ddtrace.llmobs import LLMObs
+
+        LLMObs._instance._llmobs_context_provider.activate(ctx[1])


Explore using signals instead of importing!
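
For context on the threading question raised in these comments, here is a minimal standard-library sketch of carrying a context variable into a worker thread by snapshotting the submitter's context. The variable name and submit_with_context() helper are hypothetical; the diff above instead activates the captured context through the LLMObs context provider inside the worker.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable standing in for the active LLMObs span.
active_llmobs_span = contextvars.ContextVar("sketch_active_llmobs_span", default=None)


def read_active():
    """Runs in the worker thread and reports the span it can see."""
    return active_llmobs_span.get()


def submit_with_context(executor, fn, *args):
    """Snapshot the submitting thread's context and run fn inside it."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)


active_llmobs_span.set("parent-span")
with ThreadPoolExecutor(max_workers=1) as executor:
    seen = submit_with_context(executor, read_active).result()

print(seen)  # parent-span
```

Since the snapshot is a copy, each worker sees the parent that was active at submit time, and values set inside the worker do not leak back into the submitting thread.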

Use our own context manager

b34e3bd

Yun-Kim mentioned this pull request Sep 23, 2024

feat(llmobs): llmobs context provider #10138

Closed

2 tasks

datadog-datadog-prod-us1 bot reviewed Sep 23, 2024

Yun-Kim added 2 commits September 23, 2024 17:29

Fix tests, refactor trace_utils

c99c0bc

Remove propagated parent ID key

191a0d0

datadog-datadog-prod-us1 bot reviewed Sep 23, 2024

Yun-Kim marked this pull request as ready for review September 23, 2024 22:13

Yun-Kim requested review from a team as code owners September 23, 2024 22:13

Yun-Kim requested review from mabdinur, erikayasuda and wconti27 September 23, 2024 22:13

Yun-Kim and others added 3 commits September 24, 2024 13:04

Merge branch 'main' into yunkim/llmobs-context

b0ede81

Fix tracer dummy tests, removed propagation constant

bd9b539

Add dd apm span/trace IDs as separate fields

dd0a23a

erikayasuda approved these changes Sep 24, 2024

typing

0e41826

sabrenner approved these changes Sep 24, 2024

lievan approved these changes Sep 24, 2024

Yun-Kim and others added 2 commits October 15, 2024 18:13

Merge branch 'main' into yunkim/llmobs-context

ec5779f

Address review comments

304bbe7

Yun-Kim requested a review from Kyle-Verhoog October 15, 2024 22:24

Add release note

80eced2

Yun-Kim requested a review from a team as a code owner October 16, 2024 19:53

Yun-Kim commented Oct 22, 2024

Yun-Kim commented Oct 22, 2024

github-actions bot added the stale label Dec 22, 2024

Merge branch 'main' into yunkim/llmobs-context

2f04735

Yun-Kim force-pushed the yunkim/llmobs-context branch from a7cd6ad to 2f04735 Compare January 6, 2025 19:36

Typing fix

8565dd8

github-actions bot removed the stale label Jan 7, 2025

mabdinur approved these changes Jan 8, 2025

Yun-Kim and others added 3 commits January 9, 2025 16:46

Merge branch 'main' into yunkim/llmobs-context

76f00e3

Ignore stderr due to noisy logs

88fb4ae

Merge branch 'main' into yunkim/llmobs-context

173271f

Yun-Kim force-pushed the yunkim/llmobs-context branch from 114d58f to e9e64ca Compare January 15, 2025 20:23

Merge branch 'main' into yunkim/llmobs-context

5feb4ba

Yun-Kim force-pushed the yunkim/llmobs-context branch from e9e64ca to 5feb4ba Compare January 15, 2025 22:05

Yun-Kim added 2 commits January 22, 2025 16:51

Merge branch 'main' into yunkim/llmobs-context

e5be797

Port over context changes to langgraph util

c37d1b2

Fix merge conflict omission in tests

35aae06

Yun-Kim enabled auto-merge (squash) January 24, 2025 20:03

Merge branch 'main' into yunkim/llmobs-context

8f0a797
