fix(anthropic): add instrumentation for Anthropic prompt caching #2175
Conversation
token_histogram.record(
-    prompt_tokens,
+    input_tokens,
I am open to discussing whether this must be input tokens (i.e. all tokens sent by the user) or prompt tokens (i.e. the new tokens that Anthropic has neither written to nor read from the cache).
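If it helps the discussion, here is a minimal sketch of the "all tokens sent by the user" option, assuming `token_histogram` is an OpenTelemetry `Histogram` and `usage` is Anthropic's response usage block; the metric attribute keys are placeholders, not the package's actual constants:

```python
# Sketch only: record uncached + cache-read + cache-creation tokens as one value.
# Responses without caching may omit the cache fields or return None, hence the guards.
total_input_tokens = (
    usage.input_tokens
    + (getattr(usage, "cache_read_input_tokens", 0) or 0)
    + (getattr(usage, "cache_creation_input_tokens", 0) or 0)
)

token_histogram.record(
    total_input_tokens,
    attributes={"token_type": "input", "system": "anthropic"},  # placeholder keys
)
```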
token_histogram.record(
-    prompt_tokens,
+    input_tokens,
ditto
@@ -74,6 +74,11 @@ class SpanAttributes:
     LLM_OPENAI_API_VERSION = "gen_ai.openai.api_version"
     LLM_OPENAI_API_TYPE = "gen_ai.openai.api_type"

+    # Anthropic
I am afraid that this PR won't pass tests, because `pyproject.toml` in the anthropic package points at `0.4.1` (basically the PyPI version), but my changes rely on the new attributes added here. Let me know if I need to merge this first somehow.
Yeah I need to manually publish the semconv package cause poetry doesn't support deeply nested local dependencies :/
python-poetry/poetry#2270
LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS = "llm.anthropic.usage.cache_creation_input_tokens"
LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS = "llm.anthropic.usage.cache_read_input_tokens"
LLM_ANTHROPIC_TOTAL_INPUT_TOKENS = "llm.anthropic.usage.total_input_tokens"
The logic I followed when naming these attributes was the following:

- According to this comment on "Feature: Support Prompt Caching" #1858, `llm` is apparently less standardized than `gen_ai`, so we are free to experiment here.
- I haven't found separate documentation for `gen_ai.anthropic` attributes like they have for `gen_ai.openai` here.

So I thought a good in-between would be `llm.anthropic.usage.*`, but I am open to additional thoughts.

Also, I am not sure if restricting this to Anthropic is a good idea, because both google-generativeai and openai have at least some overlap with this.
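To make the proposal concrete, here is a hedged sketch of how these constants might be set on the request span; the import path and the `usage` object are assumptions for illustration, not the package's exact code:

```python
# Sketch only: attach the proposed cache-usage attributes to an existing span.
from opentelemetry.semconv_ai import SpanAttributes  # import path assumed


def set_cache_usage_attributes(span, usage) -> None:
    # `usage` is assumed to expose Anthropic's cache counters; default to 0 when absent.
    span.set_attribute(
        SpanAttributes.LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS,
        getattr(usage, "cache_creation_input_tokens", 0) or 0,
    )
    span.set_attribute(
        SpanAttributes.LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS,
        getattr(usage, "cache_read_input_tokens", 0) or 0,
    )
```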
I'd argue these should be `gen_ai` and without `anthropic`. Most of the "official" semantic conventions came after OpenLLMetry and were inspired by the work we've done here - so we should set the preferred path moving forward!
I agree generally, but given the context I provide in #2175 (comment) and the fact that what used to be called `gen_ai.usage.prompt_tokens` is now apparently called `gen_ai.usage.input_tokens` "officially", we need to decide what to call these, especially the `TOTAL_INPUT_TOKENS` one.

So, for context, OpenAI does not return `cache_creation_input_tokens`, as they don't charge for the operation. Their usage block looks like this:
usage: {
total_tokens: 2306,
prompt_tokens: 2006,
completion_tokens: 300,
prompt_tokens_details: {
cached_tokens: 1920,
audio_tokens: 0,
},
completion_tokens_details: {
reasoning_tokens: 0,
audio_tokens: 0,
}
}
Gemini returns the number of `cache_creation_input_tokens` from the `cache.create` call, and they charge for storage of cached tokens per hour.
So I think, overall, the formula is this:

input_tokens = cached_tokens + uncached_tokens
cached_tokens = cache_read_tokens # generally
cached_tokens = cache_read_tokens OR cache_creation_tokens # for Anthropic

and `cache_read_tokens` and `cache_creation_tokens` are charged at vastly different prices.

With this in mind, the main open question is: what are `gen_ai.usage.prompt_tokens` and `gen_ai.usage.input_tokens` bound to mean? Is it the total input tokens, or is it the number of uncached tokens charged at the regular price? While the former makes more sense to me, I think it would sort of break any cost-calculation implementations depending on this library.
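To make the ambiguity concrete, here is a small sketch of the "total input tokens" reading in plain Python, assuming usage dictionaries with the field names shown above (an illustration of the formula, not this library's code):

```python
# Sketch: "total tokens in the input" per provider, following the formula above.

def openai_total_input_tokens(usage: dict) -> int:
    # OpenAI's prompt_tokens already includes the cached portion;
    # prompt_tokens_details.cached_tokens is just a breakdown of it.
    return usage["prompt_tokens"]


def anthropic_total_input_tokens(usage: dict) -> int:
    # Anthropic's input_tokens covers only the uncached part, so the cache
    # read/creation counters have to be added back in.
    return (
        usage["input_tokens"]
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )
```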
Done! @nirga, please check my latest commit.
Also, I was surprised to learn that system messages are not instrumented in Anthropic, so I opened #2187. Also willing to contribute, unless anybody else picks that up faster.
Excellent work @dinmukhamedm! Left one comment, and I'll manually publish the semconv package so the CI will pass here.
    else:
        cache_creation_tokens = 0

    input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens
Aren't you double-counting the number of input tokens like this? From my understanding, `cache_read_tokens + cache_creation_tokens` should be exactly the number of tokens in the input. Or is it the case that either `prompt_tokens` is set (for non-cached requests) OR `cache_read_tokens + cache_creation_tokens`?
Nope, Anthropic fills all three in. If it is a cache write, `cache_read_tokens == 0`. If it is a cache read, `cache_creation_tokens == 0`. `prompt_tokens` are the tokens from uncached parts of the messages; for cache writes and reads it is always about 3-4. I am assuming this is some control tokens or stop sequences or something. You can see some numbers I hard-coded in the tests here.

For example, if I send two text blocks of sizes 1200 and 100 tokens in one message and only direct Anthropic to cache the first one, the usage will be:

{"cache_read_input_tokens": 0, "cache_creation_input_tokens": 1200, "input_tokens": 104, "output_tokens": ...}

for the first call, and

{"cache_read_input_tokens": 1200, "cache_creation_input_tokens": 0, "input_tokens": 104, "output_tokens": ...}

for the second.
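For anyone reproducing this, here is a rough sketch of those two calls, assuming a recent Anthropic Python SDK where prompt caching is available on `messages.create` (earlier SDK versions exposed it behind a beta namespace); the model name, sample texts, and token counts are illustrative approximations:

```python
# Sketch only: cache a ~1200-token block, leave a ~100-token block uncached,
# then make the same request twice to observe a cache write and a cache read.
import anthropic

LONG_TEXT = "The quick brown fox jumps over the lazy dog. " * 120   # roughly 1200 tokens
SHORT_TEXT = "Please summarize the repeated sentence above in your own words. " * 8  # roughly 100 tokens

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

messages = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": LONG_TEXT,
            "cache_control": {"type": "ephemeral"},  # only this block is marked for caching
        },
        {"type": "text", "text": SHORT_TEXT},
    ],
}]

first = client.messages.create(
    model="claude-3-5-sonnet-20241022", max_tokens=512, messages=messages
)
# Cache write: usage.cache_creation_input_tokens ~= size of the cached block,
# usage.cache_read_input_tokens == 0, usage.input_tokens ~= uncached tokens plus a few control tokens.

second = client.messages.create(
    model="claude-3-5-sonnet-20241022", max_tokens=512, messages=messages
)
# Cache read (within the cache TTL): cache_creation_input_tokens == 0,
# cache_read_input_tokens ~= size of the cached block, input_tokens unchanged.
```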
My intuition is that we should keep `input_tokens` consistent across providers - so it should always be the total number of tokens in the input, regardless of whether some of them were cached and some weren't.
(which is what I think you did - right?)
Yes, except for one place where I accidentally added just uncached tokens. Fixed in the last commit now.
@dinmukhamedm can you give me permissions to push to your PR?

@nirga invited you with write permissions on our fork
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10447429 | Triggered | Generic High Entropy Secret | f3433b2 | packages/traceloop-sdk/traceloop/sdk/telemetry.py |
| 14250362 | Triggered | Generic High Entropy Secret | f3433b2 | packages/traceloop-sdk/traceloop/sdk/telemetry.py |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn the best practices here.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
This PR addresses #1838 and is an alternative to the stale (and, in its current form, not working) #1858.
This is not a draft; it is a final, working implementation. However, there are a couple of open questions, which I will leave as comments on the relevant pieces of code.
Below is a screenshot of the new resulting attributes, flattened to YAML format:
`feat(instrumentation): ...` or `fix(instrumentation): ...`