fix(anthropic): add instrumentation for Anthropic prompt caching #2175
Conversation
token_histogram.record(
-    prompt_tokens,
+    input_tokens,
I am open to discussing whether this must be input tokens (i.e. all tokens sent by the user) or prompt tokens (i.e. the new tokens that Anthropic has neither written to nor read from the cache).
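If it helps the discussion, here is a minimal sketch of the "all tokens sent by the user" option, assuming `token_histogram` is an OpenTelemetry `Histogram` and `usage` is Anthropic's response usage block; the metric attribute keys are placeholders, not the package's actual constants:

```python
# Sketch only: record uncached + cache-read + cache-creation tokens as one value.
# Responses without caching may omit the cache fields or return None, hence the guards.
total_input_tokens = (
    usage.input_tokens
    + (getattr(usage, "cache_read_input_tokens", 0) or 0)
    + (getattr(usage, "cache_creation_input_tokens", 0) or 0)
)

token_histogram.record(
    total_input_tokens,
    attributes={"token_type": "input", "system": "anthropic"},  # placeholder keys
)
```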
token_histogram.record(
-    prompt_tokens,
+    input_tokens,
ditto
@@ -74,6 +74,11 @@ class SpanAttributes:
     LLM_OPENAI_API_VERSION = "gen_ai.openai.api_version"
     LLM_OPENAI_API_TYPE = "gen_ai.openai.api_type"

+    # Anthropic
I am afraid that this PR won't pass tests, because `pyproject.toml` in the anthropic package points at `0.4.1` (basically the PyPI version), but my changes rely on the new attributes added here. Let me know if I need to merge this first somehow.
Yeah I need to manually publish the semconv package cause poetry doesn't support deeply nested local dependencies :/
python-poetry/poetry#2270
LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS = "llm.anthropic.usage.cache_creation_input_tokens"
LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS = "llm.anthropic.usage.cache_read_input_tokens"
LLM_ANTHROPIC_TOTAL_INPUT_TOKENS = "llm.anthropic.usage.total_input_tokens"
The logic I followed when naming these attributes was the following:

- According to this comment on "Feature: Support Prompt Caching" #1858, `llm` is apparently less standardized than `gen_ai`, so we are free to experiment here.
- I haven't found separate documentation for `gen_ai.anthropic` attributes like they have for `gen_ai.openai` here.

So I thought a good in-between would be `llm.anthropic.usage.*`, but I am open to additional thoughts.

Also, I am not sure if restricting this to Anthropic is a good idea, because both google-generativeai and openai have at least some overlap with this.
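To make the proposal concrete, here is a hedged sketch of how these constants might be set on the request span; the import path and the `usage` object are assumptions for illustration, not the package's exact code:

```python
# Sketch only: attach the proposed cache-usage attributes to an existing span.
from opentelemetry.semconv_ai import SpanAttributes  # import path assumed


def set_cache_usage_attributes(span, usage) -> None:
    # `usage` is assumed to expose Anthropic's cache counters; default to 0 when absent.
    span.set_attribute(
        SpanAttributes.LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS,
        getattr(usage, "cache_creation_input_tokens", 0) or 0,
    )
    span.set_attribute(
        SpanAttributes.LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS,
        getattr(usage, "cache_read_input_tokens", 0) or 0,
    )
```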
I'd argue these should be `gen_ai` and without `anthropic`. Most of the "official" semantic conventions came after OpenLLMetry and were inspired by the work we've done here - so we should set the preferred path moving forward!
I agree generally, but given the context I provide in #2175 (comment) and the fact that what used to be called `gen_ai.usage.prompt_tokens` is now apparently called `gen_ai.usage.input_tokens` "officially", we need to decide what to call these, especially the `TOTAL_INPUT_TOKENS` one.

So, for context, OpenAI does not return `cache_creation_input_tokens`, as they don't charge for the operation. Their usage block looks like this:
usage: {
total_tokens: 2306,
prompt_tokens: 2006,
completion_tokens: 300,
prompt_tokens_details: {
cached_tokens: 1920,
audio_tokens: 0,
},
completion_tokens_details: {
reasoning_tokens: 0,
audio_tokens: 0,
}
}
Gemini returns the number of `cache_creation_input_tokens` from the `cache.create` call, and they charge for storage of cached tokens per hour.
So I think, overall, the formula is this:

input_tokens = cached_tokens + uncached_tokens
cached_tokens = cache_read_tokens # generally
cached_tokens = cache_read_tokens OR cache_creation_tokens # for Anthropic

and `cache_read_tokens` and `cache_creation_tokens` are charged at vastly different prices.

With this in mind, the main open question is: what are `gen_ai.usage.prompt_tokens` and `gen_ai.usage.input_tokens` bound to mean? Is it the total input tokens, or is it the number of uncached tokens charged at the regular price? While the former makes more sense to me, I think it would sort of break any cost-calculation implementations depending on this library.
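To make the ambiguity concrete, here is a small sketch of the "total input tokens" reading in plain Python, assuming usage dictionaries with the field names shown above (an illustration of the formula, not this library's code):

```python
# Sketch: "total tokens in the input" per provider, following the formula above.

def openai_total_input_tokens(usage: dict) -> int:
    # OpenAI's prompt_tokens already includes the cached portion;
    # prompt_tokens_details.cached_tokens is just a breakdown of it.
    return usage["prompt_tokens"]


def anthropic_total_input_tokens(usage: dict) -> int:
    # Anthropic's input_tokens covers only the uncached part, so the cache
    # read/creation counters have to be added back in.
    return (
        usage["input_tokens"]
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )
```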
Done! @nirga, please check my latest commit.
Also, I was surprised to learn that system messages are not instrumented in Anthropic, so I opened #2187. Also willing to contribute, unless anybody else picks that up faster.
Excellent work @dinmukhamedm! Left one comment, and I'll manually publish the semconv package so the CI will pass here.
    else:
        cache_creation_tokens = 0

    input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens
Aren't you double-counting the number of input tokens like this? From my understanding, `cache_read_tokens + cache_creation_tokens` should be exactly the number of tokens in the input. Or is it the case that either `prompt_tokens` is set (for non-cached requests) OR `cache_read_tokens + cache_creation_tokens`?
Nope, Anthropic fills all three in. If it is a cache write, `cache_read_tokens == 0`. If it is a cache read, `cache_creation_tokens == 0`. `prompt_tokens` are the tokens from uncached parts of the messages; for cache writes and reads it is always about 3-4. I am assuming this is some control tokens or stop sequences or something. You can see some numbers I hard-coded in the tests here.

For example, if I send two text blocks of sizes 1200 and 100 tokens in one message and only direct Anthropic to cache the first one, the usage will be:

{"cache_read_input_tokens": 0, "cache_creation_input_tokens": 1200, "input_tokens": 104, "output_tokens": ...}

for the first call, and

{"cache_read_input_tokens": 1200, "cache_creation_input_tokens": 0, "input_tokens": 104, "output_tokens": ...}

for the second.
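For anyone reproducing this, here is a rough sketch of those two calls, assuming a recent Anthropic Python SDK where prompt caching is available on `messages.create` (earlier SDK versions exposed it behind a beta namespace); the model name, sample texts, and token counts are illustrative approximations:

```python
# Sketch only: cache a ~1200-token block, leave a ~100-token block uncached,
# then make the same request twice to observe a cache write and a cache read.
import anthropic

LONG_TEXT = "The quick brown fox jumps over the lazy dog. " * 120   # roughly 1200 tokens
SHORT_TEXT = "Please summarize the repeated sentence above in your own words. " * 8  # roughly 100 tokens

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

messages = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": LONG_TEXT,
            "cache_control": {"type": "ephemeral"},  # only this block is marked for caching
        },
        {"type": "text", "text": SHORT_TEXT},
    ],
}]

first = client.messages.create(
    model="claude-3-5-sonnet-20241022", max_tokens=512, messages=messages
)
# Cache write: usage.cache_creation_input_tokens ~= size of the cached block,
# usage.cache_read_input_tokens == 0, usage.input_tokens ~= uncached tokens plus a few control tokens.

second = client.messages.create(
    model="claude-3-5-sonnet-20241022", max_tokens=512, messages=messages
)
# Cache read (within the cache TTL): cache_creation_input_tokens == 0,
# cache_read_input_tokens ~= size of the cached block, input_tokens unchanged.
```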
My intuition is that we should keep `input_tokens` consistent across providers - so it should always be the total number of tokens in the input, regardless of whether some of them were cached and some weren't.
(which is what I think you did - right?)
Yes, except for one place where I accidentally added just uncached tokens. Fixed in the last commit now.
@dinmukhamedm can you give me permissions to push to your PR?

@nirga invited you with write permissions on our fork
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10447429 | Triggered | Generic High Entropy Secret | f3433b2 | packages/traceloop-sdk/traceloop/sdk/telemetry.py |
| 14250362 | Triggered | Generic High Entropy Secret | f3433b2 | packages/traceloop-sdk/traceloop/sdk/telemetry.py |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn the best practices here.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
This PR addresses #1838 and is an alternative to the stale (and, in its current form, not working) #1858.
This is not a draft; it is a final, working implementation. However, there are a couple of open questions, which I will leave as comments on the relevant pieces of code.
Below is a screenshot of the new resulting attributes, flattened to YAML format:
`feat(instrumentation): ...` or `fix(instrumentation): ...`