feat(anthropic): add instrumentation for Anthropic tool calling #1372

Closed
@@ -134,6 +134,23 @@ def _set_input_attributes(span, kwargs):
                set_span_attribute(
                    span, f"{SpanAttributes.LLM_PROMPTS}.{i}.role", message.get("role")
                )
    if kwargs.get("tools"):
        for i, tool in enumerate(kwargs.get("tools")):
            set_span_attribute(
                span,
                f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.{i}.name",
                tool.get("name")
            )
            set_span_attribute(
                span,
                f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.{i}.description",
                tool.get("description")
            )
            set_span_attribute(
                span,
                f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.{i}.input_schema",
                json.dumps(tool.get("input_schema"))
            )


def _set_span_completions(span, response):
@@ -320,6 +337,39 @@ def _set_response_attributes(span, response):
            prompt_tokens + completion_tokens,
        )

    if response.get("role"):
        set_span_attribute(span, f"{SpanAttributes.LLM_COMPLETIONS}.role", response.get("role"))

    if response.get("stop_reason"):
        set_span_attribute(span, f"{SpanAttributes.LLM_COMPLETIONS}.stop_reason", response.get("stop_reason"))

    if response.get("content"):
Member

Hey @peachypeachyy, this is already happening in _set_span_completions. What I meant was to edit the code there so it records the tool selection (if a tool was selected), following the same conventions we use in the OpenAI instrumentation.
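
A minimal sketch of that suggestion: record only the selected tool(s) from the response content. The helper name `tool_call_attributes` and the `gen_ai.completion.0.tool_calls.{j}.*` key layout are assumptions loosely modeled on the OpenAI instrumentation, not confirmed attribute names, and content blocks are represented as plain dicts.

```python
import json


def tool_call_attributes(content_blocks, prefix="gen_ai.completion.0"):
    """Flatten tool_use blocks into span-attribute-style key/value pairs.

    Hypothetical helper: the key layout is an assumption, not the
    instrumentation's confirmed convention.
    """
    attributes = {}
    tool_index = 0
    for block in content_blocks:
        # Text blocks are assumed to be handled elsewhere; skip them here.
        if block.get("type") != "tool_use":
            continue
        key = f"{prefix}.tool_calls.{tool_index}"
        attributes[f"{key}.id"] = block.get("id")
        attributes[f"{key}.name"] = block.get("name")
        # The tool input is a nested dict, so store it as a JSON string.
        attributes[f"{key}.arguments"] = json.dumps(block.get("input"))
        tool_index += 1
    return attributes


example = [
    {"type": "text", "text": "Checking the weather."},
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "get_weather",
        "input": {"location": "New York City, NY"},
    },
]
print(tool_call_attributes(example))
```

In the real instrumentation these pairs would be written with set_span_attribute instead of being returned as a dict.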

Author

@peachypeachyy Jun 23, 2024

Hey @nirga, let me walk you through my thought process.

This is the request being sent:

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature, either 'celsius' or 'fahrenheit'"
                    }
                },
                "required": ["location"]
            }
        },
        {
            "name": "get_time",
            "description": "Get the current time in a given time zone",
            "input_schema": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "The IANA time zone name, e.g. America/Los_Angeles"
                    }
                },
                "required": ["timezone"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What is the weather like right now in New York City, NY?"
        }
    ]
)
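
As a rough illustration of what the request side records, the sketch below flattens a `tools` list into `llm.request.functions.{i}.*` keys, the prefix the test in this PR asserts against. The helper `tool_request_attributes` is hypothetical, the schemas are shortened, and it returns a plain dict instead of writing to a span.

```python
import json


def tool_request_attributes(tools, prefix="llm.request.functions"):
    """Flatten tool definitions into indexed attribute keys (hypothetical helper)."""
    attributes = {}
    for i, tool in enumerate(tools or []):
        attributes[f"{prefix}.{i}.name"] = tool.get("name")
        attributes[f"{prefix}.{i}.description"] = tool.get("description")
        # Schemas are nested dicts, so serialize them to a JSON string.
        attributes[f"{prefix}.{i}.input_schema"] = json.dumps(tool.get("input_schema"))
    return attributes


tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {"type": "object", "required": ["location"]},
    },
    {
        "name": "get_time",
        "description": "Get the current time in a given time zone",
        "input_schema": {"type": "object", "required": ["timezone"]},
    },
]
for key, value in tool_request_attributes(tools).items():
    print(key, "=", value)
```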

The corresponding formatted response is:

{'content': [TextBlock(text='<thinking>\nThe get_weather function is relevant to answer this question, as it provides the current weather for a given location.\n\nIt requires the "location" parameter, which in this case is provided by the user as "New York City, NY". \n\nThe "unit" parameter is optional. The user did not specify a preference, so we can use the default.\n\nSince all required parameters are available, we can proceed with calling the get_weather function to get the current weather in New York City.\n</thinking>', type='text'),
             ToolUseBlock(id='toolu_013gp9gPBaYhPVLMeq4N34yU', input={'location': 'New York City, NY'}, name='get_weather', type='tool_use')],
 'id': 'msg_01H62BKGLJBD8zstmvW21Cp4',
 'model': 'claude-3-opus-20240229',
 'role': 'assistant',
 'stop_reason': 'tool_use',
 'stop_sequence': None,
 'type': 'message',
 'usage': Usage(input_tokens=748, output_tokens=165)}

In this case, even though two functions are passed in tools (get_weather and get_time), Claude 3 Opus chose to use get_weather, as seen in the ToolUseBlock.

I capture the get_weather tool used in the ToolUseBlock with f"{SpanAttributes.LLM_COMPLETIONS}.{i}.name".

Which tool gets used is governed by the tool_choice parameter, which can be set in the request. I have not set it in this example, so it takes its default value, auto. Do you need me to set span attributes for this on the request, something like f"{SpanAttributes.LLM_REQUEST_FUNCTIONS}.{i}.tool_choice" with the value auto?
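
One possible shape for this, as a sketch only: the attribute key `llm.request.tool_choice` and the helper `tool_choice_attributes` are hypothetical, and the fallback assumes the API treats an omitted tool_choice as `{"type": "auto"}` when tools are provided.

```python
import json


def tool_choice_attributes(kwargs, key="llm.request.tool_choice"):
    """Return a tool_choice attribute for a request (hypothetical key name)."""
    if not kwargs.get("tools"):
        # No tools were offered, so there is nothing to choose from.
        return {}
    # Assumption: an omitted tool_choice behaves like {"type": "auto"}.
    tool_choice = kwargs.get("tool_choice") or {"type": "auto"}
    return {key: json.dumps(tool_choice, sort_keys=True)}


print(tool_choice_attributes({"tools": [{"name": "get_weather"}]}))
print(
    tool_choice_attributes(
        {
            "tools": [{"name": "get_weather"}],
            "tool_choice": {"type": "tool", "name": "get_weather"},
        }
    )
)
```

Because tool_choice applies to the whole request rather than to a single tool, the sketch uses one un-indexed key instead of a per-tool `{i}` key.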

Member

@peachypeachyy, can you look at _set_span_completions? It already sets the same attributes you're setting here. You should only add the tool choice logging, similar to the OpenAI instrumentation.

        for i, content in enumerate(response.get("content")):
            if dict(content).get('id') is not None:
                set_span_attribute(
                    span,
                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.id",
                    dict(content).get('id'),
                )
            if dict(content).get('type') is not None:
                set_span_attribute(
                    span,
                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.type",
                    dict(content).get('type'),
                )
            if dict(content).get('input') is not None:
                set_span_attribute(
                    span,
                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.input",
                    json.dumps(dict(content).get('input')),
                )
            if dict(content).get('name') is not None:
                set_span_attribute(
                    span,
                    f"{SpanAttributes.LLM_COMPLETIONS}.{i}.name",
                    dict(content).get('name'),
                )

    if should_send_prompts():
        _set_span_completions(span, response)

9 changes: 5 additions & 4 deletions packages/opentelemetry-instrumentation-anthropic/poetry.lock

Some generated files are not rendered by default.

@@ -36,7 +36,7 @@ pytest = "^8.2.2"
 pytest-sugar = "1.0.0"

 [tool.poetry.group.test.dependencies]
-anthropic = ">=0.21.3,<0.29.0"
+anthropic = ">=0.27.0"
 pytest = "^8.2.2"
 pytest-sugar = "1.0.0"
 vcrpy = "^6.0.1"
@@ -0,0 +1,99 @@
interactions:
- request:
body: '{"max_tokens": 1024, "messages": [{"role": "user", "content": "What is
the weather like right now in New York? Also what time is it there?"}], "model":
"claude-3-opus-20240229", "tools": [{"name": "get_weather", "description": "Get
the current weather in a given location", "input_schema": {"type": "object",
"properties": {"location": {"type": "string", "description": "The city and state,
e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"],
"description": "The unit of temperature, either ''celsius'' or ''fahrenheit''"}},
"required": ["location"]}}, {"name": "get_time", "description": "Get the current
time in a given time zone", "input_schema": {"type": "object", "properties":
{"timezone": {"type": "string", "description": "The IANA time zone name, e.g.
America/Los_Angeles"}}, "required": ["timezone"]}}]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
anthropic-version:
- '2023-06-01'
connection:
- keep-alive
content-length:
- '845'
content-type:
- application/json
host:
- api.anthropic.com
user-agent:
- Anthropic/Python 0.29.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 0.29.0
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.10.14
method: POST
uri: https://api.anthropic.com/v1/messages
response:
body:
string: !!binary |
H4sIAAAAAAAAA4RTYW/TMBD9K6f7wpe0awPbWDQhVTAQaKoQVJsGQZGb3BrT5JzZ54Wu6n9HdklX
ISQ+xbk7v3vv6XmLusIMW7cqJtN3sujO3OPVslf3b282Z0+fp1/mS0xQNh2FKXJOrQgTtKYJBeWc
dqJYMMHWVNRghmWjfEWjlyPTeTdKJ+mrSZpeYIKlYSEWzL5vB0ChX+Fq/GR4KbXmtebVm5wXBhS7
nixIrR08eLKbBHqCXjcNMFEFYsA7AukNiDGNy3LOeTqGFUnRk5KaLIxgYUIBpCYovbXEAkNTM8yp
hztj12NYhDWWHry25OJ4jo0plWjDOUKnrGpJyCbQ17qs44R3ZKFWDjprHnVFFSgHOQ6gOQbYAORZ
yzEIaAemC9CqGQfa6Z626Jb+zTl2/ks4TD0ZpuNdew6RaqUrYCNQaUulNJuBd/Q4gaUP3kCpGDTf
B5YCS+WoAsNxweDHXxqHc1Cl95NXyglZhkWg/c0wDbYFt8LAx9l8thcV+AKrNvCftWR1qU7m1Bd/
0IM7X03gVatHineZyhBEu3kW6UIaWrUmWBqpYx6gVE0T64cg7W14EePkghA3zvny5BA73CXP0TSm
KbwLYY8vJPz7YjL9cHF+a+zPm0/96+v3d9e3s2mVnoUQBwmY4VH6wlXuvGC2PUQJs4N1uNv9SNCJ
6QpLysXe0dbYcPTgiUvCjH3TJOjj+8u2e+BCzJrYYXZ+miZovBzX0snpbvcbAAD//wMAiN3y7N8D
AAA=
headers:
CF-Cache-Status:
- DYNAMIC
CF-RAY:
- 89712d1918f04631-SIN
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Fri, 21 Jun 2024 04:15:22 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
anthropic-ratelimit-requests-limit:
- '50'
anthropic-ratelimit-requests-remaining:
- '50'
anthropic-ratelimit-requests-reset:
- '2024-06-21T04:16:18Z'
anthropic-ratelimit-tokens-limit:
- '20000'
anthropic-ratelimit-tokens-remaining:
- '20000'
anthropic-ratelimit-tokens-reset:
- '2024-06-21T04:16:18Z'
request-id:
- req_01X7p5rhys5xac6P9AmQiGs4
via:
- 1.1 google
x-cloud-trace-context:
- 791e086e0002047f61d7da1c157754d9
status:
code: 200
message: OK
version: 1
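
The recorded response body above is stored as a base64 `!!binary` scalar and the response headers list `Content-Encoding: gzip`, so it is not readable as-is. A minimal sketch for inspecting such a body, assuming it is gzip-compressed JSON; the helper name `decode_cassette_body` is hypothetical, and the round trip below uses made-up data rather than the recorded response.

```python
import base64
import gzip
import json


def decode_cassette_body(b64_text):
    """Decode a base64 + gzip encoded JSON body into a Python object."""
    # YAML wraps the !!binary scalar across lines; drop all whitespace first.
    compact = "".join(b64_text.split())
    raw = gzip.decompress(base64.b64decode(compact))
    return json.loads(raw)


# Round trip on made-up data, not the recorded response above.
sample = {"stop_reason": "tool_use", "role": "assistant"}
encoded = base64.b64encode(
    gzip.compress(json.dumps(sample).encode("utf-8"))
).decode("ascii")
wrapped = "\n".join(encoded[i:i + 20] for i in range(0, len(encoded), 20))
print(decode_cassette_body(wrapped))
```

If the cassette is loaded with a YAML parser that already resolves `!!binary` to bytes, only the gzip and JSON steps are needed.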
@@ -570,3 +570,162 @@ async def test_async_anthropic_message_streaming(exporter, reader):
    assert found_token_metric is True
    assert found_choice_metric is True
    assert found_duration_metric is True


@pytest.mark.vcr
def test_anthropic_tools(exporter, reader):
    client = Anthropic()
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        tools=[
            {
                "name": "get_weather",
                "description": "Get the current weather in a given location",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The unit of temperature, either 'celsius' or 'fahrenheit'"
                        }
                    },
                    "required": ["location"]
                }
            },
            {
                "name": "get_time",
                "description": "Get the current time in a given time zone",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "timezone": {
                            "type": "string",
                            "description": "The IANA time zone name, e.g. America/Los_Angeles"
                        }
                    },
                    "required": ["timezone"]
                }
            }
        ],
        messages=[
            {
                "role": "user",
                "content": "What is the weather like right now in New York? Also what time is it there?"
            }
        ]
    )
    try:
        client.messages.create(
            unknown_parameter="unknown",
        )
    except Exception:
        pass

    spans = exporter.get_finished_spans()
    assert all(span.name == "anthropic.chat" for span in spans)

    anthropic_span = spans[0]

    assert (
        anthropic_span.attributes["gen_ai.prompt.0.content"] ==
        "What is the weather like right now in New York? Also what time is it there?"
    )

    assert (anthropic_span.attributes["gen_ai.prompt.0.role"]) == "user"
    assert (anthropic_span.attributes.get("gen_ai.completion.0.content") == response.content[0].text)

    assert anthropic_span.attributes["gen_ai.usage.prompt_tokens"] == 18
    assert (
        anthropic_span.attributes["gen_ai.usage.completion_tokens"]
        + anthropic_span.attributes["gen_ai.usage.prompt_tokens"]
        == anthropic_span.attributes["llm.usage.total_tokens"]
    )

    assert (
        anthropic_span.attributes["llm.request.functions.0.name"] == "get_weather"
    )
    assert (
        anthropic_span.attributes["llm.request.functions.0.description"]
        == "Get the current weather in a given location"
    )

    assert (anthropic_span.attributes["llm.request.functions.1.name"]) == "get_time"
    assert (
        anthropic_span.attributes["llm.request.functions.1.description"]
        == "Get the current time in a given time zone"
    )

    assert (anthropic_span.attributes["gen_ai.completion.0.finish_reason"]) == "tool_use"

    assert (anthropic_span.attributes["gen_ai.completion.role"]) == "assistant"
    assert (anthropic_span.attributes["gen_ai.completion.0.type"]) == "text"
    assert (anthropic_span.attributes["gen_ai.completion.1.type"]) == "tool_use"
    assert (anthropic_span.attributes["gen_ai.completion.1.id"]) == "toolu_01G97WorjVJw8LFYLWA1d26t"
    assert (anthropic_span.attributes["gen_ai.completion.1.name"]) == "get_weather"
    assert (anthropic_span.attributes["gen_ai.completion.1.input"]) == json.dumps({"location": "New York"})

    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    assert len(resource_metrics) > 0

    found_token_metric = False
    found_choice_metric = False
    found_duration_metric = False
    found_exception_metric = False

    for rm in resource_metrics:
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == "gen_ai.client.token.usage":
                    found_token_metric = True
                    for data_point in metric.data.data_points:
                        assert data_point.attributes["gen_ai.token.type"] in [
                            "input",
                            "output",
                        ]
                        assert (
                            data_point.attributes["gen_ai.response.model"]
                            == "claude-3-opus-20240229"
                        )
                        assert data_point.sum > 0

                if metric.name == "gen_ai.client.generation.choices":
                    found_choice_metric = True
                    for data_point in metric.data.data_points:
                        assert data_point.value >= 1
                        assert (
                            data_point.attributes["gen_ai.response.model"]
                            == "claude-3-opus-20240229"
                        )

                if metric.name == "gen_ai.client.operation.duration":
                    found_duration_metric = True
                    assert any(
                        data_point.count > 0 for data_point in metric.data.data_points
                    )
                    assert any(
                        data_point.sum > 0 for data_point in metric.data.data_points
                    )
                    assert all(
                        data_point.attributes.get("gen_ai.response.model")
                        == "claude-3-opus-20240229"
                        or data_point.attributes.get("error.type") == "TypeError"
                        for data_point in metric.data.data_points
                    )

                if metric.name == "llm.anthropic.completion.exceptions":
                    found_exception_metric = True
                    for data_point in metric.data.data_points:
                        assert data_point.value == 1
                        assert data_point.attributes["error.type"] == "TypeError"

    assert found_token_metric is True
    assert found_choice_metric is True
    assert found_duration_metric is True
    assert found_exception_metric is True