FIX:simple fix on tool calling test for anthropic #6181

SongChiYoung · 2025-04-02T22:59:52Z

Why are these changes needed?

Just simple change.

messages: List[LLMMessage] = [UserMessage(content="Call the pass tool with input 'task'", source="user")]

to

messages: List[LLMMessage] = [UserMessage(content="Call the pass tool with input 'task' and talk result", source="user")]

And, now.
Anthropic model could pass that test case test_model_client_with_function_calling.
-> Yup. Before, claude could not pass that test case.

With this change, Claude (Anthropic) models are now able to pass the test case successfully.

Before this fix, Claude failed to interpret the intent correctly. Now, it can infer both tool usage and follow-up generation.

This change is backward-compatible with other models (e.g., GPT-4) and improves cross-model consistency for function-calling tests.

Checks

x

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

SongChiYoung · 2025-04-02T23:01:08Z

This change highlights how Claude is sensitive to slight variations in prompts when handling function-calling.
It suggests we might be encountering a broader issue related to model-specific prompt tolerance.
Worth keeping in mind as we aim for cross-model consistency.

codecov · 2025-04-02T23:10:49Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.80%. Comparing base (27da37e) to head (23d56f0).
Report is 1 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #6181       +/-   ##
===========================================
+ Coverage   77.33%   89.80%   +12.47%     
===========================================
  Files         197       35      -162     
  Lines       13808     2571    -11237     
===========================================
- Hits        10679     2309     -8370     
+ Misses       3129      262     -2867

Flag	Coverage Δ
unittests	`89.80% <ø> (+12.47%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

FIX:simple_fix_on_tool_calling_for_anthropic

23d56f0

SongChiYoung changed the title ~~FIX:simple_fix_on_tool_calling_for_anthropic~~ FIX:simple fix on tool calling test for anthropic Apr 2, 2025

ekzhu approved these changes Apr 2, 2025

View reviewed changes

ekzhu enabled auto-merge (squash) April 2, 2025 23:04

ekzhu merged commit d7f2b56 into microsoft:main Apr 2, 2025
56 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

FIX:simple fix on tool calling test for anthropic #6181

FIX:simple fix on tool calling test for anthropic #6181

Uh oh!

SongChiYoung commented Apr 2, 2025

Uh oh!

SongChiYoung commented Apr 2, 2025

Uh oh!

Uh oh!

codecov bot commented Apr 2, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FIX:simple fix on tool calling test for anthropic #6181

FIX:simple fix on tool calling test for anthropic #6181

Uh oh!

Conversation

SongChiYoung commented Apr 2, 2025

Why are these changes needed?

Checks

Uh oh!

SongChiYoung commented Apr 2, 2025

Uh oh!

Uh oh!

codecov bot commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Apr 2, 2025 •

edited

Loading