[feat] support o3-mini #6570

Merged
merged 7 commits into main from xw/o3mini on Feb 3, 2025

Conversation

@xingyaoww (Collaborator) commented Jan 31, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses


To run this PR locally, use the following command:

```shell
docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:66d0fbb-nikolaik \
  --name openhands-app-66d0fbb \
  docker.all-hands.dev/all-hands-ai/openhands:66d0fbb
```

```diff
@@ -61,7 +61,8 @@ RUN if [ -z "${RELEASE_TAG}" ]; then \
     fi && \
     wget https://github.com/${RELEASE_ORG}/openvscode-server/releases/download/${RELEASE_TAG}/${RELEASE_TAG}-linux-${arch}.tar.gz && \
     tar -xzf ${RELEASE_TAG}-linux-${arch}.tar.gz && \
-    mv -f ${RELEASE_TAG}-linux-${arch} ${OPENVSCODE_SERVER_ROOT} && \
+    if [ -d "${OPENVSCODE_SERVER_ROOT}" ]; then rm -rf "${OPENVSCODE_SERVER_ROOT}"; fi && \
+    mv ${RELEASE_TAG}-linux-${arch} ${OPENVSCODE_SERVER_ROOT} && \
```
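The replaced `mv -f` would not overwrite an existing destination directory (`mv` moves the source *inside* it instead, nesting the new release), so the fix removes the old directory before moving. The same replace-then-move pattern is sketched below in Python purely for illustration; all paths are throwaway placeholders, not the image's real paths:

```python
import pathlib
import shutil
import tempfile

def replace_dir(src: pathlib.Path, dest: pathlib.Path) -> None:
    """Move src to dest, replacing any previous install.

    A bare move would nest src inside an existing dest directory,
    which is the bug the Dockerfile change fixes with `rm -rf`.
    """
    if dest.is_dir():
        shutil.rmtree(dest)  # mirrors the `rm -rf` added in the diff
    shutil.move(str(src), str(dest))

# Tiny demo with throwaway directories (placeholder paths):
tmp = pathlib.Path(tempfile.mkdtemp())
src = tmp / "release"
dest = tmp / "server"
src.mkdir()
dest.mkdir()  # simulate a leftover install from a previous build
replace_dir(src, dest)
```

After the call, `dest` is the freshly moved release directory rather than containing a nested `release/` subdirectory.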
Contributor:
Is this o3-mini related?

@xingyaoww (Collaborator, Author):
It is not, but it fixes an issue in main when running evals. We could keep them together in this PR, or I can move it to a separate PR.

Contributor:
Cool, I’m just a passerby who recently tried to add o3-mini support to this project. While working on it, I came across this PR and thought I’d check it out. Keeping a clean Git history is usually helpful for troubleshooting and makes it easier for others to jump in, so I just wanted to share that thought. Of course, it’s totally up to you as the Collaborator. Thanks for reading!

@regismesquita (Contributor) commented Feb 1, 2025

o3-mini seems to be missing from frontend/src/utils/verified-models.ts; not sure if you want to add it only after testing, or if this was missed.

@xingyaoww (Collaborator, Author) commented Feb 1, 2025

> o3-mini seems to be missing from frontend/src/utils/verified-models.ts; not sure if you want to add it only after testing, or if this was missed.

Yep! We will add it to the verified model list (or even as one of the default models) once the eval finishes and the results turn out favorably.

@xingyaoww xingyaoww marked this pull request as ready for review February 2, 2025 22:32
@xingyaoww xingyaoww requested a review from enyst February 2, 2025 22:32
@xingyaoww (Collaborator, Author) commented:

Okay, official result for o3-mini after 4 runs. The total cost for the 4 runs is about $1,257.39, i.e. roughly $314.35 per run. Conclusion: it costs slightly less than Sonnet and performs slightly weaker than Claude.

[image]

https://openhands-ai.slack.com/archives/C06PB3T5ZK6/p1738536767441849?thread_ts=1738363049.253389&cid=C06PB3T5ZK6

```diff
@@ -148,17 +165,11 @@ def __init__(
             base_url=self.config.base_url,
             api_version=self.config.api_version,
             custom_llm_provider=self.config.custom_llm_provider,
-            max_tokens=self.config.max_output_tokens,
+            max_completion_tokens=self.config.max_output_tokens,
```
Collaborator:
This is necessary for o3-mini, but I wonder, does it work with the others like before? litellm translates max_tokens, but it might be safer to add max_completion_tokens only for reasoning models.

@enyst (Collaborator) left a review comment:

Thank you for all the evals, this is exciting! Interesting times ahead.

Not sure, but I am a bit concerned that max_completion_tokens is newer and intended for reasoning models. It might be safer to set it only for them for now? I didn't find anything clear on this in litellm.
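One way to act on that concern, sketched here under assumptions (the helper name and the reasoning-model set are hypothetical, not OpenHands code): choose the token-limit parameter name per model family, passing max_completion_tokens only to reasoning models.

```python
# Hypothetical helper; the reasoning-model set below is an assumption
# for illustration, not the project's actual list.
REASONING_MODELS = {"o1", "o1-mini", "o3-mini"}

def token_limit_kwargs(model: str, max_output_tokens: int) -> dict:
    """Return the token-limit argument under the name each model family expects."""
    if model.split("/")[-1] in REASONING_MODELS:
        # Newer OpenAI reasoning models take max_completion_tokens.
        return {"max_completion_tokens": max_output_tokens}
    # Older chat models (and most providers via litellm) take max_tokens.
    return {"max_tokens": max_output_tokens}
```

The returned dict could then be spread into the completion call, keeping existing providers on the parameter they have always received.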

@xingyaoww (Collaborator, Author):

@enyst

[image]

I looked through the LiteLLM repo, and it seems LiteLLM does translate them all: https://github.com/search?q=repo%3ABerriAI%2Flitellm+max_completion_tokens&type=code&p=2

For Anthropic it didn't make any difference at all: both max_tokens and max_completion_tokens get translated to max_tokens_to_sample.

[image]

Maybe let's merge it and use it for a while to see if any issues come up?

@xingyaoww xingyaoww changed the title Support o3-mini [feat] support o3-mini Feb 3, 2025
@enyst (Collaborator) commented Feb 3, 2025

OK let's merge then!

@xingyaoww (Collaborator, Author):

(For a sanity check, I built from this branch and tried running Claude; there's no issue. Probably safe to merge now.)

@xingyaoww xingyaoww merged commit 622fc52 into main Feb 3, 2025
18 checks passed
@xingyaoww xingyaoww deleted the xw/o3mini branch February 3, 2025 15:26
zchn pushed a commit to zchn/OpenHands that referenced this pull request Feb 4, 2025
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Feb 7, 2025