[GPT-OSS-120B] Reference implementation #2395
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
@@ -0,0 +1,141 @@
# MLPerf Inference reference implementation for GPT-OSS-120B
Might need to change the dir name to gpt-oss-120b, in case OpenAI releases a new version in the future.
## Model and Dataset download

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**
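A minimal sketch of pinning the model download to that exact commit with `huggingface_hub` (the `local_dir` default and function name are illustrative, not from the PR):

```python
MODEL_REPO = "openai/gpt-oss-120b"
# Commit id from the README, pinned so the download is reproducible.
MODEL_COMMIT = "b5c939de8f754692c1647ca79fbf85e8c1e70f8a"

def download_model(local_dir: str = "gpt-oss-120b") -> str:
    """Download the model snapshot pinned to the exact commit id."""
    # Imported lazily: huggingface_hub is a heavy, optional dependency here.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id=MODEL_REPO,
        revision=MODEL_COMMIT,
        local_dir=local_dir,
    )
```

Pinning `revision` to a full commit hash avoids silently picking up a repo update between submission rounds.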
Can you add a TODO to replace it with the MLC download link?
```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
```
How does `--dp` work here? Does `--dp 2` map to 2 GPUs?
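For context on the `--dp` question: under the usual serving convention (an assumption here, not confirmed by the PR), `--dp N` launches N data-parallel engine replicas, each of which holds a full tensor-parallel copy of the model, so the GPU count is the product of the two degrees:

```python
def gpus_required(dp: int, tp: int = 1) -> int:
    """Total GPUs for `dp` data-parallel replicas of a `tp`-way sharded engine.

    Under this convention, --dp 2 with tp=1 maps to 2 GPUs, each holding a
    full model copy; incoming requests are load-balanced across replicas.
    """
    return dp * tp
```

So `gpus_required(dp=2, tp=1)` would indeed mean 2 GPUs, while `--dp 2` with 4-way tensor parallelism would need 8.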
@@ -0,0 +1,295 @@
#!/usr/bin/env python3
Could we remove the files in archive if they aren't useful?
results = []
for prompt_ids in prompts:
    start_time = time.time()
    response = self._send_request(
IIUC, this function is a BS=1 single-stream path that we use for Offline, while `generate_stream` is used for Server? I wonder whether it will be too slow for Offline.
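One way to address the Offline throughput concern: instead of a sequential BS=1 loop, submit all prompts concurrently and let the serving backend batch them internally. A sketch, assuming the per-request sender is thread-safe (`send_fn` below is a stand-in for `_send_request`, not the actual method):

```python
from concurrent.futures import ThreadPoolExecutor

def run_offline(prompts, send_fn, max_workers=32):
    """Issue all requests concurrently; the server batches them internally.

    `pool.map` preserves input order, so results line up with `prompts`
    just as they would in the sequential loop this replaces in spirit.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_fn, prompts))
```

With a high enough `max_workers`, the continuous-batching scheduler on the server side sees many in-flight requests at once, which is what Offline throughput depends on.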
* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

Datasets are now provided in **Parquet format** (recommended) for better performance and smaller file size (50% smaller than pickle). Pickle format is still supported for backward compatibility.
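Since both formats are accepted, a small loader that dispatches on the file extension keeps callers format-agnostic (the function name is an assumption for illustration, not taken from the PR):

```python
from pathlib import Path

def load_dataset(path):
    """Load the dataset from Parquet (preferred) or legacy pickle."""
    suffix = Path(path).suffix.lower()
    if suffix not in (".parquet", ".pkl", ".pickle"):
        raise ValueError(f"unsupported dataset format: {suffix}")
    # Deferred import so the extension check works without pandas installed.
    import pandas as pd
    if suffix == ".parquet":
        return pd.read_parquet(path)
    return pd.read_pickle(path)
```

Failing fast on unknown extensions is safer than letting pandas raise a less descriptive error deep inside a benchmark run.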
Do we have instructions for generating the dataset pickle file?
    Returns:
        tuple: (conversation_object, token_list) ready for model completion
    """
    instructions = (
Just confirming: is this the final version we used for the reference, or the GPT-OSS reference code?
    return _finalize_conversation(messages, user_query)


def create_healthbench_prompt(prompt, reasoning_effort=ReasoningEffort.HIGH):
This file needs some cleanup : )
@@ -0,0 +1,11 @@
audioread>=2.1.9
Please use `==` pins; we had a very bad experience with `>=`, where the implementation broke after one round. (You can run `pip freeze` in your env and copy the versions here.)
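Following that suggestion, the pinned file would replace each `>=` bound with the exact version reported by `pip freeze`. The version below is an illustrative placeholder, not one actually frozen from this environment:

```text
# generated with: pip freeze | grep -i audioread   (version is illustrative)
audioread==3.0.1
```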
     std::chrono::nanoseconds scheduled_delta,
-    ResponseDelegate* response_delegate, SequenceGen* sequence_gen)
+    ResponseDelegate* response_delegate, SequenceGen* sequence_gen,
+    uint64_t repeat_index)
Placeholder for deciding whether to use repeats.
Adding the gpt-oss-120b reference implementation.
It uses the SGLang backend to serve the gpt-oss-120b model. Scripts are provided to run it in the Offline and Server scenarios, in PerformanceOnly and AccuracyOnly modes.