
Conversation

@v-shobhit
Contributor

Adding gpt-oss-120b reference implementation.
This uses the SGLang backend to serve the gpt-oss-120b model. Scripts are provided to run it in the Offline and Server scenarios, in both PerformanceOnly and AccuracyOnly modes.

@v-shobhit v-shobhit requested a review from a team as a code owner November 21, 2025 08:07
@github-actions
Contributor

github-actions bot commented Nov 21, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@@ -0,0 +1,141 @@
# MLPerf Inference reference implementation for GPT-OSS-120B

Might need to change the dir name to gpt-oss-120b (in case OpenAI releases a new version in the future).

## Model and Dataset download

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**
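The pinned model revision can also be fetched programmatically. A minimal sketch using `huggingface_hub` (the `local_dir` default here is illustrative, not part of the reference scripts):

```python
# Sketch: download the model snapshot pinned to the commit listed above.
MODEL_REPO = "openai/gpt-oss-120b"
MODEL_REVISION = "b5c939de8f754692c1647ca79fbf85e8c1e70f8a"

def download_model(local_dir: str = "models/gpt-oss-120b") -> str:
    """Fetch the pinned snapshot and return its local path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=MODEL_REPO,
        revision=MODEL_REVISION,
        local_dir=local_dir,
    )
```

Pinning the `revision` to the commit hash keeps the download reproducible even if the upstream repo is updated.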

Can you add a TODO to replace it with mlc download link?

```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
```

How would dp work here? Does --dp 2 map to 2 GPUs?

@@ -0,0 +1,295 @@
#!/usr/bin/env python3

Probably can remove the files in the archive if they're not useful?

```python
results = []
for prompt_ids in prompts:
    start_time = time.time()
    response = self._send_request(
```

IIUC, this function is a BS=1 single-stream loop that we use for Offline, and `generate_stream` is used for Server? I wonder if it will be too slow for Offline.
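One way to avoid the sequential bottleneck for Offline would be to issue all requests concurrently and let the serving backend batch them. A hypothetical sketch (here `send_request` stands in for the implementation's `self._send_request`; it is not the actual code):

```python
# Sketch: concurrent submission for the Offline scenario, so the
# backend can batch requests instead of serving them one at a time.
from concurrent.futures import ThreadPoolExecutor
import time

def send_request(prompt_ids):
    # placeholder for the real backend call
    time.sleep(0.01)
    return {"prompt": prompt_ids, "output_ids": list(prompt_ids)}

def run_offline(prompts, max_workers=8):
    # pool.map preserves input order, so results line up with prompts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_request, prompts))

results = run_offline([[1, 2], [3, 4], [5]])
```

Since the requests are HTTP calls to the SGLang server, threads (rather than processes) are typically enough to keep the backend saturated.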

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

Datasets are now provided in **Parquet format** (recommended) for better load performance and smaller file size (roughly 50% smaller than pickle). Pickle format is still supported for backward compatibility.
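A loader that accepts both formats could look like the following sketch (the actual dataset schema and column names are not shown here):

```python
# Sketch: load the dataset in either Parquet (recommended) or
# legacy pickle format, dispatching on the file extension.
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    if path.endswith(".parquet"):
        return pd.read_parquet(path)  # smaller on disk, faster to load
    return pd.read_pickle(path)       # pickle kept for backward compat
```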

Do we have instructions for generating the dataset pickle file?

```python
    Returns:
        tuple: (conversation_object, token_list) ready for model completion
    """
    instructions = (
```

Just confirming - is this the final version we used for the reference, or the GPT-OSS ref code?

```python
    return _finalize_conversation(messages, user_query)


def create_healthbench_prompt(prompt, reasoning_effort=ReasoningEffort.HIGH):
```

This file needs some clean up : )
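For context, `ReasoningEffort` is presumably a small enum selecting the model's reasoning level; a minimal sketch of what the prompt builder might look like (enum values and message shape are assumptions, not the actual implementation):

```python
# Sketch: a reasoning-effort enum and a prompt builder that embeds it.
from enum import Enum

class ReasoningEffort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

def create_healthbench_prompt(prompt, reasoning_effort=ReasoningEffort.HIGH):
    # encode the requested effort in a system-style message, then
    # append the user query as the final turn
    system = f"Reasoning: {reasoning_effort.value}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
```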

@@ -0,0 +1,11 @@
audioread>=2.1.9

Please use == (we had a very bad experience with >=, where the implementation broke after one round).
(You can run `pip freeze` in your env and copy the versions here.)
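A small helper along these lines could pin loose specifiers to whatever is installed in the current environment, pip-freeze style (a sketch; the package names used in the test are illustrative):

```python
# Sketch: replace ">="-style requirement specifiers with exact "=="
# pins taken from the versions installed in the current environment.
import re
from importlib.metadata import PackageNotFoundError, version

def pin(requirements: list[str]) -> list[str]:
    pinned = []
    for req in requirements:
        # take the bare package name before any specifier/extras marker
        name = re.split(r"[<>=!~\[; ]", req, maxsplit=1)[0]
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            pinned.append(req)  # not installed here: leave untouched
    return pinned
```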

```diff
     std::chrono::nanoseconds scheduled_delta,
-    ResponseDelegate* response_delegate, SequenceGen* sequence_gen)
+    ResponseDelegate* response_delegate, SequenceGen* sequence_gen,
+    uint64_t repeat_index)
```

Placeholder for whether to use repeats
