
Conversation

@v-shobhit
Contributor

Adding gpt-oss-120b reference implementation.
This uses the SGLang backend to serve the gpt-oss-120b model. Scripts are provided to run it in the Offline and Server scenarios, in both PerformanceOnly and AccuracyOnly modes.

@v-shobhit v-shobhit requested a review from a team as a code owner November 21, 2025 08:07
@github-actions
Contributor

github-actions bot commented Nov 21, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@@ -0,0 +1,141 @@
# MLPerf Inference reference implementation for GPT-OSS-120B

Might need to change the dir name to gpt-oss-120b (in case OpenAI releases a new version in the future).

## Model and Dataset download

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**
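The pinned model revision can also be fetched programmatically. A minimal sketch using `huggingface_hub` (the `local_dir` default here is illustrative, not part of the reference scripts):

```python
# Sketch: download the model snapshot pinned to the commit listed above.
MODEL_REPO = "openai/gpt-oss-120b"
MODEL_REVISION = "b5c939de8f754692c1647ca79fbf85e8c1e70f8a"

def download_model(local_dir: str = "models/gpt-oss-120b") -> str:
    """Fetch the pinned snapshot and return its local path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=MODEL_REPO,
        revision=MODEL_REVISION,
        local_dir=local_dir,
    )
```

Pinning the `revision` to the commit hash keeps the download reproducible even if the upstream repo is updated.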

Can you add a TODO to replace it with mlc download link?

```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
```

How would dp work here? Does --dp 2 map to 2 GPUs?

@@ -0,0 +1,295 @@
#!/usr/bin/env python3

Probably can remove the files in the archive if they're not useful?

```python
results = []
for prompt_ids in prompts:
    start_time = time.time()
    response = self._send_request(
```

IIUC, this function is a BS=1 single-stream loop that we use for Offline, and `generate_stream` is used for Server? I wonder if it will be too slow for Offline.
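One way to avoid the sequential bottleneck for Offline would be to issue all requests concurrently and let the serving backend batch them. A hypothetical sketch (here `send_request` stands in for the implementation's `self._send_request`; it is not the actual code):

```python
# Sketch: concurrent submission for the Offline scenario, so the
# backend can batch requests instead of serving them one at a time.
from concurrent.futures import ThreadPoolExecutor
import time

def send_request(prompt_ids):
    # placeholder for the real backend call
    time.sleep(0.01)
    return {"prompt": prompt_ids, "output_ids": list(prompt_ids)}

def run_offline(prompts, max_workers=8):
    # pool.map preserves input order, so results line up with prompts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_request, prompts))

results = run_offline([[1, 2], [3, 4], [5]])
```

Since the requests are HTTP calls to the SGLang server, threads (rather than processes) are typically enough to keep the backend saturated.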

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

Datasets are now provided in **Parquet format** (recommended) for better load performance and smaller file size (roughly 50% smaller than pickle). Pickle format is still supported for backward compatibility.
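A loader that accepts both formats could look like the following sketch (the actual dataset schema and column names are not shown here):

```python
# Sketch: load the dataset in either Parquet (recommended) or
# legacy pickle format, dispatching on the file extension.
import pandas as pd

def load_dataset(path: str) -> pd.DataFrame:
    if path.endswith(".parquet"):
        return pd.read_parquet(path)  # smaller on disk, faster to load
    return pd.read_pickle(path)       # pickle kept for backward compat
```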

Do we have instructions for generating the dataset pickle file?

```python
    Returns:
        tuple: (conversation_object, token_list) ready for model completion
    """
    instructions = (
```

Just confirming - is this the final version we used for the reference, or the GPT-OSS ref code?

```python
    return _finalize_conversation(messages, user_query)


def create_healthbench_prompt(prompt, reasoning_effort=ReasoningEffort.HIGH):
```

This file needs some clean up : )
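For context, `ReasoningEffort` is presumably a small enum selecting the model's reasoning level; a minimal sketch of what the prompt builder might look like (enum values and message shape are assumptions, not the actual implementation):

```python
# Sketch: a reasoning-effort enum and a prompt builder that embeds it.
from enum import Enum

class ReasoningEffort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

def create_healthbench_prompt(prompt, reasoning_effort=ReasoningEffort.HIGH):
    # encode the requested effort in a system-style message, then
    # append the user query as the final turn
    system = f"Reasoning: {reasoning_effort.value}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]
```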

@@ -0,0 +1,11 @@
audioread>=2.1.9

Please use == (we had a very bad experience with >=, where the implementation broke after one round).
(You can run `pip freeze` in your env and copy the versions here.)
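A small helper along these lines could pin loose specifiers to whatever is installed in the current environment, pip-freeze style (a sketch; the package names used in the test are illustrative):

```python
# Sketch: replace ">="-style requirement specifiers with exact "=="
# pins taken from the versions installed in the current environment.
import re
from importlib.metadata import PackageNotFoundError, version

def pin(requirements: list[str]) -> list[str]:
    pinned = []
    for req in requirements:
        # take the bare package name before any specifier/extras marker
        name = re.split(r"[<>=!~\[; ]", req, maxsplit=1)[0]
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            pinned.append(req)  # not installed here: leave untouched
    return pinned
```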

```diff
     std::chrono::nanoseconds scheduled_delta,
-    ResponseDelegate* response_delegate, SequenceGen* sequence_gen)
+    ResponseDelegate* response_delegate, SequenceGen* sequence_gen,
+    uint64_t repeat_index)
```

Placeholder for whether to use repeats
