[GPT-OSS-120B] Reference implementation #2395
@@ -0,0 +1,3 @@
*venv*
*.pkl
*.csv

@@ -0,0 +1,141 @@
# MLPerf Inference reference implementation for GPT-OSS-120B
This is the reference implementation for GPT-OSS-120B. It is a proposal and a work in progress.

## Model and Dataset download

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

> **Contributor:** Can you add a TODO to replace it with mlc download link?

Datasets are now provided in **Parquet format** (recommended) for better performance and smaller file size (50% smaller than pickle). Pickle format is still supported for backward compatibility.

> **Contributor:** Do we have an instruction to generate the dataset pickle file?
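
The README does not document a conversion step; as a hedged sketch, if the pickle file holds a pandas DataFrame (an assumption, not confirmed here), it can be converted to Parquet with pandas and pyarrow:

```bash
# Hypothetical pickle-to-Parquet conversion; assumes the .pkl file contains a
# pandas DataFrame and that pandas + pyarrow are installed.
python3 -c "import pandas as pd; pd.read_pickle('dataset.pkl').to_parquet('dataset.parquet', index=False)"
```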

## Environment setup
Work on the reference implementation is done using the SGLang containers at [https://hub.docker.com/r/lmsysorg/sglang/tags](https://hub.docker.com/r/lmsysorg/sglang/tags). For enroot setup, a script is provided under [`setup_enroot.sh`](./setup_enroot.sh). All sections below assume this environment is running.
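
For reference, a minimal enroot workflow looks roughly like the following; the image tag is illustrative (check the Docker Hub link above for current tags), and `setup_enroot.sh` may differ in detail:

```bash
# Hypothetical enroot import/start sequence; the tag ":latest" is an assumption.
enroot import docker://lmsysorg/sglang:latest
enroot create --name sglang lmsysorg+sglang+latest.sqsh
enroot start sglang
```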

Once in the environment, install additional requirements using [`setup.sh`](./setup.sh):
```bash
./setup.sh
```

## Running the reference implementation: SGLang
Use [`./sglang/run_server.sh`](./sglang/run_server.sh) to launch an SGLang server hosting `gpt-oss-120b`.

### Run the server
```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
    --stream_interval 100 \
    --eagle_path optional/path/to/eagle/head
```

> **Contributor:** How would dp work here? Does `--dp 2` map to 2 GPUs?

The script uses `python3 -m sglang.launch_server` to instantiate the model, with `tp=pp=ep=1` and `dp` as specified.
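
A rough sketch of the underlying launch command, assuming standard `sglang.launch_server` flag names (the exact invocation lives in `run_server.sh` and may differ):

```bash
# Hypothetical expansion of run_server.sh; flag names follow sglang.launch_server
# conventions, but the real script is authoritative.
python3 -m sglang.launch_server \
    --model-path path/to/gpt-oss-120b/model \
    --tp-size 1 \
    --dp-size N \
    --stream-interval 100
```

Once the server is up, a quick reachability check (assuming SGLang's default port 30000) can save a failed benchmark run:

```bash
curl http://localhost:30000/health
```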

Then, run a benchmark script that uses the client to send and receive requests.

### Run the inference

**Note:** All scripts now support both Parquet (`.parquet`) and Pickle (`.pkl`) formats for dataset files. Parquet is recommended as it offers:
- 50% smaller file size
- Faster loading times
- Cross-language compatibility
- Type-safe schema preservation

Example usage:
```bash
# first, install loadgen
pip install $(git rev-parse --show-toplevel)/loadgen

# Using Parquet format (recommended)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.parquet \
    --accuracy

# Using Pickle format (backward compatible)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.pkl \
    --accuracy
```
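
The examples above cover the offline scenario; a hypothetical server-scenario invocation, assembled from the options documented below (worker and concurrency values are illustrative, not tuned):

```bash
# Hypothetical server-scenario run; --num-workers and --max-concurrency
# values here are placeholders, not recommended settings.
python3 run_mlperf.py \
    --scenario server \
    --input-file /path/to/dataset.parquet \
    --num-workers 8 \
    --max-concurrency 64
```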

Full command-line options:
```bash
python3 run_mlperf.py --help
usage: run_mlperf.py [-h] [--scenario {offline,server}] --input-file INPUT_FILE [--max-samples MAX_SAMPLES] [--mlperf-conf MLPERF_CONF]
                     [--user-conf USER_CONF] [--accuracy] [--output-dir OUTPUT_DIR] [--backend {sglang}] [--server-url SERVER_URL]
                     [--generation-config GENERATION_CONFIG] [--max-new-tokens MAX_NEW_TOKENS] [--num-workers NUM_WORKERS]
                     [--max-concurrency MAX_CONCURRENCY]

Run MLPerf inference benchmarks for gpt-oss

options:
  -h, --help            show this help message and exit
  --scenario {offline,server}
                        MLPerf scenario mode
  --input-file INPUT_FILE
                        Path to tokenized dataset (parquet or pickle file)
  --max-samples MAX_SAMPLES
                        Maximum number of samples to use (None for all)
  --mlperf-conf MLPERF_CONF
                        Path to MLPerf configuration file
  --user-conf USER_CONF
                        Path to user configuration file
  --accuracy            Run accuracy mode instead of performance
  --output-dir OUTPUT_DIR
                        Directory for MLPerf output logs
  --backend {sglang}    Backend to use for inference
  --server-url SERVER_URL
                        Server URL for backend (SGLang)
  --generation-config GENERATION_CONFIG
                        Path to generation configuration JSON file
  --max-new-tokens MAX_NEW_TOKENS
                        Override max_new_tokens from generation config (default: use value from config)
  --num-workers NUM_WORKERS
                        Number of worker threads (for server scenario)
  --max-concurrency MAX_CONCURRENCY
                        Maximum concurrent requests to backend (SGLang handles batching internally)
```

### Evaluate the accuracy
Run `run_mlperf.py` with `--accuracy`, and then use the generated `mlperf_log_accuracy.json` to evaluate the accuracy of the run.

Example usage:
```bash
# Using Parquet format (recommended)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.parquet \
    --tokenizer openai/gpt-oss-120b

# Using Pickle format (backward compatible)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.pkl \
    --tokenizer openai/gpt-oss-120b
```
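
For debugging, the optional flags documented below can persist results and per-sample outputs; a hedged example (`results.json` and `outputs.pkl` are placeholder names):

```bash
# Hypothetical debugging run; output file names are illustrative.
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.parquet \
    --tokenizer openai/gpt-oss-120b \
    --output-file results.json \
    --save-outputs outputs.pkl \
    --verbose
```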

Full command-line options:
```bash
python3 eval_mlperf_accuracy.py --help
usage: eval_mlperf_accuracy.py [-h] --mlperf-log MLPERF_LOG --reference-data REFERENCE_DATA [--tokenizer TOKENIZER] [--output-file OUTPUT_FILE]
                               [--save-outputs SAVE_OUTPUTS] [--num-lcb-workers NUM_LCB_WORKERS] [--verbose]

Evaluate MLPerf accuracy logs for gpt-oss-120b

options:
  -h, --help            show this help message and exit
  --mlperf-log MLPERF_LOG
                        Path to mlperf_log_accuracy.json
  --reference-data REFERENCE_DATA
                        Path to reference parquet or pickle file (DataFrame with dataset, ground_truth, etc.)
  --tokenizer TOKENIZER
                        HuggingFace tokenizer name or path
  --output-file OUTPUT_FILE
                        Output JSON file for results (optional)
  --save-outputs SAVE_OUTPUTS
                        Save detokenized outputs to pickle file (ordered by qsl_idx) for debugging
  --num-lcb-workers NUM_LCB_WORKERS
                        Number of parallel workers for LiveCodeBench evaluation (default: 64)
  --verbose             Verbose logging
```

> **Contributor:** Might need to change the dir name to `gpt-oss-120b` (in case OAI releases a new version in the future).