# [OMNIML-2244] Create the MXFP8 quant exporter #634
## Conversation
**Codecov Report** ❌

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #634      +/-   ##
==========================================
- Coverage   74.57%   74.50%   -0.07%
==========================================
  Files         183      183
  Lines       18451    18400      -51
==========================================
- Hits        13759    13709      -50
+ Misses       4692     4691       -1
```
`.github/workflows/gpu_tests.yml` (outdated):

```diff
 timeout-minutes: 120
 container: &gpu_container
-  image: nvcr.io/nvidia/pytorch:25.06-py3
+  image: nvcr.io/nvidia/pytorch:25.08-py3
```
25.08 is a CUDA 13 container. The ort-gpu installation defaults to CUDA 12; is that fine? We also install cupy-cuda12x instead of cupy-cuda13x in setup.py for INT4 ONNX quantization.
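As a hedged illustration of the alternative this comment raises, a build script could select the cupy wheel from the detected CUDA toolkit version. `detect_cuda_major` is a hypothetical helper; the repo's actual setup.py simply pins `cupy-cuda12x`.

```python
# Hypothetical sketch: choose the cupy wheel by detected CUDA major version.
# The real setup.py pins cupy-cuda12x; this only illustrates the alternative
# raised in the comment above.
import re
import subprocess

def detect_cuda_major(default: int = 12) -> int:
    """Return the CUDA major version reported by nvcc, falling back to a default."""
    try:
        out = subprocess.check_output(["nvcc", "--version"], text=True)
        match = re.search(r"release (\d+)\.", out)
        if match:
            return int(match.group(1))
    except (OSError, subprocess.CalledProcessError):
        pass
    return default

# e.g. ["cupy-cuda12x"] on a CUDA 12 toolchain, ["cupy-cuda13x"] on CUDA 13
onnx_quant_requires = [f"cupy-cuda{detect_cuda_major()}x"]
```

Pinning a single wheel keeps builds reproducible, at the cost of a mismatch on CUDA 13 containers such as 25.08.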
There are issues with using autocast with this TensorRT version. I will disable it for mxfp8 for now.
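A minimal sketch of that guard, with hypothetical names (the exporter's real API may differ): skip the FP16 autocast pass whenever the quantize mode is mxfp8.

```python
# Illustrative only: bypass the FP16 autocast pass for MXFP8 exports, since
# autocast currently conflicts with the TensorRT version in the container.
def maybe_autocast_to_fp16(onnx_model, quantize_mode: str, autocast_fn):
    """Apply `autocast_fn` to the model unless exporting in mxfp8 mode."""
    if quantize_mode == "mxfp8":
        return onnx_model  # autocast disabled for mxfp8 for now
    return autocast_fn(onnx_model)
```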
Please rebase onto the latest main branch so all tests can run in GitHub for this PR. CI/CD has now been migrated to .github/workflows (except the onnx_ptq bash test).
Done
*Force-pushed from f0103e0 to 06fc4df.*
Can you please add the accuracy of the baseline model for comparison? FP32 or FP16 should be okay. Thanks.
*Force-pushed from 06fc4df to b685019.*
## What does this PR do?

**Type of change:** New feature

**Overview:**
- Implemented functions for the MXFP8 quant exporter
- Integrated autocast for converting the model to FP16
- Deprecated `quantize_weights_to_mxfp8`
- Updated tests

## Usage

```
python torch_quant_to_onnx.py --quantize_mode=mxfp8 --onnx_save_path=vit_base_patch16_224.mxfp8.onnx --calibration_data_size 64 --batch_size 128
```

## Testing

```
python evaluate.py --onnx_path=vit_base_patch16_224.mxfp8.onnx --model_name=vit_base_patch16_224 --results_path=./results.txt --batch_size 128
```

### Accuracy and latency results

```
The top1 accuracy of the model is 85.07%
The top5 accuracy of the model is 97.558%
Inference latency of the model is 6.65451 ms
```

### Reference accuracy for fp16

## Before your PR is "*Ready for review*"

- **Make sure you read and follow the [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: No; `quantize_weights_to_mxfp8` is deprecated.
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update the [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
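As background on the format this exporter targets: MXFP8, per the OCP microscaling (MX) specification, stores each 32-element block with one shared power-of-two (E8M0) scale and FP8 E4M3 elements. Below is a hedged fake-quantization sketch of that scheme in PyTorch. It illustrates the format's numerics only; it is not this PR's implementation, and the scale rule ("smallest power of two covering the block amax") is one common convention.

```python
# A minimal MXFP8 fake-quantization sketch (values only, no packing/export).
# Assumes PyTorch >= 2.1 for the float8_e4m3fn dtype; not this repo's code.
import torch
import torch.nn.functional as F

BLOCK = 32          # MX block size: 32 elements share one scale
E4M3_MAX = 448.0    # largest finite FP8 E4M3 magnitude

def fake_quantize_mxfp8(w: torch.Tensor) -> torch.Tensor:
    """Round a float tensor through MXFP8 (E8M0 block scale + E4M3 elements)."""
    flat = w.flatten()
    pad = (-flat.numel()) % BLOCK
    blocks = F.pad(flat, (0, pad)).view(-1, BLOCK)
    # E8M0 shared scale: smallest power of two so the block fits in E4M3 range.
    amax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=2.0**-127)
    scale = 2.0 ** torch.ceil(torch.log2(amax / E4M3_MAX))
    # Cast through E4M3 to emulate element quantization, then rescale.
    q = (blocks / scale).clamp(-E4M3_MAX, E4M3_MAX)
    q = q.to(torch.float8_e4m3fn).to(torch.float32) * scale
    return q.flatten()[: flat.numel()].view_as(w)
```

An actual exporter additionally has to emit the block scales and FP8 initializers into the ONNX graph; the sketch stops at the numerics.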