[GPU] Support int4 in cuDNN GEMM fusions. #33794
Closed
+30 −7
Conversation
f609d3d to e1b8dc7
dimitar-asenov approved these changes on Nov 11, 2025
copybara-service bot pushed a commit that referenced this pull request on Nov 11, 2025:
[GPU] Support int4 in cuDNN GEMM fusions. Imported from GitHub PR #33794. Copybara import of the project: e1b8dc7 by Ilia Sergachev &lt;isergachev@nvidia.com&gt;. Merging this change closes #33794. PiperOrigin-RevId: 830894321
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Nov 11, 2025 (openxla/xla#33794, commit e1b8dc7daff4963b93152d2a5c81c4d91a9f14d8; PiperOrigin-RevId: 830894321).
copybara-service bot pushed a commit that referenced this pull request on Nov 12, 2025 (PiperOrigin-RevId: 830894321).
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Nov 12, 2025 (PiperOrigin-RevId: 830894321).
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Nov 12, 2025 (PiperOrigin-RevId: 831264661).
📝 Summary of Changes
Support int4 in cuDNN GEMM fusions.
🎯 Justification
Accelerates some int4 GEMM fusions (under the flag xla_gpu_cudnn_gemm_fusion_level).
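For illustration, a minimal sketch (not taken from this PR's tests; shapes, layouts, and types are hypothetical) of the kind of HLO this targets: an int4 operand widened by a `convert` feeding a `dot`. Whether such a fusion is routed to cuDNN is gated by `xla_gpu_cudnn_gemm_fusion_level`, e.g. set via `XLA_FLAGS=--xla_gpu_cudnn_gemm_fusion_level=<N>`.

```
HloModule int4_gemm_example

ENTRY main {
  p0 = s4[128,64]{1,0} parameter(0)   // hypothetical int4 (s4) operand
  p1 = bf16[64,32]{1,0} parameter(1)
  c0 = bf16[128,64]{1,0} convert(p0)  // dequantize-style widening convert
  ROOT d = bf16[128,32]{1,0} dot(c0, p1),
    lhs_contracting_dims={1}, rhs_contracting_dims={0}
}
```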
🚀 Kind of Contribution
⚡️ Performance Improvement
📊 Benchmark (for Performance Improvements)
The public HLOs in `compiler/xla/tools/benchmarks/hlo/` do not use int4, so no speedups on them are reported.
🧪 Unit Tests: yes
🧪 Execution Tests: yes
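As a sketch of the expected outcome (again hypothetical, not the PR's own test HLO): after the GEMM fusion passes run with the flag enabled, the convert+dot computation may be outlined into a `kCustom` fusion dispatched to cuDNN; `__cudnn$fusion` is the fusion kind XLA's GPU backend uses for cuDNN GEMM fusions.

```
HloModule int4_gemm_fused

// Hypothetical post-fusion HLO: the widening convert and the dot are
// outlined into one computation handed to the cuDNN backend.
c {
  p0 = s4[128,64]{1,0} parameter(0)
  p1 = bf16[64,32]{1,0} parameter(1)
  c0 = bf16[128,64]{1,0} convert(p0)
  ROOT d = bf16[128,32]{1,0} dot(c0, p1),
    lhs_contracting_dims={1}, rhs_contracting_dims={0}
}

ENTRY main {
  p0 = s4[128,64]{1,0} parameter(0)
  p1 = bf16[64,32]{1,0} parameter(1)
  ROOT f = bf16[128,32]{1,0} fusion(p0, p1), kind=kCustom, calls=c,
    backend_config={"fusion_backend_config":{"kind":"__cudnn$fusion"}}
}
```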