Conversation

@sergachev
Contributor

📝 Summary of Changes
Support int4 in cuDNN GEMM fusions.

🎯 Justification
Accelerates some int4 GEMM fusions (gated by the flag `xla_gpu_cudnn_gemm_fusion_level`).

🚀 Kind of Contribution
⚡️ Performance Improvement

📊 Benchmark (for Performance Improvements)

> Please measure and include speedups for one of the public HLOs in
> `compiler/xla/tools/benchmarks/hlo/`.

The public HLOs there do not use int4, so none of them can exercise this change.
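
For illustration, a minimal sketch of the kind of int4 GEMM HLO this change targets (module name and shapes are hypothetical, not taken from this PR):

```
// Hypothetical sketch. Assumes the cuDNN GEMM fusion path is enabled,
// e.g. by setting --xla_gpu_cudnn_gemm_fusion_level to a nonzero value,
// per the justification above.
HloModule int4_gemm_example

ENTRY e {
  // The int4 operand is upcast before the dot; the fusion is expected to
  // absorb the convert together with the GEMM.
  lhs = s4[16,32] parameter(0)
  lhs_bf16 = bf16[16,32] convert(lhs)
  rhs = bf16[32,64] parameter(1)
  ROOT dot = bf16[16,64] dot(lhs_bf16, rhs),
      lhs_contracting_dims={1}, rhs_contracting_dims={0}
}
```

Whether such a module actually gets rewritten into a cuDNN GEMM fusion depends on the fusion level and the backend's support; treat this as the shape of the input, not a guaranteed fast path.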

🧪 Unit Tests:
yes

🧪 Execution Tests:
yes

copybara-service bot pushed a commit that referenced this pull request Nov 11, 2025
Imported from GitHub PR #33794

Copybara import of the project:

--
e1b8dc7 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Support int4 in cuDNN GEMM fusions.

Merging this change closes #33794

FUTURE_COPYBARA_INTEGRATE_REVIEW=#33794 from openxla:cudnn_gemm_int4 e1b8dc7
PiperOrigin-RevId: 830894321
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 11, 2025
Imported from GitHub PR openxla/xla#33794

Copybara import of the project:

--
e1b8dc7daff4963b93152d2a5c81c4d91a9f14d8 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Support int4 in cuDNN GEMM fusions.

Merging this change closes #33794

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#33794 from openxla:cudnn_gemm_int4 e1b8dc7daff4963b93152d2a5c81c4d91a9f14d8
PiperOrigin-RevId: 830894321
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Nov 12, 2025
Imported from GitHub PR openxla/xla#33794

Copybara import of the project:

--
e1b8dc7daff4963b93152d2a5c81c4d91a9f14d8 by Ilia Sergachev <isergachev@nvidia.com>:

[GPU] Support int4 in cuDNN GEMM fusions.

Merging this change closes #33794

PiperOrigin-RevId: 831264661
@sergachev deleted the cudnn_gemm_int4 branch on November 12, 2025 at 10:13.