[ROCm] more hipify v2 fixes #4854

jeffdaily · 2025-09-10T18:58:35Z

Before the PyTorch hipify v2 PR lands, additional fixes are needed for the experimental gen_ai HIP sources. The fbgemm_gpu *.hip sources do not undergo additional hipify steps, and they were written against PyTorch's hipify v1 interfaces. Some small changes are necessary so that the sources work with either the hipify v1 or the hipify v2 torch APIs.

netlify · 2025-09-10T18:58:41Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`148c5d7`
🔍 Latest deploy log	https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68dae1964ab705000893d7f1
😎 Deploy Preview	https://deploy-preview-4854--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2025-09-11T05:13:57Z

@q10 has imported this pull request. If you are a Meta employee, you can view this in D82186865.

facebook-github-bot · 2025-09-11T16:32:52Z

@atalman has imported this pull request. If you are a Meta employee, you can view this in D82186865.

q10 · 2025-09-11T17:17:48Z

Hi @jeffdaily, could you resolve the branch conflicts? Otherwise, I think the PR looks good for landing.

jeffdaily · 2025-09-12T00:38:37Z

@q10 done.

facebook-github-bot · 2025-09-12T01:52:03Z

@q10 has imported this pull request. If you are a Meta employee, you can view this in D82186865.

q10 · 2025-09-12T19:49:54Z

...erimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip

 #include "kernels/fp8_rowwise_grouped_kernel_manifest.h"
 #include "kernels/fp8_rowwise_grouped_heuristic.hpp"
-#include "fbgemm_gpu/quantize/tuning_cache.hpp"
+#include "fbgemm_gpu/quantize/tuning_cache_hip.hpp"


@jeffdaily This is breaking internal builds:

fatal error: 'fbgemm_gpu/quantize/tuning_cache_hip.hpp' file not found
   26 | #include "fbgemm_gpu/quantize/tuning_cache_hip.hpp"

I think the internal hipification doesn't require updating the filepath.

jeffdaily · Sep 12, 2025

When I build locally, this file does exist; the hipify step creates it during the cmake build. Does your internal build run the same hipify step that cmake does? The point is that tuning_cache.hpp is hipified into a new file with a _hip suffix, and the contents of that hipified file are already correct, so the clunky manual #ifdefs that were in the original header are no longer needed.
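To illustrate the contrast drawn above, here is a minimal sketch of the kind of manual guard a non-hipified header needs. Only the USE_ROCM macro comes from this thread; the function name backend_name is hypothetical and does not appear in tuning_cache.hpp.

```cpp
// Sketch only: USE_ROCM is the macro mentioned in this thread; backend_name()
// is a hypothetical function, not something defined in tuning_cache.hpp.
#include <string>

#ifdef USE_ROCM
// ROCm build: the header has to spell out the HIP variant by hand.
inline std::string backend_name() { return "hip"; }
#else
// CUDA build: the original spelling is used as-is.
inline std::string backend_name() { return "cuda"; }
#endif
```

When hipify instead generates a separate _hip copy of the header, the guard would presumably no longer be needed, since the tool rewrites the CUDA spelling in the generated file.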

q10 · Sep 13, 2025

@jeffdaily I think I understand the internal situation now.

The internal build system only hipifies files with a .cu or .cuh extension, and since fbgemm_gpu/quantize/tuning_cache.hpp is a regular C++ file with no CUDA syntax, the system does not hipify it. This is probably why we had to rely on the #ifdefs in the first place.

So there appear to be two possible solutions:

1. Keep it as a .hpp file, along with the #ifdefs.

2. Rename the file with a .cuh extension, and update the sources that #include this file accordingly.

I'm currently verifying the second solution with the internal CI.
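The extension rule described above can be sketched as a small predicate. This is an illustration of the rule as stated in this comment, not code from the internal build system; the helper name internal_build_hipifies is hypothetical.

```cpp
// Sketch only: internal_build_hipifies() is a hypothetical helper that mirrors
// the rule stated in this thread (only .cu and .cuh files are hipified).
#include <string_view>

inline bool internal_build_hipifies(std::string_view path) {
  // Look at the file name only, so dots in directory names are ignored.
  const auto slash = path.rfind('/');
  const std::string_view file =
      (slash == std::string_view::npos) ? path : path.substr(slash + 1);

  const auto dot = file.rfind('.');
  if (dot == std::string_view::npos) {
    return false;  // no extension at all
  }
  const std::string_view ext = file.substr(dot);
  return ext == ".cu" || ext == ".cuh";
}
```

Under this rule, fbgemm_gpu/quantize/tuning_cache.hpp is skipped, while the same header renamed to tuning_cache.cuh is picked up, which is the motivation for option 2.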

jeffdaily · Sep 15, 2025

Thanks. How did that internal test go?

q10 · Sep 15, 2025

@jeffdaily Option 2 appears to work with the internal CI.

q10 · Sep 24, 2025

@jeffdaily Hmm, so even though option 2 works with the internal CI, it breaks in OSS because the renamed file, tuning_cache.cuh, isn't hipified. See the build logs at https://github.com/pytorch/FBGEMM/actions/runs/17969709031/job/51109221018?pr=4921

This means we would have to revert to using #ifdef USE_ROCM, at least within that file...

jeffdaily · Sep 29, 2025

I tried renaming fbgemm_gpu/experimental/gen_ai/src/quantize/common/include/fbgemm_gpu/quantize/tuning_cache.hpp to tuning_cache.cuh and updated all #include statements, and my local build succeeded. In your log above, I see that the CK source file includes tuning_cache.cuh, but after hipify runs the file should be named tuning_cache_hip.cuh.

jeffdaily · Sep 29, 2025

I pushed my change 148c5d7.
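The naming convention discussed in this thread (tuning_cache.hpp becomes tuning_cache_hip.hpp, tuning_cache.cuh becomes tuning_cache_hip.cuh) can be sketched as follows. The helper name hipified_header_name is hypothetical, and the sketch covers only the header cases mentioned here; the real hipify tool applies more rules than this.

```cpp
// Sketch only: hipified_header_name() is a hypothetical helper that reproduces
// the "_hip" suffix convention described in this thread for header files.
// It is a simplification and not the actual hipify implementation.
#include <string>

inline std::string hipified_header_name(const std::string& path) {
  const auto slash = path.rfind('/');
  const auto dot = path.rfind('.');

  // No extension in the file name: append the suffix at the end.
  const bool dot_in_dir = slash != std::string::npos &&
                          dot != std::string::npos && dot < slash;
  if (dot == std::string::npos || dot_in_dir) {
    return path + "_hip";
  }
  // Insert "_hip" between the stem and the extension.
  return path.substr(0, dot) + "_hip" + path.substr(dot);
}
```

A source file that is not itself hipified, such as the CK *.hip sources in this PR, therefore has to #include the suffixed name directly.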

Summary:
X-link: facebookresearch/FBGEMM#1898
Before the PyTorch hipify v2 PR lands, additional fixes are needed for the experimental gen_ai HIP sources. The fbgemm_gpu *.hip sources do not undergo additional hipify steps, and they were written against PyTorch's hipify v1 interfaces. Some small changes are necessary so that the sources work with either the hipify v1 or the hipify v2 torch APIs.
Pull Request resolved: pytorch#4854
Reviewed By: atalman
Differential Revision: D82186865
Pulled By: q10

Summary:
Pull Request resolved: pytorch#4921
X-link: facebookresearch/FBGEMM#1898
Before the PyTorch hipify v2 PR lands, additional fixes are needed for the experimental gen_ai HIP sources. The fbgemm_gpu *.hip sources do not undergo additional hipify steps, and they were written against PyTorch's hipify v1 interfaces. Some small changes are necessary so that the sources work with either the hipify v1 or the hipify v2 torch APIs.
Pull Request resolved: pytorch#4854
Reviewed By: atalman
Differential Revision: D82186865
Pulled By: q10

Summary:
X-link: facebookresearch/FBGEMM#1969
Before the PyTorch hipify v2 PR lands, additional fixes are needed for the experimental gen_ai HIP sources. The fbgemm_gpu *.hip sources do not undergo additional hipify steps, and they were written against PyTorch's hipify v1 interfaces. Some small changes are necessary so that the sources work with either the hipify v1 or the hipify v2 torch APIs.
Pull Request resolved: pytorch#4854
Differential Revision: D83519493
Pulled By: q10

Summary:
Pull Request resolved: pytorch#4947
X-link: facebookresearch/FBGEMM#1969
Before the PyTorch hipify v2 PR lands, additional fixes are needed for the experimental gen_ai HIP sources. The fbgemm_gpu *.hip sources do not undergo additional hipify steps, and they were written against PyTorch's hipify v1 interfaces. Some small changes are necessary so that the sources work with either the hipify v1 or the hipify v2 torch APIs.
Pull Request resolved: pytorch#4854
Differential Revision: D83519493
Pulled By: q10

facebook-github-bot · 2025-10-01T04:28:34Z

@q10 merged this pull request in 072e323.

jeffdaily added 4 commits September 10, 2025 00:01

[ROCm] more fixes needed for hipify v2 readiness

f118862

cmake test for hipify v2

bc2775e

introduce HIPIFY_V2

cac5d94

fix

4b7113e

pytorch-bot bot added the module: rocm label Sep 10, 2025

meta-cla bot added the cla signed label Sep 10, 2025

Merge branch 'main' into rocm_hipify_v2_fixes

eeb2130

atalman approved these changes Sep 12, 2025

q10 reviewed Sep 12, 2025

rename tuning_cache.hpp to tuning_cache.cuh

148c5d7

facebook-github-bot closed this in 072e323 Oct 1, 2025

facebook-github-bot added the Merged label Oct 1, 2025
