[AMD][BACKEND] Switch to code object v5 #5005
base: main
Conversation
```diff
@@ -311,7 +311,7 @@ jobs:
     runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-HIP)}}
     name: Integration-Tests (${{matrix.runner[1] == 'gfx90a' && 'mi210' || 'mi300x'}})
     container:
-      image: rocm/pytorch:rocm6.1_ubuntu22.04_py3.10_pytorch_2.4
+      image: rocm/pytorch:rocm6.2.2_ubuntu22.04_py3.10_pytorch_release_2.3.0
```
I couldn't find ROCm 6.2 images with pytorch 2.4. Maybe someone could point me to them.
Why do you need pytorch 2.4? Previously we needed pytorch 2.3 (or 2.2) for fp8 support.
I recall some interpreter tests failed due to an old pytorch version (which didn't support some types). I can't recall the exact details though.
```diff
@@ -234,5 +234,5 @@ Supported Platforms:
 Supported Hardware:

   - NVIDIA GPUs (Compute Capability 8.0+)
-  - AMD GPUs (ROCm 5.2+)
+  - AMD GPUs (ROCm 6.2+)
```
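Since the PR raises the minimum supported ROCm to 6.2, callers that must keep working on older stacks could gate COv5-dependent paths on the runtime version. A minimal sketch, assuming a version string like `"6.2.2"` is available (the helper names and the `(6, 2)` threshold are illustrative, not part of this PR):

```python
def parse_rocm_version(version: str) -> tuple:
    """Parse a ROCm version string like '6.2.2' into a (major, minor) tuple."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def supports_code_object_v5(version: str) -> bool:
    # Per this PR, code object v5 needs ROCm 6.2+; on 6.0/6.1 the
    # device_print and tl.num_programs paths were reported to segfault.
    return parse_rocm_version(version) >= (6, 2)
```

Tuple comparison handles the ordering, so `"6.10.0"` correctly ranks above `"6.2.2"` without string tricks.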
So rocm 5.x is no longer supported?
Due to the switch to COv5, `test_optimize_thread_locality` and some `device_print`s (the new test) will segfault on ROCm 6.1 and 6.0. I haven't tested 5.x versions.
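For context, a sketch of the kind of kernel that triggers the reported crash: a `device_print` of a 2D block. This is illustrative only (kernel and variable names are hypothetical, not the actual test added by the PR) and needs an AMD GPU with ROCm 6.2+ to actually run, so the import is guarded:

```python
try:
    import triton
    import triton.language as tl
    HAVE_TRITON = True
except ImportError:
    # Allows the sketch to be inspected on machines without triton installed.
    HAVE_TRITON = False

if HAVE_TRITON:
    @triton.jit
    def print_2d_kernel(x_ptr, M: tl.constexpr, N: tl.constexpr):
        # Build a 2D block of offsets and load an M x N tile.
        offs_m = tl.arange(0, M)[:, None]
        offs_n = tl.arange(0, N)[None, :]
        x = tl.load(x_ptr + offs_m * N + offs_n)
        # Printing a 2D tensor is the case that segfaulted pre-6.2.
        tl.device_print("x: ", x)
```

Launching it as `print_2d_kernel[(1,)](x, M=4, N=4)` on a ROCm < 6.2 runtime is what reportedly crashed before this change.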
I think we need to chat a bit regarding when to land this then, given that it's a breaking change.
Thanks for figuring out the docker issue! We need to separate this pull request into two parts: one updating the docker image, followed by another switching to COv5.
Switches to code object v5, which requires bumping ROCm to 6.2+ to avoid segfaults for `device_print`s and `tl.num_programs`. The added unit test covers the previous segfault with `device_print`s for 2d tensors. This is a preparation to test llvm/llvm-project@c4d8920
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - I have not added any `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)