find rocm components individually #2567

Merged

Conversation

afzpatel

Required for Spack, since all the ROCm components are located in different paths.
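As a rough illustration of the idea (not the code in this PR), per-component lookup could be sketched as below. This assumes each component's install prefix is exposed through an environment variable named `<COMPONENT>_PATH`, falling back to a single `ROCM_PATH` root when no such variable is set. The component list, the variable naming convention, and the function name `find_rocm_components` are all hypothetical.

```python
import os

# Hypothetical subset of ROCm components; the real list may differ.
COMPONENTS = ("hipblas", "miopen", "rccl", "rocfft")


def find_rocm_components(env, components=COMPONENTS, default_root="/opt/rocm"):
    """Return a {component: install_prefix} mapping.

    Assumption: a per-component variable such as MIOPEN_PATH overrides the
    shared ROCM_PATH root. This naming convention is illustrative only.
    """
    # Shared root used by a monolithic install (everything under one prefix).
    root = env.get("ROCM_PATH", default_root)
    # Each component may live under its own prefix, as in a Spack install.
    return {name: env.get(f"{name.upper()}_PATH", root) for name in components}


if __name__ == "__main__":
    for name, prefix in find_rocm_components(os.environ).items():
        print(f"{name}: {prefix}")
```

With a monolithic install every component resolves to the same root; with a Spack-style layout each component can point at its own prefix without the others being affected.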

@afzpatel afzpatel requested review from i-chaochen and hsharsha June 10, 2024 18:59
@afzpatel afzpatel force-pushed the find-rocm-components-individually branch from 9011d62 to abaaa4d Compare June 11, 2024 11:15
@i-chaochen

retest Ubuntu-GPU-multi please
retest Ubuntu-CPU please

@hsharsha

retest Ubuntu-CPU please
retest Ubuntu-GPU-single please

@afzpatel
Author

retest Ubuntu-GPU-single please


@hsharsha hsharsha left a comment


LGTM

@i-chaochen

retest Ubuntu-GPU-single please

2 similar comments
@afzpatel
Author

retest Ubuntu-GPU-single please

@afzpatel
Author

retest Ubuntu-GPU-single please

@afzpatel afzpatel marked this pull request as ready for review June 18, 2024 09:36
@afzpatel
Author

I'm not able to reproduce the CI build failure for Ubuntu-GPU-single on my local machine. Any ideas what the issue might be?

@afzpatel
Author

retest Ubuntu-GPU-single please

@hsharsha

I'm not able to reproduce the CI build failure for Ubuntu-GPU-single on my local machine. Any ideas what the issue might be?

I think it is landing on a node where the ROCm driver is not stable, which causes the tests to fail. If you can add the logs from your local run here, we can disregard the CI failure and merge this change.

@i-chaochen

retest Ubuntu-GPU-single please

1 similar comment
@i-chaochen

retest Ubuntu-GPU-single please

@i-chaochen

I'm not able to reproduce the CI build failure for Ubuntu-GPU-single on my local machine. Any ideas what the issue might be?

It's because one of the nodes has a driver issue. We already marked it offline and I'm trying to retest on another node.

@afzpatel
Author

Here's a log of my test run on an MI100. There are only 2 test failures, one of which is a test that failed to allocate enough memory on the device. The errors I was getting on the CI are not present in the local run.
mi100_command.log

@i-chaochen

retest Ubuntu-GPU-single please

@i-chaochen

Here's a log of my test run on an MI100. There are only 2 test failures, one of which is a test that failed to allocate enough memory on the device. The errors I was getting on the CI are not present in the local run. mi100_command.log

Yes, it shouldn't be a problem, because I can see gpu-pycpp is green.

@afzpatel afzpatel merged commit c467913 into ROCm:develop-upstream Jun 19, 2024
8 of 9 checks passed
@afzpatel
Author

Now that the PR is merged, this change needs to be backported to the r2.14-rocm-enhanced-spack branch. Also, another branch called r2.16-rocm-enhanced-spack should be created with this change and f4f4e86.

@hsharsha

f4f4e86
Instead of creating a new branch, can we just use that as a patch and apply it during the Spack TF installation? That way we can use the rocm-2.xx-enhanced along with the patch.

@afzpatel
Author

f4f4e86 Instead of creating a new branch, can we just use that as a patch and apply it during the Spack TF installation? That way we can use the rocm-2.xx-enhanced along with the patch.

Sounds good, I'll try that out.

@afzpatel afzpatel deleted the find-rocm-components-individually branch June 20, 2024 09:58
3 participants