adding preliminary AMDGPU support #99

leios · 2022-09-12T09:28:22Z

I am still running the tests, so this PR is still a draft and might change in the next few days, but it adds preliminary support for AMDGPU devices.

There were a few small things to note:

- I modified every part of the code that used CuArray to use an array type value, AT, instead. In a few places (such as internal functions in zygote.jl that were only used on the GPU), AT became a function argument. In most cases, AT could be introduced without any change to the function calls themselves.
- In a few cases (such as interactions/implicit_solver.jl), I added a new conditional branch for ROCArray alongside the existing CuArray one.
- In a few cases, the System(...) call was modified to try to figure out which device you want to use if you set gpu to true. It defaults to CUDA if both GPUs are present and nothing else is specified.
- The tests were modified to use run_cuda_tests and run_rocm_tests flags, which populate the array gpu_array_types with CuArray, ROCArray, or both. I tried to modify all the tests to use the array types from this array.
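A minimal sketch of the pattern described in the list above, not the code from this PR: functions take the array type `AT` as a value, a default GPU type is chosen with CUDA preferred when both are present, and the test array types are collected from two flags. `make_coords`, `default_gpu_array_type` and `gpu_array_types_for` are hypothetical names, and plain `Array` stands in for CuArray and ROCArray so the snippet runs without a GPU.

```julia
# Hypothetical sketch of the "array type as a value" pattern (not Molly's code).
# Array stands in for CuArray / ROCArray so this runs on the CPU.

# Code takes the array type AT instead of hard-coding CuArray.
make_coords(AT, n) = AT(collect(Float32, 1:n))

# Choose a default GPU array type when gpu = true: CUDA wins if both are present.
function default_gpu_array_type(has_cuda::Bool, has_rocm::Bool, cuda_AT, rocm_AT)
    has_cuda && return cuda_AT
    has_rocm && return rocm_AT
    error("gpu = true but no supported GPU was found")
end

# Build the list of array types that the tests loop over.
function gpu_array_types_for(run_cuda_tests::Bool, run_rocm_tests::Bool, cuda_AT, rocm_AT)
    gpu_array_types = Any[]
    run_cuda_tests && push!(gpu_array_types, cuda_AT)
    run_rocm_tests && push!(gpu_array_types, rocm_AT)
    return gpu_array_types
end

# With CUDA.jl and AMDGPU.jl loaded, CuArray and ROCArray would be passed here.
for AT in gpu_array_types_for(true, true, Array, Array)
    coords = make_coords(AT, 4)
    println(typeof(coords), " ", sum(coords))
end
```

Passing the type as a value keeps each function body identical across backends; only the call site that picks `AT` needs to know which GPU package is loaded.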

Again, I'm still messing around with things, but I thought I would put the draft PR up in case there are design decisions to discuss.

jgreener64 · 2022-09-12T14:19:09Z

Thanks for doing this. I think this is the right idea and from a quick look the implementation seems good. Let me know when it is ready for review.

leios · 2022-09-15T08:29:27Z

In principle, the PR runs now, but I am getting some scalar indexing somewhere in the tests that I need to fix. I've been noticing scalar indexing in other Molly runs for a while now, but haven't been able to pin down where it happens or whether the issue is AMD-specific, so I'll keep playing around here and try to find the problem.

leios · 2022-09-15T08:35:39Z

```julia
accumulateadd(x) = accumulate(+, x)

# Accumulate values in an array based on the ordered boundaries in bounds.
# Used to speed up views with repeated indices on the GPU when you have the bounds.
@views @inbounds function accumulate_bounds(arr, bounds)
    zf = zero(arr[1:1])                                        # one-element zero array of the same array type
    accum_pad = vcat(zf, accumulateadd(arr))                   # prefix sums with a leading zero
    accum_bounds = accum_pad[bounds]                           # running total at each boundary
    accum_bounds_offset = vcat(zf, accum_bounds[1:(end - 1)])  # running total at the previous boundary
    return accum_bounds .- accum_bounds_offset                 # per-segment sums
end
```

The scalar indexing seems to come from the accumulate call here. I'll see if there's a simple solution...
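To make the intent of accumulate_bounds concrete, here is a small CPU check, assuming `bounds[i]` is one past the last element of segment i (which is what indexing into the zero-padded prefix sums implies). The function is copied so the snippet runs on its own (without `@inbounds`), and `segment_sums_reference` is a hypothetical loop-based helper for comparison, not part of Molly.

```julia
# Copy of accumulate_bounds so this snippet is self-contained.
accumulateadd(x) = accumulate(+, x)

@views function accumulate_bounds(arr, bounds)
    zf = zero(arr[1:1])
    accum_pad = vcat(zf, accumulateadd(arr))
    accum_bounds = accum_pad[bounds]
    accum_bounds_offset = vcat(zf, accum_bounds[1:(end - 1)])
    return accum_bounds .- accum_bounds_offset
end

# Hypothetical reference: segment i covers arr[start:(bounds[i] - 1)], where
# start is the previous boundary (or 1 for the first segment).
function segment_sums_reference(arr, bounds)
    out = similar(arr, length(bounds))
    start = 1
    for (i, b) in enumerate(bounds)
        out[i] = sum(arr[start:(b - 1)])
        start = b
    end
    return out
end

arr = [1, 2, 3, 4, 5]
bounds = [3, 6]
accumulate_bounds(arr, bounds)        # [3, 12]: sums of arr[1:2] and arr[3:5]
```

The prefix-sum version avoids per-segment loops, which is why it maps onto whole-array GPU operations, but it relies on `accumulate` having a native implementation for the array type.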


jgreener64 · 2022-09-15T11:54:45Z

I have noticed scalar indexing occasionally when using Molly too, though it is disallowed in the current tests, so I think that case is AMD-specific.

I am not against allowing scalar indexing in specified places if performance isn't too bad. The accumulate_bounds function is a hack to allow faster AD with Zygote. It will be removed entirely later with a switch to GPU kernels. If overall performance on AMD is okay I wouldn't worry too much.
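Allowing scalar indexing only in specified places can be done with `allowscalar` and `@allowscalar` from GPUArraysCore, which CuArray and ROCArray both build on. A minimal sketch, assuming the JLArrays package (a CPU-backed reference GPU array type) so it runs without a GPU; `accumulateadd_permissive` is a hypothetical name, not something in Molly.

```julia
using GPUArraysCore: allowscalar, @allowscalar
using JLArrays   # CPU-backed array type that follows GPUArrays' scalar-indexing rules

allowscalar(false)   # forbid scalar indexing globally, as the test suite does

# Hypothetical variant of accumulateadd: if a backend has no native accumulate
# and falls back to element-by-element code, scalar indexing is permitted only
# for the duration of this call.
accumulateadd_permissive(x) = @allowscalar accumulate(+, x)

x = JLArray([1.0, 2.0, 3.0])
Array(accumulateadd_permissive(x))   # [1.0, 3.0, 6.0]
```

Because the opt-in is scoped to one call, any other accidental scalar indexing elsewhere still raises an error in the tests.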

leios · 2022-09-22T06:01:54Z

Since the code runs but the tests are not quite passing, I think it's best to keep this PR open and refactor it into a KernelAbstractions PR on top of the kernels branch when that is done. In the meantime, I'll figure out the AMD-specific bugs and push the fixes here.
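As an illustration of the kernel route, here is a minimal sketch of the segment sums computed by accumulate_bounds written with KernelAbstractions.jl, assuming its 0.9 API (`@kernel`, `@index`, `@Const`, `get_backend`, `KernelAbstractions.zeros`). It assigns one work-item per segment and loops over that segment's elements, which is a different approach from the prefix-sum trick and not Molly's eventual implementation; `segment_sums_kernel!` and `segment_sums` are hypothetical names. With plain arrays it runs on the CPU backend; with `arr` and `bounds` both as CuArray or ROCArray, `get_backend` would select the GPU backend instead.

```julia
using KernelAbstractions

# One work-item per segment: segment i covers arr[start:(bounds[i] - 1)],
# where start is the previous boundary (or 1 for the first segment).
@kernel function segment_sums_kernel!(out, @Const(arr), @Const(bounds))
    i = @index(Global)
    start = i == 1 ? 1 : bounds[i - 1]
    acc = zero(eltype(out))
    for j in start:(bounds[i] - 1)
        acc += arr[j]
    end
    out[i] = acc
end

function segment_sums(arr, bounds)
    backend = get_backend(arr)   # CPU() for Array, a GPU backend for device arrays
    out = KernelAbstractions.zeros(backend, eltype(arr), length(bounds))
    segment_sums_kernel!(backend)(out, arr, bounds; ndrange=length(bounds))
    KernelAbstractions.synchronize(backend)
    return out
end

segment_sums([1, 2, 3, 4, 5], [3, 6])   # [3, 12]
```

The same kernel source serves CUDA, ROCm and CPU, which is the main draw of refactoring onto KernelAbstractions rather than keeping separate CuArray and ROCArray branches.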

jgreener64 · 2022-09-22T12:06:54Z

Sounds good.

leios · 2024-10-24T15:06:12Z

Closing as it is outdated and #147 supersedes it.

- adding preliminary AMDGPU support (00f8fc1)

leios added 2 commits September 14, 2022 13:30

- it compiles and runs basic examples on CPU and GPU. Now for tests (a3481fb)
- fixing tests (b42d77a)

leios added 4 commits September 15, 2022 05:32

- typo in runtest (a1db524)
- new attempt at move_array (7243fdf)
- attempting to prevent CI from running on my draft PR. This can be reverted later (9df920c)
- one more typo... (ebc4f76)

This was referenced Sep 15, 2022:

- Differentiable GPU kernels #60 (Open)
- GPU tests fail on GTX970 and P100 #57 (Closed)

jgreener64 mentioned this pull request Sep 28, 2022:

- Roadmap and ideas for Molly.jl development #2 (Open)

leios mentioned this pull request Sep 7, 2023:

- KernelAbstractions support #147 (Draft)

leios closed this Oct 24, 2024
