simplify test machinery #2498

CarloLucibello · 2024-10-12T15:47:23Z

This PR introduces a single function, test_gradients, that can be used across all differentiation scenarios (on CPU and any GPU backend, on layers or simple functions, ...).

CarloLucibello · 2024-10-13T09:20:13Z

@pxl-th I'm seeing something odd with the convolutions on AMDGPU.
Could you test the following example locally? It looks like both assertions fail:

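using Flux, AMDGPU, Zygote

m = Conv((2, 2), 3 => 4)
x = rand(Float32, 5, 5, 3, 2)

# Reference value and gradient on the CPU; g is a 1-tuple holding the gradient w.r.t. x
y, g = Zygote.withgradient(x -> sum(m(x)), x)

# Move the model and the input to the GPU through MLDataDevices
dev = gpu_device(force=true)
m_gpu = m |> dev
x_gpu = x |> dev
y_gpu, g_gpu = Zygote.withgradient(x -> sum(m_gpu(x)), x_gpu)

# @assert treats a trailing `atol=...` as the failure message, so call isapprox directly
@assert isapprox(y_gpu, y; atol=1e-4)
@assert isapprox(Array(g_gpu[1]), g[1]; atol=1e-4)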

pxl-th · 2024-10-13T13:15:31Z

@CarloLucibello Something broke the CPU-GPU transfer for convolutions.
MIOpen only supports cross-correlation, so we need to transpose the filters when moving them to the GPU.

This is not executed anymore:

Flux.jl/ext/FluxAMDGPUExt/functor.jl

Line 89 in 09a16ee

function Adapt.adapt_structure(to::FluxAMDGPUAdaptor, m::CPU_CONV)
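
To illustrate why skipping that step changes the results, here is a minimal sketch in plain Julia (no Flux or AMDGPU involved). It assumes that "transposing" the filters means reversing them along their spatial dimensions, and the helper names conv2d, crosscor2d and flip are hypothetical, written only for this example. A true convolution equals a cross-correlation with the flipped filter, so a backend that only implements cross-correlation needs the flipped weights to reproduce the CPU output.

```julia
# True convolution (valid padding): the kernel is indexed in reverse.
function conv2d(x::AbstractMatrix, w::AbstractMatrix)
    kh, kw = size(w)
    oh, ow = size(x, 1) - kh + 1, size(x, 2) - kw + 1
    y = zeros(promote_type(eltype(x), eltype(w)), oh, ow)
    for j in 1:ow, i in 1:oh, b in 1:kw, a in 1:kh
        y[i, j] += x[i + a - 1, j + b - 1] * w[kh - a + 1, kw - b + 1]
    end
    return y
end

# Cross-correlation (valid padding): the kernel is applied as stored.
function crosscor2d(x::AbstractMatrix, w::AbstractMatrix)
    kh, kw = size(w)
    oh, ow = size(x, 1) - kh + 1, size(x, 2) - kw + 1
    y = zeros(promote_type(eltype(x), eltype(w)), oh, ow)
    for j in 1:ow, i in 1:oh, b in 1:kw, a in 1:kh
        y[i, j] += x[i + a - 1, j + b - 1] * w[a, b]
    end
    return y
end

# Hypothetical helper: reverse the kernel along both spatial dimensions.
flip(w::AbstractMatrix) = reverse(reverse(w; dims=1); dims=2)

x = reshape(Float32.(1:25), 5, 5)
w = Float32[1 2; 3 4]

# With the flipped kernel, cross-correlation reproduces the convolution.
@assert conv2d(x, w) ≈ crosscor2d(x, flip(w))
# Without the flip the two outputs differ, which would explain the failing assertions.
@assert !(conv2d(x, w) ≈ crosscor2d(x, w))
```

If the adaptor that performs this flip is bypassed, the GPU layer would effectively run crosscor2d with unflipped weights, which is consistent with both the output and the gradient mismatching the CPU reference.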

CarloLucibello · 2024-10-13T14:28:07Z

Ah right, that happens when using MLDataDevices.gpu_device. Flux.gpu should still work fine.
I'll try to fix it.

CarloLucibello · 2024-10-13T15:44:26Z

I investigated a bit, and hooking into MLDataDevices's data transfer mechanism is not straightforward. So I propose we merge this PR as it is and proceed with a fix later. In any case, the problem is not introduced by this PR: it was pre-existing and is now just exposed.

Can I get an approval?

pxl-th

Yes, I can take a look at it as well tomorrow; we can fix this in a separate PR.

CarloLucibello · 2024-10-13T17:00:03Z

I think the solution involves modifying this line
https://github.com/LuxDL/MLDataDevices.jl/blob/71ed455bb2a898a128b32aec0a67ba44fe8321d7/src/public.jl#L340
to use some custom and publicly exposed isleaf that we can hook into from Flux. Cc @avik-pal

simplify test machinery

6defd04

CarloLucibello marked this pull request as draft October 12, 2024 15:47

CarloLucibello added 8 commits October 12, 2024 18:03

fix

a8af95d

fix

b94cf64

fix

77a9611

f64

18ab9b1

fixes

9367b95

fix

088565f

fix cuda test

fde9664

tweaks

13ddabd

CarloLucibello added 2 commits October 13, 2024 12:53

fix cuda device

a2c92ba

test less

fbf66c4

CarloLucibello marked this pull request as ready for review October 13, 2024 12:10

fix cuda tests

18a8a11

CarloLucibello requested a review from pxl-th October 13, 2024 15:44

pxl-th approved these changes Oct 13, 2024

CarloLucibello merged commit 35b893a into master Oct 13, 2024
6 of 9 checks passed

CarloLucibello deleted the cl/checkgrad branch October 13, 2024 17:12

This was referenced Oct 16, 2024

Define isleaf LuxDL/MLDataDevices.jl#84

Merged

make gpu(x) = gpu_device()(x) #2502

Merged
