Specialize permutedims kernel for the permutation. #338

Merged: 2 commits, Nov 30, 2020


@maleadt (Member) commented Nov 30, 2020

About a 30% performance improvement (median time in the benchmark below: 2.262 ms down to 1.561 ms). I assume it's reasonable to specialize the kernel on the permutation, given that we're using tuples in the first place. We can always add a fully dynamic kernel that would also work with Vector-typed inputs.

julia> src = CuArray(rand(Complex{Float64}, 256, 256, 256));

julia> dst = similar(src);

julia> @benchmark CUDA.@sync permutedims!(dst, src, (2, 1, 3))
BenchmarkTools.Trial: 
  memory estimate:  576 bytes
  allocs estimate:  24
  --------------
  minimum time:     1.532 ms (0.00% GC)
  median time:      1.561 ms (0.00% GC)
  mean time:        1.614 ms (0.00% GC)
  maximum time:     16.173 ms (0.00% GC)
  --------------
  samples:          3094
  evals/sample:     1

julia> @benchmark CUDA.@sync permutedims!(dst, src, (2, 1, 3))
BenchmarkTools.Trial: 
  memory estimate:  512 bytes
  allocs estimate:  22
  --------------
  minimum time:     2.226 ms (0.00% GC)
  median time:      2.262 ms (0.00% GC)
  mean time:        2.333 ms (0.00% GC)
  maximum time:     14.548 ms (0.00% GC)
  --------------
  samples:          2141
  evals/sample:     1
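
For readers unfamiliar with the idea, here is a minimal CPU-only sketch of what "specializing on the permutation" can look like in Julia. It is not the kernel from this PR: the function names `permute_dynamic!` and `permute_specialized!` are hypothetical, and the sketch simply assumes the permutation is lifted into the type domain with `Val`, so that the compiler generates one method instance per permutation. The dynamic variant takes the permutation as a runtime value and therefore also accepts a `Vector`.

```julia
# Hypothetical illustration only; NOT the CUDA.jl kernel from this PR.

# Dynamic variant: `perm` is a runtime value, so a Tuple or a Vector both work.
function permute_dynamic!(dst, src, perm)
    for I in CartesianIndices(src)
        # Destination index d takes the source index along dimension perm[d].
        J = ntuple(d -> I[perm[d]], ndims(src))
        dst[J...] = src[I]
    end
    return dst
end

# Specialized variant: the permutation is a type parameter (via `Val`), so it
# is a compile-time constant inside the loop body.
function permute_specialized!(dst, src, ::Val{perm}) where {perm}
    for I in CartesianIndices(src)
        J = ntuple(d -> I[perm[d]], Val(length(perm)))
        dst[J...] = src[I]
    end
    return dst
end

# Usage: both variants should agree with Base.permutedims.
src = rand(4, 5, 6)
dst = similar(src, 5, 4, 6)
permute_specialized!(dst, src, Val((2, 1, 3)))
dst == permutedims(src, (2, 1, 3))
```

The trade-off in this sketch is the usual one: each distinct permutation compiles its own method instance, which is only possible when the permutation is an isbits tuple rather than a `Vector`.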

@ali-ramadhan

This is a super helpful speedup, thank you @maleadt!

@maleadt maleadt merged commit 292848f into master Nov 30, 2020
@bors bors bot deleted the tb/permutedims branch November 30, 2020 16:24