
enable Gpu address spaces #10884

Merged
merged 2 commits into from
Feb 21, 2022

Conversation

gwenzek
Contributor

@gwenzek gwenzek commented Feb 13, 2022

Follow up on #10189, this enables GPU-specific address spaces, leveraging previous work from @Snektron in #9649.

Right now these map directly to the LLVM address spaces, but Snektron and I didn't find a good reason to deviate from that, and we believe it should be compatible across different architectures.

Note: this is still an experimental feature because Zig doesn't have any officially supported GPU target.
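For illustration, a minimal sketch of how these address spaces surface in Zig pointer types. This is experimental; the enum members follow the ones discussed in this PR, and the snippet assumes a GPU target such as nvptx64:

```zig
// Sketch only: GPU address spaces are experimental and only meaningful
// on a GPU target (e.g. nvptx64). Pointers can be qualified explicitly:
const GlobalPtr = *addrspace(.global) u32; // device-wide memory
const SharedPtr = *addrspace(.shared) u32; // on-chip, per-block memory
const ConstPtr = *addrspace(.constant) const u32; // read-only, cached
```

An unqualified pointer stays in the generic address space, which every qualified pointer can coerce into.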

@gwenzek gwenzek changed the title introduces Gpu address spaces enable Gpu address spaces Feb 13, 2022
@gwenzek
Contributor Author

gwenzek commented Feb 14, 2022

Tagging @Vexu, who reviewed the previous PR. (I'm not able to assign reviewers; not sure if that's intended or not.)

Collaborator

@Snektron Snektron left a comment


Looks good to me

@Snektron
Collaborator

Snektron commented Feb 14, 2022

One thing you might also want to implement is the logic for obtaining the default address space in a given context for this architecture:

zig/src/target.zig

Lines 598 to 614 in f506810

pub fn defaultAddressSpace(
    target: std.Target,
    context: enum {
        /// Query the default address space for global constant values.
        global_constant,
        /// Query the default address space for global mutable values.
        global_mutable,
        /// Query the default address space for function-local values.
        local,
        /// Query the default address space for functions themselves.
        function,
    },
) std.builtin.AddressSpace {
    _ = target;
    _ = context;
    return .generic;
}

Currently this just returns the generic address space, but since that is not allowed for SPIR-V, the idea was to assign a default address space for each context. This would, for example, set const x = y; at global scope to constant, and function-local variables to local. Do you happen to know if there's a big difference between the performance of different address spaces? From the LLVM NVPTX Docs it looks like the default address space is generic, and I assume this is what CUDA also uses, but I'm not very sure about that. If CUDA instead makes function-locals local (address space 5), it might be best to replicate that in Zig.

@gwenzek
Contributor Author

gwenzek commented Feb 14, 2022

Do you happen to know if there's a big difference between the performance of different address spaces?

So when you dereference a generic pointer, the GPU runtime first needs to identify what kind of pointer it is before dereferencing it.
I don't know how costly that is, but it's certainly better to use qualified pointers as much as possible.

The slowest memories are local and global; constant memory is a bit faster, shared memory is significantly faster, and registers are the fastest.

  • shared memory is on-chip and competes for space with the L1 cache (IIRC, need to find a source); it generally acts as a user-controlled cache.
  • the constant address space can be the same as global on some Nvidia GPUs, but is otherwise faster because it has dedicated caching.
  • local memory is generally only used for kernels that require too many registers.
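As a hedged sketch of the pattern those bullets describe, staging data into .shared memory as a user-managed cache might look like this (assumes an nvptx target; the kernel name and shapes are illustrative, and the declaration form is untested here):

```zig
// Hypothetical: a per-block tile in on-chip .shared memory, used as a
// user-controlled cache. .shared values cannot carry an initializer,
// so the tile is declared undefined and filled by the kernel at runtime.
var tile: [256]f32 addrspace(.shared) = undefined;

export fn scale(in: *addrspace(.global) const f32, out: *addrspace(.global) f32) callconv(.PtxKernel) void {
    // A real kernel would copy its slice of `in` into `tile`, synchronize
    // the block, and then operate on the fast on-chip copy.
    _ = in;
    _ = out;
}
```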

you might also want to implement is the logic for obtaining the default address space

I've tried it based on your recommendation, but I think it will require more work. It tends to generate compile errors when
generating the builtin.zig for nvptx64:

error: expected *const target.Target.Cpu.Model, found *addrspace(.constant) const target.Target.Cpu.Model

I've added the commit for reference, but will probably roll it back since it seems mostly useless for now:

zig/src/target.zig

Lines 604 to 614 in aa7acf0

.nvptx, .nvptx64 => switch (context) {
    // TODO: this should be .constant, but it generates an error when generating the builtin.zig for nvptx64
    // error: expected *const target.Target.Cpu.Model, found *addrspace(.constant) const target.Target.Cpu.Model
    .global_constant => .generic,
    .global_mutable => .global,
    // Note: local memory is similar in terms of performance to global memory;
    // we generally don't want to use it. The device will decide to use it if
    // needed, when loading the assembly.
    .local => .generic,
    .function => .generic,
},

What would be useful would be to mark kernel parameters that are pointers as pointers to global memory, but I haven't found a way to do so.

I'm also not sure if we should expose the .local address space. For Cuda at least, it's not something we should use. It's only used when the device doesn't have enough registers to compute a given kernel, and it's quite slow.

@Snektron
Collaborator

you might also want to implement is the logic for obtaining the default address space

I've tried it based on your recommendation, but I think it will require more work. It tends to generate compile errors when generating the builtin.zig for nvptx64:

error: expected *const target.Target.Cpu.Model, found *addrspace(.constant) const target.Target.Cpu.Model

I've added the commit for reference, but will probably roll it back since it seems mostly useless for now:

Hm, maybe we need to experiment a bit more with that. Let's address it another time.

What would be useful would be to mark kernel parameters that are pointers as pointers to global memory, but I haven't found a way to do so.

Do you mean automatically? Perhaps it could be inferred from the function calling convention. I am having second thoughts about the whole "automatically inferring address spaces" thing anyway; perhaps things should remain more explicit?

In the explicit (and current) case it would look like:

export fn kernel(ptr: *addrspace(.global) u32) callconv(.PtxKernel) void {
  ...
}

I'm also not sure if we should expose the .local address space. For Cuda at least, it's not something we should use. It's only used when the device doesn't have enough registers to compute a given kernel, and it's quite slow.

Hm, I see. The AMDGPU backend also exposes a local address space, and it seems that any alloca instruction automatically gains this address space (so in that sense it's the correct mental model for local variables). How does this work for PTX? Do we need to avoid alloca altogether?

@gwenzek gwenzek force-pushed the nvptx_addr branch 2 times, most recently from 06941f6 to 632e147 Compare February 17, 2022 21:54
@gwenzek
Contributor Author

gwenzek commented Feb 17, 2022

The AMDGPU backend also exposes a local address space, and it seems that any alloca instruction automatically gains this address space. How does this work for PTX?

PTX also has alloca, but it's considered an experimental feature: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions-alloca
The returned pointer is effectively in .local space.
The code I've seen generated is always fully inlined, with no function calls and no alloca. I don't know if that's because my kernels are too simple, or if it's just common practice for the PTX format.

@gwenzek
Contributor Author

gwenzek commented Feb 20, 2022

I'm also not sure if global should be valid for anything that's not a pointer. It should in theory be possible to declare a local and shared global value, but I'm not sure whether it can be initialized at all.

So PTX can initialize .global and .constant. It also seems to be possible with AMDGPU, but it's a bit harder.
Maybe it won't be possible on all platforms, but let's be permissive for now.
.local and .shared can't be preinitialized, but I haven't found a way to prevent this in the Zig frontend, so the errors are triggered in the backend.
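A sketch of the permissive case described above: .global and .constant values carrying initializers (assumes an nvptx target; the names are illustrative, and the syntax follows Zig's addrspace clause on variable declarations):

```zig
// .global and .constant data can be preinitialized in PTX;
// .local and .shared cannot, and those errors currently surface
// in the backend rather than the Zig frontend.
var counter: u32 addrspace(.global) = 0;
const lut: [4]u8 addrspace(.constant) = .{ 1, 2, 4, 8 };
```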

Relevant PTX documentation:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#state-spaces__properties-state-spaces

@andrewrk andrewrk merged commit 628e9e6 into ziglang:master Feb 21, 2022
@gwenzek
Contributor Author

gwenzek commented Feb 21, 2022

Thanks for the insightful exchanges, Robin,
and thanks for the reviews!

Time to go back to making more examples of GPU programming in Zig :-)
