Proposal: semantics for address spaces #15232

Open
11 of 12 tasks
Snektron opened this issue Apr 10, 2023 · 1 comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@Snektron
Collaborator

Snektron commented Apr 10, 2023

Some architectures (namely: GPUs, some embedded devices) have different address spaces that are to be used for different purposes. Zig already has some support for address spaces (see issue #653), but not all features I have in mind for these are currently implemented. This issue serves both as an official "proposal" for the stuff not yet implemented, and as a reference / discussion place for what has.

To recap #653 quickly: the basic idea is to tag each pointer with an "address space". This can currently be done in Zig using the addrspace keyword, which accepts any of the enum values from builtin.AddressSpace. Some basic rules follow; a checked checkbox means that the rule is implemented:

  • 1. An address space can be regarded as a different physical place to store memory. This means two pointers which have two different address spaces but hold the same integer value may point to two different objects.
  • 2. Address spaces may overlap: It is allowed for two pointers with different address spaces to point to the same object or the same physical memory. Note that the address space may imply different load/store semantics, depending on the target architecture.
  • 3. Variables may be placed in a specific address space using the addrspace keyword: var a: i32 addrspace(.shared) = undefined;. Taking the address of such a variable yields a pointer to this same address space: &a yields *addrspace(.shared) i32.
  • 4. The target architecture determines whether a variable placed in a particular address space may be initialized. If not, such a variable must always be initialized with undefined.
  • 5. Placing variables in specific address spaces may only be done in namespace scope, not in function scope.
    • Side note: CUDA and HIP allow shared memory to be declared in function scope. Perhaps this rule should be refined?
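
To make the tagging idea concrete on an ordinary CPU, here is a minimal sketch that can be checked with `zig test`. It assumes an x86_64 host and that the compiler's `AddressSpace` enum contains the x86 segment value `.gs`, which is accepted for pointer types only. The test name is made up for illustration. It only shows that the address space is part of the pointer type; it does not place a variable in an address space as rule 3 describes.

```zig
const std = @import("std");
const builtin = @import("builtin");

test "address space is part of the pointer type" {
    // Assumption: `.gs` pointers are only accepted on x86 targets, so skip elsewhere.
    if (builtin.cpu.arch != .x86_64) return error.SkipZigTest;

    const GsPtr = *addrspace(.gs) i32; // pointer tagged with the gs segment
    const Plain = *i32; // no tag given

    // Same pointee type, different address space: two distinct pointer types.
    try std.testing.expect(GsPtr != Plain);
}
```

Because the two types are distinct, passing one where the other is expected should need an explicit cast rather than happening silently.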

Apart from "real"/"specific" address spaces, there is one special address space: the generic address space. This is the "default" address space, and should be supported on all architectures. Some rules regarding the generic address space:

  • 6. If a pointer does not have the addrspace attribute, it points to the generic address space. *addrspace(.generic) T and *T are synonymous.
  • 7. The generic address space may either be a real address space (as on most architectures), or a meta-address space that abstracts the real address space a pointer points to.
    • Side note: Some GPU architectures do not have a real generic address space. For these, LLVM tries to magically resolve the real address space.
  • 8. Pointers pointing to the generic address space may be cast to specific address spaces using @addrSpaceCast, and pointers pointing to specific address spaces may be cast to the generic address space. If such a cast is not possible, depending on the target architecture, a compile error should be emitted.
  • 9. Casting a pointer to a different address space and casting it back does not need to yield the same pointer value, but does need to point to the same object.
    • Side note: Is this problematic? TODO: Look into how other languages formalize this.
  • 10. Dereferencing a non-generic pointer whose pointee does not reside in the pointer's address space is undefined behavior.
    • Side note: This is to make behavior like this invalid: @addrSpaceCast(.shared, @addrSpaceCast(.generic, global_ptr)).* = 123;.
  • 11. Taking the address of a variable which is not explicitly placed in an address space yields a generic pointer. The backing address space may be different, depending on the target.
    • Side note: Some architectures, like AMDGPU and SPIR-V, require a specific address space for function locals. If taking the address of these variables produced a pointer in an address space other than generic, too much code would break. The same goes for global variables.
  • 12. The size of a pointer may be different depending on the address space. For example, @sizeOf(*addrspace(.shared) i32) may yield 4, while @sizeOf(*i32) may yield 8. The size of a generic pointer should be equal to the size of usize.
    • Side note: This provides a solution for "Proposal: usize definition should be refined" (#5185). The size type of a particular address space could be constructed as follows:
      const std = @import("std");

      // Unsigned integer type as wide as a pointer into address space `as`.
      fn sizeType(comptime as: std.builtin.AddressSpace) type {
        return @Type(.{.Int = .{
          .signedness = .unsigned,
          .bits = @sizeOf(*addrspace(as) i32) * 8,
        }});
      }
    • Side note: This requires quite a large overhaul in the compiler: I counted over 200 call sites to ptrSize that would need to be changed.
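
The parts of rules 6, 11 and 12 that concern the generic address space can be checked on a conventional CPU target. The following is a minimal sketch under that assumption, written as a `zig test` file. The global `counter` and the test name are made up for illustration, and the last check reflects mainstream targets only, not the per-address-space pointer sizes that rule 12 proposes.

```zig
const std = @import("std");

// A global with no explicit address space (rule 11).
var counter: i32 = 0;

test "generic address space rules" {
    const Explicit = *addrspace(.generic) i32;
    const Implicit = *i32;

    // Rule 6: omitting addrspace means the generic address space.
    try std.testing.expect(Explicit == Implicit);

    // Rule 11: the address of an untagged variable is a generic pointer.
    try std.testing.expect(@TypeOf(&counter) == Implicit);

    // Rule 12: a generic pointer is as large as usize.
    try std.testing.expect(@sizeOf(Implicit) == @sizeOf(usize));
}
```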
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Apr 10, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Apr 10, 2023
@andrewrk andrewrk added the accepted This proposal is planned. label Apr 10, 2023
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 10, 2023
@matu3ba
Contributor

matu3ba commented Apr 10, 2023

The size of a generic pointer should be equal to the size of usize.

This will not work with CHERI. See this blog post https://tratt.net/laurie/blog/2022/making_rust_a_better_fit_for_cheri_and_other_platforms.html for context:

"The major issue is, in my opinion, much more surprising. In essence, most (all?) CHERI devices allow traditional (single width) pointers to be used alongside (double width) capabilities. Conventionally a program which uses only capabilities is said to be compiled and running in "pure capability mode" while a program which uses both traditional pointers and capabilities is said to be compiled and running in "hybrid mode" [9]. Most discussion around CHERI presupposes pure capability mode, but the lesser known hybrid mode has many uses [10]. Hybrid mode does, however, mean that we can no longer assume that all pointers are capabilities."

As far as I can tell, LLVM has not yet settled the pointer semantics/provenance model that is needed to upstream the CHERI backend and to offer guidance here.

@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jun 29, 2023
3 participants