Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for writing LLVM 5 output when in opaque pointer mode #5

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Jun 3, 2024

Although not required on LLVM 16 or below, where we can just use --opaque-pointers=0, on LLVM 17 this mode has been removed so we need to support targeting LLVM 5/7 in opaque pointer mode. Needless to say, this is tricky. Not only are opaque pointers unsupported on LLVM 5/7, the fact that opaque pointers are indistinguishable makes it hard to emit the requires type information where needed.

In this PR, I add support for LLVM 16 to target LLVM 5 in opaque pointer mode (all to be generalized to other LLVM versions once the concept has been proven). The core idea is that, before writing out the IR, we pre-process the module to add no-op calls to pointer conversion routines (no-op in that they take a ptr and return one), adding metadata with pointer element type information as identified by looking at the instruction in detail. Then, during serialization, we detect calls to these intrinsics and lower them to actual bitcasts from a pseudo-opaque pointer type {}* to an actual typed pointer (repurposing the TypedPointerType that was added for DXIL), and vice versa. In addition, we spoof the emitted bitcode at various places to ensure the expected type information is present.

For example, starting with:

define i32 @foobar(i1 %cond) {
  %box = alloca i32, i32 0
  %ptr = select i1 %cond, ptr null, ptr %box
  %value = load i32, ptr %ptr
  ret i32 %value
}

This is pre-processed to:

define i32 @foobar(i1 %cond) {
  %box = alloca i32, i32 0, align 4, !ptr_type !0
  %opaque_box = call ptr @llvm.opaque_ptr.p0(ptr %box)
  %ptr = select i1 %cond, ptr null, ptr %opaque_box
  %typed_ptr = call ptr @llvm.typed_ptr.p0(ptr %ptr), !ptr_type !0
  %value = load i32, ptr %typed_ptr, align 4
  ret i32 %value
}

declare ptr @llvm.opaque_ptr.p0(ptr %0)

declare ptr @llvm.typed_ptr.p0(ptr %0)

!0 = !{i32 0}

And is then written out as:

define i32 @foobar(i1 %cond) {
  %box = alloca i32, i32 0, align 4
  %opaque_box = bitcast i32* %box to {}*
  %ptr = select i1 %cond, {}* null, {}* %opaque_box
  %typed_ptr = bitcast {}* %ptr to i32*
  %value = load i32, i32* %typed_ptr, align 4
  ret i32 %value
}

It seems like this approach may just work, as I'm already passing a fair number of nontrivial tests of Metal.jl.

@maleadt maleadt marked this pull request as draft June 3, 2024 14:33
@maleadt maleadt force-pushed the tb/opaque_pointers branch 3 times, most recently from e4d5137 to 7ce3abb Compare June 5, 2024 13:50
@maleadt
Copy link
Member Author

maleadt commented Jun 5, 2024

Looking good. This passes Metal.jl tests, with the exception of code that calls intrinsics expecting a typed pointer:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3))

define void @kernel() {
  %0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) @threadgroup_memory)
  ret void
}

Which is rewritten to:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)*)

define void @kernel() {
  %opaque_threadgroup_memory = call ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) @threadgroup_memory), !ptr_type !0
  %0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) %opaque_threadgroup_memory)
  ret void
}

declare ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) %0)
!0 = !{[128 x i8] zeroinitializer}

And finally written as:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)*)

define void @kernel() {
  %opaque_threadgroup_memory = bitcast [128 x i8] addrspace(3)* @threadgroup_memory to {} addrspace(3)*
  %0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)* %opaque_threadgroup_memory)
  ret void
}

declare {} addrspace(3)* @llvm.opaque_ptr.p3({} addrspace(3)*)

... which is expected, but breaks the Metal back-end as it expects correctly-typed inputs to the intrinsic. The way I'm going to fix this, is by having Metal.jl provide IR with type hints:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3))

define void @kernel() {
  %typed_threadgroup_memory = call ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) @threadgroup_memory), !ptr_type !0
  %1 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) %typed_threadgroup_memory)
  ret void
}

declare ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) %0)
!0 = !{[128 x i8] zeroinitializer}

I think there's no real way around the front-end providing these hints, because there may be no reasonable way to infer the pointer element type (poking another hole in the DXIL-based approach).

@maleadt
Copy link
Member Author

maleadt commented Jun 6, 2024

Changed the front-end provided metadata format:

declare !arg_eltypes !2 <64 x half> @air.simdgroup_matrix_8x8_load.v64f32.p3f32(ptr addrspace(3))
define void @typed_intrinsic_call() {
  %1 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f32.p3f32(ptr addrspace(3) @threadgroup_memory)
  ret void
}
!2 = !{i32 0, [128 x i8] zeroinitializer}

This is easier to add by the front-end, because we can keep the existing ccalls in Metal.jl and just have a pass adding type metadata to intrinsics.

With this, all Metal.jl tests pass!

@maleadt maleadt force-pushed the tb/opaque_pointers branch 7 times, most recently from b0a0638 to 116d531 Compare June 7, 2024 09:08
@maleadt maleadt marked this pull request as ready for review June 7, 2024 09:50
@maleadt maleadt force-pushed the tb/opaque_pointers branch from 116d531 to 1f6be47 Compare June 7, 2024 09:50
@maleadt
Copy link
Member Author

maleadt commented Jun 7, 2024

Alright, this seems to work fine. Let's merge and apply it to the LLVM 7.0 writer and to the LLVM 17 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant