Add support for writing LLVM 5 output when in opaque pointer mode #5
Looking good. This passes Metal.jl tests, with the exception of code that calls intrinsics expecting a typed pointer:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3))
define void @kernel() {
%0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) @threadgroup_memory)
ret void
}

Which is rewritten to:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)*)
define void @kernel() {
%opaque_threadgroup_memory = call ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) @threadgroup_memory), !ptr_type !0
%0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) %opaque_threadgroup_memory)
ret void
}
declare ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) %0)
!0 = !{[128 x i8] zeroinitializer}

And finally written as:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)*)
define void @kernel() {
%opaque_threadgroup_memory = bitcast [128 x i8] addrspace(3)* @threadgroup_memory to {} addrspace(3)*
%0 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16({} addrspace(3)* %opaque_threadgroup_memory)
ret void
}
declare {} addrspace(3)* @llvm.opaque_ptr.p3({} addrspace(3)*)

... which is expected, but breaks the Metal back-end, as it expects correctly-typed inputs to the intrinsic. The way I'm going to fix this is by having Metal.jl provide IR with type hints:

@threadgroup_memory = internal addrspace(3) global [128 x i8] undef
declare <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3))
define void @kernel() {
%typed_threadgroup_memory = call ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) @threadgroup_memory), !ptr_type !0
%1 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f16.p3f16(ptr addrspace(3) %typed_threadgroup_memory)
ret void
}
declare ptr addrspace(3) @llvm.opaque_ptr.p3(ptr addrspace(3) %0)
!0 = !{[128 x i8] zeroinitializer}

I think there's no real way around the front-end providing these hints, because there may be no reasonable way to infer the pointer element type (poking another hole in the DXIL-based approach).

Changed the front-end provided metadata format:

declare !arg_eltypes !2 <64 x half> @air.simdgroup_matrix_8x8_load.v64f32.p3f32(ptr addrspace(3))
define void @typed_intrinsic_call() {
%1 = call <64 x half> @air.simdgroup_matrix_8x8_load.v64f32.p3f32(ptr addrspace(3) @threadgroup_memory)
ret void
}
!2 = !{i32 0, [128 x i8] zeroinitializer}

This is easier for the front-end to add, because we can keep the existing ...

With this, all Metal.jl tests pass!
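To make the `!arg_eltypes` format above concrete, here is a minimal Python sketch of how such a node could be read. It assumes the node is a flat list of (argument index, constant of the element type) pairs, which is only my reading of the single example shown; the helper names `split_top_level` and `parse_arg_eltypes` are hypothetical and not part of the actual writer, which works on IR objects rather than text.

```python
import re
from typing import Dict, List


def split_top_level(body: str) -> List[str]:
    """Split on commas that are not nested inside brackets."""
    parts, depth, cur = [], 0, []
    for ch in body:
        if ch in "[{<(":
            depth += 1
        elif ch in "]}>)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append("".join(cur).strip())
    return parts


def parse_arg_eltypes(node: str) -> Dict[int, str]:
    """Map argument index -> pointer element type (as text).

    Assumed layout: !N = !{i32 <argidx>, <eltype> <constant>, ...}
    """
    m = re.fullmatch(r"\s*!\d+\s*=\s*!\{(.*)\}\s*", node)
    if m is None:
        raise ValueError("not a metadata node: %r" % node)
    fields = split_top_level(m.group(1))
    if len(fields) % 2 != 0:
        raise ValueError("expected (index, value) pairs")
    result = {}
    for idx_field, val_field in zip(fields[0::2], fields[1::2]):
        idx = re.fullmatch(r"i32\s+(\d+)", idx_field)
        # The value is only a carrier for its type: drop the trailing constant.
        pieces = val_field.rsplit(None, 1)
        if idx is None or len(pieces) != 2:
            raise ValueError("malformed pair: %r, %r" % (idx_field, val_field))
        result[int(idx.group(1))] = pieces[0]
    return result


hints = parse_arg_eltypes("!2 = !{i32 0, [128 x i8] zeroinitializer}")
print(hints)
```

For the node in the example this yields a mapping from argument 0 to `[128 x i8]`, i.e. the element type the writer would need when emitting a typed pointer for that argument.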
Alright, this seems to work fine. Let's merge and apply it to the LLVM 7.0 writer and to the LLVM 17 branch.

Although not required on LLVM 16 or below, where we can just use `--opaque-pointers=0`, on LLVM 17 this mode has been removed, so we need to support targeting LLVM 5/7 in opaque pointer mode. Needless to say, this is tricky. Not only are opaque pointers unsupported on LLVM 5/7, the fact that opaque pointers are indistinguishable makes it hard to emit the required type information where needed.

In this PR, I add support for LLVM 16 to target LLVM 5 in opaque pointer mode (all to be generalized to other LLVM versions once the concept has been proven). The core idea is that, before writing out the IR, we pre-process the module to add no-op calls to pointer conversion routines (no-op in that they take a `ptr` and return one), adding metadata with pointer element type information as identified by looking at the instruction in detail. Then, during serialization, we detect calls to these intrinsics and lower them to actual bitcasts from a pseudo-opaque pointer type `{}*` to an actual typed pointer (repurposing the `TypedPointerType` that was added for DXIL), and vice versa. In addition, we spoof the emitted bitcode at various places to ensure the expected type information is present.

For example, starting with:
This is pre-processed to:
And is then written out as:
It seems like this approach may just work, as I'm already passing a fair number of nontrivial tests of Metal.jl.
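The serialization step described above (detect the no-op marker call, emit a bitcast) can be illustrated with a small text-level toy in Python. This is only a sketch under stated assumptions: the real writer operates on IR objects while emitting bitcode, not on textual IR, and the function name `lower_marker_call` and the `eltypes` dictionary (metadata ID to element type) are hypothetical. The intrinsic name and the `!ptr_type` attachment follow the `@llvm.opaque_ptr.p3` example in the comments.

```python
import re
from typing import Dict

# Matches: %dst = call ptr addrspace(N) @llvm.opaque_ptr.pN(ptr addrspace(N) SRC), !ptr_type !K
MARKER = re.compile(
    r"^(?P<indent>\s*)(?P<dst>%[\w.]+) = call ptr addrspace\((?P<aspace>\d+)\) "
    r"@llvm\.opaque_ptr\.p(?P=aspace)\(ptr addrspace\((?P=aspace)\) "
    r"(?P<src>[@%][\w.]+)\), !ptr_type (?P<md>!\d+)$"
)


def lower_marker_call(line: str, eltypes: Dict[str, str]) -> str:
    """Rewrite a marker call into a typed-to-pseudo-opaque bitcast.

    `eltypes` maps a metadata ID such as "!0" to the source element type.
    Lines that are not marker calls are returned unchanged.
    """
    m = MARKER.match(line)
    if m is None:
        return line
    elty = eltypes[m.group("md")]
    aspace = m.group("aspace")
    # Typed source pointer -> pseudo-opaque `{}` pointer in the same address space.
    return (
        f"{m.group('indent')}{m.group('dst')} = bitcast "
        f"{elty} addrspace({aspace})* {m.group('src')} "
        f"to {{}} addrspace({aspace})*"
    )


before = (
    "  %opaque_threadgroup_memory = call ptr addrspace(3) "
    "@llvm.opaque_ptr.p3(ptr addrspace(3) @threadgroup_memory), !ptr_type !0"
)
print(lower_marker_call(before, {"!0": "[128 x i8]"}))
```

Keeping the marker as an ordinary call until the very last step means the pre-processing pass never has to construct typed pointers inside an opaque-pointer module; only the writer needs to know about them.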