[vulkan phase2] Vulkan Runtime (#6924)

* Import Vulkan runtime changes from personal branch * Fix build to work with latest changes in main * Hookup Vulkan into Target, DeviceInterface and OffloadGPULoops * Add Vulkan runtime to Makefile * Add Vulkan target to Python bindings * Add runtime linker support to target Vulkan CodeGen * Add Vulkan windows decorator to runtime targets * Wrap debug messages for internal runtime classes with DEBUG_INTERNAL Error on failed string termination * Silence clang-tidy warnings for redundant expressions on Vulkan enum values * Clang tidy & format pass * Fix formatting for single line statements * Move Vulkan option to top-level CMakeLists.txt and enable SPIR-V as needed * Fix Vulkan & SPIRV dependencies for makefile * Add Halide version info to Makefile Add HALIDE_VERSION compiler definitions to compilation * Add HL_VERSION_FLAGS to RUNTIME_CXX_FLAGS * Finish refactoring of Vulkan CodeGen to use SpirV-IR. Added splitmix64 based hashing scheme for types and constants. Numerous fixes to instruction packing. Added debug symbols to all variables. * Clang tidy/format pass. * Fix formatting * Remove leftover ifdef * Fix build error for clang OSX for mismatched type comparison * Refactor loops and conditionals to use blocks * Clang tidy/format pass * Add detailed comments for acquire context parameters * Add comments describing loader method exports and dynamically resolved function pointers Other minor cleanups * Change aborts to debug asserts for context parameters. Add error handling to acquire context. * Cache Vulkan descriptor sets and other shader module objects in compilation cache for reuse * Replace platform specific strncpy for grabbing Extension strings with StringUtils::copy_upto * Enable device features for selected device * Fix alignment constraints for to match Vulkan buffer memory requirements. Add env vars to control Vulkan Memory Allocator config. * Add Vulkan to list of supported APIs in README.md Add Vulkan specific README_vulkan.md * Clang tidy/format pass * Fix conform_alignment to handle zero values * Fix declaration of custom_allocation_callbacks to be static. Change to constexpr for invalid values * Whitespace change to trigger build. * Handle Vulkan kernels that don't require storage buffers. Updated test status. Fixes 7 test cases. * Add src/mini_vulkan.h Apache 2.0 license requirements to License file * Add descriptor set binding info as pre-amble to SPIR-V code module Fix shared memory allocation to use global variables in workgroup storage space Add extern calls for spirv and glsl builtins Add memory fence call to gpu thread barrier Add missing visitors to Vulkan CodeGen Add scalar index & vector index methods for load/store * Clang tidy & format pass * Update test results for Vulkan docs. Passing: 326 Failing: 39 * Fix formatting * Remove extraneous parentheses for is_array_type() * Add Vulkan library to linkage fo Halide generator helpers * Add SPIR-V formatted output (for debugging) * Only declare SIMT intrinics that are actually used. Cleanup & refactor add_kernel method. * Add Vulkan handler to test targets * Clang format/tidy pass * Add doc-strings to SPIR-V interface * Adjust runtime array to widest vector width based on alignment and dense vector loads/stores Fix scalar and vector load/stores Fix casts for vectors Add missing nan, inf, neg_inf, is_finite builtins * Add missing bitwise and logical and methods. Cleanups. * Add comments about necessary packages on Ubuntu v22.04 vs earlier versions * Clang tidy & format pass. * Update Vulkan test results. Pass: 329 Fail: 36 * Remove unused Produce/Consume visitor method * Fix Molten VK initialization to work with v1.3+ loader Add support for direct casts for same-size types Add missing mux, mix, lerp, sinh, tanh, etc intrinsics Add explicit storage access for variables Add a macro to enable debug messages in Vulkan Memory Allocator * Disable dynamic shared memory portion of test for Vulkan (since its not supported yet) * Disable uncached portion of test for Vulkan (since it may OOM) * Disable float64 support in Type::supports_type() for Vulkan target since it's not widely supported * Fix Shuffle to handle all known cases Hookup VulkanMemoryAllocator to gpu allocation cache. Fix if_then_else to allow calls and statements to be used Fix loop counter comparison, and don't allow dynamic loops to be unrolled. Fix scalarize to use CompositeInsert instead of VectorInsertDynamic Fix FMod to use FRem (cause SPIR-V's FMod doesn't do what you'd expect ... but FRem does?!) Use exact same sematics for barriers as GLSL Compute ... still not passing everything Fix SPIR-V block termination checks, keys for null constants, and other cleanups * Clang tidy & format pass * Update correctness test results. PASS: 338, FAIL: 27 * Move counter inside debug #define to fix build * Relax tolerance for newton's method to match other GPU APIs Skip gpu dynamic shared testfor Vulkan (since dynamic shared allocations aren't supported yet) Update correctness test status. PASS: 340, FAIL: 25 * Clang format/tidy pass * Skip Vulkan for float64 for correctness test round (since f64 is optional) * Skip Vulkan for tests that rely upon device crop, and slice. * Only test small vector widths for Vulkan (since widths >=8 are optional) * Caninicalize gpu vars for Vulkan * Fix loop initialization, and increments Add all explicit types, and fix constant declarations Add missing fast intrinsics Convert results of logical ops into expected types (instead of bools) * Add SpvInstruction::add_operands(), add_immediates() and template based append() Make integer logical operations explicit. Better handling of constant data. * Clang format & tidy pass * Fix windows build ... refactor convert_to_bool to use std::vectors rather than dynamic fixed sized arrays * Skip asyn_device_copy, device_buffer_copy, device_crop, and device_slice tests for Vulkan (for now). * Don't test large vector widths for Vulkan (since they are optionally supported) * Clear Vulkan buffer allocations prior to use (tbd if this is necessary) * Skip Vulkan for async copy chain test * Skip Vulkan for interpreter test * Clang tidy/format pass * Fix formatting * Fix build ... use error messages for errors * Separate shared memory resources by element type for Vulkan. * Add Vulkan to conditional for fusing gpu loops * Reorder reset method to match declaration ordering. * Cleanup debug log messages for Vulkan resources * Assert alignment is power of two * Only split regions that have already been freed. Add more debug messages to log * Explicitly cleanup Vulkan command buffers as after they are used Avoid recreating descriptor sets Tidy up Vulkan debug messages * Fix Div, Mod, and div_round_to_zero for integer cases Cleanup reset method * Skip Vulkan for async_copy_chain * Skip 64-bit values on Vulkan since they are optionally supported * Skip interleave_rgb for Vulkan (which doesn't support cropping) * Skip interpreter for Vulkan (which doesn't support dynamic allocation of shared mem). * Clang Tidy/Format pass * Handle calls to pow with negative values for Vulkan Add integer and float constant helpers to SPIRV * Only test real numbers for pow with Vulkan * Clang tidy/format pass * Fix logic so a region request of an entire block matches if exactly the same size as an empty block * Create a zero size buffer to check for alignment Return null handles after freeing * Add more verbose debug output for malloc * Fix UConvert logic to avoid narrowing an integer type less than 8 bits Remove optimization path for division which seems to fail worse than DIV Cleanup DIV and MOD operators * Clang format/tidy pass * Fix SConvert & UConvert ops * Add retain semantics to block allocator interface Update test to validate retain/release/reclaim functionality * Implement device_crop, device_slice and release_crop for Vulkan. Re-enable device_crop, device_slice and interleave_rgb tests. * Clang format/tidy pass * Implement device copy for Vulkan. Enable device copy test. * Clang format/tidy pass * Fix signed mod operator and use euclidean identity (just like glsl) * Clang format/tidy pass * Fix to handle Mod on vectors (use vector constant for bitwise and) * Fix pow operator for Vulkan, and re-enable math test to full range. * Add error checking for return types for conditionals Use bool types for ops that require them, and adapt to expected return types * Handle deallocation for existing regions prior to coalescing. Cleanup region allocator logic for availability. Augment block_allocator test to cover allocation reuse. * Clang tidy/format pass * Fix reserved accounting for regions * Add more details to Windows specific Vulkan build config * Update SPIR-V headers to v1.6 * Add support for dynamic shared memory allocations for Vulkan Add dynamic workgroup dispatching to Vulkan Add optional feature flags for Vulkan capabilities Add Vulkan API version flags for target features Enable v1.3 path if requested Re-enable tests for added features Update Vulkan docs with status updates and feature flags * Enable Vulkan asyc_device_copy test. * Disable Vulkan performance test for async gpu (for now). * Disable Vulkan from python AOT tests and tutorials (since it requires linkage against the vulkan loader system library). * Update Vulkan readme with latest status. Everything works! More or less. =) * Clang format pass * Cleanup formatting for Halide version info in Makefile * Fix typos and address review comments for Vulkan readme * Change value casts to match Halide conventions * Fix typos in comments * Add static_assert to rotl to make compilation errors clearer (instead of using enable_if) Fix debug(3) formatting to avoid super long messages Use lookup table for SPIR-V op code names * Fix typos and logic for Vulkan capabilities * Remove leftover debug ifdef * Fix typo in comments * Rename copy_upto(...) method to be copy_up_to(...) * Handle error case for uninitialized buffer allocation (rather than abort) Fix typos in comments * Support any arbitary number of devices and queues for context creation Fix typos in comments * Add get/set alloc_config methods and API hooks for configuring the VulkanMemoryAllocator * Remove leftover debug ifdef * Hookup API methods for get/set alloc_config when initializing the VulkanMemoryAllocator * Remove empty lines in main * Add required capability flags for 8-bit and 16-bit uniform and storage buffer access Handle casts for GLSL ops (spec requires all args to be the same type as the return type) * Add VkPhysicalDevice8BitStorageFeaturesKHR and related constants * Query for 8-bit and 16-bit uniform and storage access support. Enable these as part of the device feature query chain. * Use VK_WHOLE_SIZE for setting buffer (to pass validation ... otherwise size has to be a multiple of alignment) Remove useless debug asserts for static variables Fix debug logging messages for allocations of scalars (which may not have a dim array) * Query for device limits to enforce min alignment constraints for storage and uniform buffers * Fix shutdown sequence to iterate over descriptor sets Avoid bug in validation layer by reordering destruction sequence * Clang format & tidy pass * Fix logic for locating entry point shader binding ... assume exact match for entry point name Cleanup entry point binding variables and clarify usage * Remove accidentally uncommented debug statements * Cleanup debug output for buffer related updates * Fix split and allocate methods in region allocator to fix issues with alignment constraints - discovered a hang if requested size couldn't be fulfilled after adjusting to aligned sizes - cause was incorrect splitting of existing regions Cleanup region allocator iteration, cleanup and shutdown Added maximum_pool_size configuration option to Vulkan Memory Allocator to restrict pool sizes * Added notes about TARGET_VULKAN=ON being the default now Added links to LunarG MoltenVK SDK installer, and brew packages * Fix markdown formatting * Fix error code handling in Vulkan runtime and internal datastructures. Refactor all (well nearly all) return values to use halide error codes. Reduce the usage of abort_if() for recoverable errors. * Fix typo in error message * Fix typo in readme * Skip GPU allocation cache test on MacOSX since MoltenVK only supports 30 buffers to be allocated * Skip widening reduction test on Vulkan for Mac OSX/IOS since MoltenVK fails to translate calls with vector types for builtins like min/max. etc * Skip doubles in vector cast test on Vulkan for Mac OSX/IOS since Molten doesn't support them * Skip gpu_dynamic_shared and gpu_specialize test for Vulkan on Mac OSX/IOS since MoltenVK doesn't support the dynamic shared memory allocation or dynamic grid size. * Clang format / tidy pass * Resolve conflicts for mini_webgpu.h ... revert to main * Use unique intrinsic var names for each kernel Cleanup constant value declarations with template helper methods Add comments on workgroup size usage * Wrap debug output under ifdef DEBUG_RUNTIME_INTERNAL macro guard Add nearest_multiple constraint to block/region allocator * Add vk_clear_device_buffer utility method Add nearest_multiple constrating to vulkan memory allocatori + fixes correctness/multiple_outputs test Add vkCreateBuffer/vkDestroyBuffer debug output i + for gpu_object_lifetime_tracker Cleanup shutdown for shader_module destruction * Add note about nearest_multiple constraint for vulkan memory allocator * Hookup gpu_object_lifetime_tracker with Vulkan debug statements * Skip dynamic shared memory portion of test for Vulkan on iOS/OSX. * Fix stale comment for float type support. Fix incorrect lowering for intrinsic. --------- Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Steven Johnson <srj@google.com>
halide · Apr 25, 2023 · 4d86539 · 4d86539
1 parent fcddcf8
commit 4d86539
Show file tree

Hide file tree

Showing 69 changed files with 21,799 additions and 1,770 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -83,6 +83,10 @@ endif ()
 
 # Enable the SPIR-V target if requested (must declare before processing dependencies)
 option(TARGET_SPIRV "Include SPIR-V target" OFF)
+option(TARGET_VULKAN "Include Vulkan target" ON)
+if (TARGET_VULKAN)
+ set(TARGET_SPIRV ON) # required
+endif()
 
 ##
 # Import dependencies

diff --git a/LICENSE.txt b/LICENSE.txt
@@ -195,6 +195,23 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM,OUT OF OR IN CONNECTION WITH THE MATERIALS OR THE USE OR OTHER DEALINGS
 IN THE MATERIALS.
 
+
+----
+
+src/mini_vulkan.h is Copyright (c) 2014-2017 The Khronos Group Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
 ----
 
 apps/linear_algebra/include/cblas.h is licensed under the BLAS license.

diff --git a/Makefile b/Makefile
@@ -9,6 +9,12 @@
 # For correctness and performance tests this include halide build time and run time. For
 # the tests in test/generator/ this times only the halide build time.
 
+# Halide project version
+HALIDE_VERSION_MAJOR ?= 15
+HALIDE_VERSION_MINOR ?= 0
+HALIDE_VERSION_PATCH ?= 0
+HALIDE_VERSION=$(HALIDE_VERSION_MAJOR).$(HALIDE_VERSION_MINOR).$(HALIDE_VERSION_PATCH)
+
 # Disable built-in makefile rules for all apps to avoid pointless file-system
 # scanning and general weirdness resulting from implicit rules.
 MAKEFLAGS += --no-builtin-rules
@@ -124,6 +130,8 @@ WITH_OPENCL ?= not-empty
 WITH_METAL ?= not-empty
 WITH_OPENGLCOMPUTE ?= not-empty
 WITH_D3D12 ?= not-empty
+WITH_VULKAN ?= not-empty
+WITH_SPIRV ?= not-empty
 WITH_WEBGPU ?= not-empty
 WITH_INTROSPECTION ?= not-empty
 WITH_EXCEPTIONS ?=
@@ -134,6 +142,12 @@ WITH_LLVM_INSIDE_SHARED_LIBHALIDE ?= not-empty
 HL_TARGET ?= host
 HL_JIT_TARGET ?= host
 
+HL_VERSION_FLAGS = \
+ -DHALIDE_VERSION="$(HALIDE_VERSION)" \
+ -DHALIDE_VERSION_MAJOR=$(HALIDE_VERSION_MAJOR) \
+ -DHALIDE_VERSION_MINOR=$(HALIDE_VERSION_MINOR) \
+ -DHALIDE_VERSION_PATCH=$(HALIDE_VERSION_PATCH)
+
 X86_CXX_FLAGS=$(if $(WITH_X86), -DWITH_X86, )
 X86_LLVM_CONFIG_LIB=$(if $(WITH_X86), x86, )
 
@@ -176,6 +190,12 @@ EXCEPTIONS_CXX_FLAGS=$(if $(WITH_EXCEPTIONS), -DHALIDE_WITH_EXCEPTIONS -fexcepti
 HEXAGON_CXX_FLAGS=$(if $(WITH_HEXAGON), -DWITH_HEXAGON, )
 HEXAGON_LLVM_CONFIG_LIB=$(if $(WITH_HEXAGON), hexagon, )
 
+SPIRV_CXX_FLAGS=$(if $(WITH_SPIRV), -DWITH_SPIRV -isystem $(ROOT_DIR)/dependencies/spirv/include, )
+SPIRV_LLVM_CONFIG_LIB=$(if $(WITH_SPIRV), , )
+
+VULKAN_CXX_FLAGS=$(if $(WITH_VULKAN), -DWITH_VULKAN, )
+VULKAN_LLVM_CONFIG_LIB=$(if $(WITH_VULKAN), , )
+
 WEBASSEMBLY_CXX_FLAGS=$(if $(WITH_WEBASSEMBLY), -DWITH_WEBASSEMBLY, )
 WEBASSEMBLY_LLVM_CONFIG_LIB=$(if $(WITH_WEBASSEMBLY), webassembly, )
 
@@ -198,7 +218,7 @@ LLVM_CXX_FLAGS_LIBCPP := $(findstring -stdlib=libc++, $(LLVM_CXX_FLAGS))
 endif
 
 CXX_FLAGS = $(CXXFLAGS) $(CXX_WARNING_FLAGS) $(RTTI_CXX_FLAGS) -Woverloaded-virtual $(FPIC) $(OPTIMIZE) -fno-omit-frame-pointer -DCOMPILING_HALIDE
-
+CXX_FLAGS += $(HL_VERSION_FLAGS)
 CXX_FLAGS += $(LLVM_CXX_FLAGS)
 CXX_FLAGS += $(PTX_CXX_FLAGS)
 CXX_FLAGS += $(ARM_CXX_FLAGS)
@@ -215,6 +235,8 @@ CXX_FLAGS += $(INTROSPECTION_CXX_FLAGS)
 CXX_FLAGS += $(EXCEPTIONS_CXX_FLAGS)
 CXX_FLAGS += $(AMDGPU_CXX_FLAGS)
 CXX_FLAGS += $(RISCV_CXX_FLAGS)
+CXX_FLAGS += $(SPIRV_CXX_FLAGS)
+CXX_FLAGS += $(VULKAN_CXX_FLAGS)
 CXX_FLAGS += $(WEBASSEMBLY_CXX_FLAGS)
 
 # This is required on some hosts like powerpc64le-linux-gnu because we may build
@@ -241,6 +263,8 @@ LLVM_STATIC_LIBFILES = \
  $(POWERPC_LLVM_CONFIG_LIB) \
  $(HEXAGON_LLVM_CONFIG_LIB) \
  $(AMDGPU_LLVM_CONFIG_LIB) \
+ $(SPIRV_LLVM_CONFIG_LIB) \
+ $(VULKAN_LLVM_CONFIG_LIB) \
  $(WEBASSEMBLY_LLVM_CONFIG_LIB) \
  $(RISCV_LLVM_CONFIG_LIB)
 
@@ -265,6 +289,7 @@ TEST_LD_FLAGS = -L$(BIN_DIR) -lHalide $(COMMON_LD_FLAGS)
 
 # In the tests, some of our expectations change depending on the llvm version
 TEST_CXX_FLAGS += -DLLVM_VERSION=$(LLVM_VERSION_TIMES_10)
+TEST_CXX_FLAGS += $(HL_VERSION_FLAGS)
 
 # In the tests, default to exporting no symbols that aren't explicitly exported
 TEST_CXX_FLAGS += -fvisibility=hidden -fvisibility-inlines-hidden
@@ -305,13 +330,22 @@ TEST_METAL = 1
 endif
 endif
 
+ifneq ($(WITH_VULKAN), )
+ifneq (,$(findstring vulkan,$(HL_TARGET)))
+TEST_VULKAN = 1
+endif
+endif
+
 ifeq ($(UNAME), Linux)
 ifneq ($(TEST_CUDA), )
 CUDA_LD_FLAGS ?= -L/usr/lib/nvidia-current -lcuda
 endif
 ifneq ($(TEST_OPENCL), )
 OPENCL_LD_FLAGS ?= -lOpenCL
 endif
+ifneq ($(TEST_VULKAN), )
+VULKAN_LD_FLAGS ?= -lvulkan
+endif
 OPENGL_LD_FLAGS ?= -lGL
 HOST_OS=linux
 endif
@@ -324,6 +358,10 @@ endif
 ifneq ($(TEST_OPENCL), )
 OPENCL_LD_FLAGS ?= -framework OpenCL
 endif
+ifneq ($(TEST_VULKAN), )
+# The Vulkan loader is distributed as a dylib on OSX (not a framework)
+VULKAN_LD_FLAGS ?= -lvulkan
+endif
 ifneq ($(TEST_METAL), )
 METAL_LD_FLAGS ?= -framework Metal -framework Foundation
 endif
@@ -335,6 +373,10 @@ ifneq ($(TEST_OPENCL), )
 TEST_CXX_FLAGS += -DTEST_OPENCL
 endif
 
+ifneq ($(TEST_VULKAN), )
+TEST_CXX_FLAGS += -DTEST_VULKAN
+endif
+
 ifneq ($(TEST_METAL), )
 # Using Metal APIs requires writing Objective-C++ (or Swift). Add ObjC++
 # to allow tests to create and destroy Metal contexts, etc. This requires
@@ -433,6 +475,7 @@ SOURCE_FILES = \
  CodeGen_LLVM.cpp \
  CodeGen_Metal_Dev.cpp \
  CodeGen_OpenCL_Dev.cpp \
+ CodeGen_Vulkan_Dev.cpp \
  CodeGen_OpenGLCompute_Dev.cpp \
  CodeGen_Posix.cpp \
  CodeGen_PowerPC.cpp \
@@ -623,6 +666,7 @@ HEADER_FILES = \
  CodeGen_LLVM.h \
  CodeGen_Metal_Dev.h \
  CodeGen_OpenCL_Dev.h \
+ CodeGen_Vulkan_Dev.h \
  CodeGen_OpenGLCompute_Dev.h \
  CodeGen_Posix.h \
  CodeGen_PTX_Dev.h \
@@ -853,8 +897,10 @@ RUNTIME_CPP_COMPONENTS = \
  windows_profiler \
  windows_threads \
  windows_threads_tsan \
+ windows_vulkan \
  windows_yield \
  write_debug_image \
+ vulkan \
  x86_cpu_features \
 
 RUNTIME_LL_COMPONENTS = \
@@ -883,6 +929,7 @@ RUNTIME_EXPORTED_INCLUDES = $(INCLUDE_DIR)/HalideRuntime.h \
  $(INCLUDE_DIR)/HalideRuntimeOpenGLCompute.h \
  $(INCLUDE_DIR)/HalideRuntimeMetal.h \
  $(INCLUDE_DIR)/HalideRuntimeQurt.h \
+ $(INCLUDE_DIR)/HalideRuntimeVulkan.h \
  $(INCLUDE_DIR)/HalideRuntimeWebGPU.h \
  $(INCLUDE_DIR)/HalideBuffer.h \
  $(INCLUDE_DIR)/HalidePyTorchHelpers.h \
@@ -1049,6 +1096,7 @@ RUNTIME_CXX_FLAGS = \
  -Wno-unused-function \
  -Wvla \
  -Wsign-compare
+RUNTIME_CXX_FLAGS += $(HL_VERSION_FLAGS)
 
 $(BUILD_DIR)/initmod.windows_%_x86_32.ll: $(SRC_DIR)/runtime/windows_%_x86.cpp $(BUILD_DIR)/clang_ok
  @mkdir -p $(@D)

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@ currently targets:
 - CPU architectures: X86, ARM, Hexagon, PowerPC, RISC-V
 - Operating systems: Linux, Windows, macOS, Android, iOS, Qualcomm QuRT
 - GPU Compute APIs: CUDA, OpenCL, OpenGL Compute Shaders, Apple Metal, Microsoft
- Direct X 12
+ Direct X 12, Vulkan
 
 Rather than being a standalone programming language, Halide is embedded in C++.
 This means you write C++ code that builds an in-memory representation of a