[TOPI][Vulkan, Metal] Avoid passing int64 scalar arg to VK/Metal runtime #7457

masahi · 2021-02-14T19:57:44Z

I hit the error below when running TIR sort/scan on the Vulkan backend:

Line 186 in 1831c17

LOG(FATAL) << "Do not support 64bit argument to device function";

This is because, unlike most other kernels, TIR sort/scan needs to pass an integer scalar from host to GPU in order to realize multipass kernel launches:

tvm/python/tvm/topi/cuda/sort.py

Lines 203 to 206 in 1e0d356

    
    with ib.for_range(0, lim, dtype="int64") as l2_width:
        width = 2 << l2_width
        # Define and launch the cuda kernel
        with ib.new_scope():

Currently, the width argument, which is an int64 scalar, is passed to the GPU backend runtime. But the VK/Metal runtimes use a calling convention that is different from the one used in CUDA/OpenCL (search for PackFuncNonBufferArg), and they don't support passing 64-bit scalars; see:

tvm/src/runtime/pack_args.h

Lines 41 to 49 in 1831c17

    
    /*!
     * \brief argument union type of 32bit.
     * Choose 32 bit because most GPU API do not work well with 64 bit.
     */
    union ArgUnion {
      int32_t v_int32;
      uint32_t v_uint32;
      float v_float32;
    };

tvm/src/runtime/vulkan/vulkan.cc

Line 1047 in 1831c17

    
    void VulkanWrappedFunc::operator()(TVMArgs args, TVMRetValue* rv, const ArgUnion* pack_args) const {

The fix to this problem is simply to pass an int32 scalar instead and cast it to int64 inside the GPU kernel. This enabled the TIR scan tests to pass on Vulkan. It also fixed the runtime error that happened while running TIR sort, but the sort result is still not correct on Vulkan. I suspect there is an issue in our SPIR-V codegen.
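As a rough illustration of that workaround, here is a minimal host-side sketch in plain Python (no TVM involved). The helper names `pack_scalar_i32` and `read_scalar_as_i64` are hypothetical and are not part of TVM; the sketch only assumes that the scalar travels in a 4-byte slot and is widened again on the receiving side.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def pack_scalar_i32(value):
    """Host side (hypothetical helper): narrow a scalar to a 4-byte int32 slot."""
    if not INT32_MIN <= value <= INT32_MAX:
        # Narrowing is only safe when the value actually fits in 32 bits.
        raise OverflowError(f"{value} does not fit in int32")
    return struct.pack("<i", value)


def read_scalar_as_i64(slot):
    """Stand-in for the kernel side: read the int32 slot and widen it.

    Python ints are unbounded, so the "cast to int64" is implicit here;
    in a real kernel this would be an explicit cast.
    """
    (value,) = struct.unpack("<i", slot)
    return value
```

For example, `read_scalar_as_i64(pack_scalar_i32(2 << 10))` round-trips the width of the eleventh pass unchanged. The range check makes the limitation explicit: the workaround only holds while the scalar stays within int32 range.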

tqchen · 2021-02-14T22:19:39Z

Thanks masa, perhaps it is a good time to revisit whether Vulkan and Metal could work with i64.

masahi · 2021-02-16T11:35:46Z

Does it make sense to use vkCmdPushConstants twice, once for 32 bit and once for 64 bit? If that is not possible, I think we need this patch as a workaround.

masahi · 2021-02-16T11:39:51Z

Maybe a more straightforward approach is to send a scalar as a buffer of size 1, just like the CUDA/OpenCL backends do.

tqchen · 2021-02-16T19:33:15Z

I took another look and it seems that both Vulkan and Metal (after Metal 2.2) now support i64.

Perhaps we could update the ArgUnion solution of these APIs to allow passing 64-bit integers. For example:

    union ArgUnion64 {
      int32_t v_int32;
      uint32_t v_uint32;
      float v_float32;
      int64_t v_int64;
    };

On the device end, always create an array of size two, int32_t v_int32[2], and read the value from the first element. This would require all arguments to be aligned to 64 bits; given that not many arguments are passed this way, we should be good.

@masahi do you mind making that change and testing it out instead?
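The layout this proposal implies can be checked with a small `ctypes` sketch. This is only a model of the idea, not TVM's runtime code: the Python class mirrors the proposed `ArgUnion64`, while `pack_args`, `device_read_int32`, and `device_read_int64` are hypothetical helpers introduced for illustration.

```python
import ctypes


class ArgUnion64(ctypes.Union):
    # Mirrors the proposed union: every member starts at offset 0,
    # and the int64 member makes each slot 8 bytes wide.
    _fields_ = [
        ("v_int32", ctypes.c_int32),
        ("v_uint32", ctypes.c_uint32),
        ("v_float32", ctypes.c_float),
        ("v_int64", ctypes.c_int64),
    ]


def pack_args(values):
    """Host side: pack (field_name, value) pairs into consecutive 8-byte slots."""
    slots = (ArgUnion64 * len(values))()
    for i, (field, value) in enumerate(values):
        setattr(slots[i], field, value)
    return bytes(slots)


def device_read_int32(raw, index):
    """Device-side view: each slot is int32[2]; the value is element 0."""
    pairs = (ctypes.c_int32 * 2 * (len(raw) // 8)).from_buffer_copy(raw)
    return pairs[index][0]


def device_read_int64(raw, index):
    """Device-side view of a slot that holds a full 64-bit integer."""
    return (ctypes.c_int64 * (len(raw) // 8)).from_buffer_copy(raw)[index]
```

Packing `[("v_int32", 7), ("v_int64", 2**40)]` yields 16 bytes, and each reader recovers its value from the matching slot. Because all union members begin at offset 0, the 32-bit value always sits in the first four bytes of its slot, which is why reading element 0 of the two-element array is enough.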

masahi · 2021-02-16T21:40:00Z

ok I'll try that

tqchen · 2021-02-24T21:18:43Z

The device side works by creating a struct.

Say the original function is fn(int32 arg0, int64 arg1).

We generate the following device code:

struct ArgBuffer {
   // always generate two int values to pad things to the same alignment
   int32 arg0[2];
   int64 arg1;
};

fn (ArgBuffer* arg_buffer) {
   int32 arg0 = arg_buffer->arg0[0];
   int64 arg1 = arg_buffer->arg1;
}
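The offsets in that struct can be sanity-checked from Python with `ctypes`. This is a sketch of the layout only, assuming the two-field signature from the example above; the Python `ArgBuffer` class and `fn` merely mimic the generated device code and are not produced by TVM.

```python
import ctypes


class ArgBuffer(ctypes.Structure):
    # arg0 is padded to two int32 values so that arg1 starts on an 8-byte boundary.
    _fields_ = [
        ("arg0", ctypes.c_int32 * 2),
        ("arg1", ctypes.c_int64),
    ]


def fn(arg_buffer):
    """Mimics the device function: unpack both scalars from the buffer."""
    arg0 = arg_buffer.arg0[0]  # only the first element carries the value
    arg1 = arg_buffer.arg1
    return arg0, arg1
```

With this layout, `ArgBuffer.arg0.offset` is 0, `ArgBuffer.arg1.offset` is 8, and the whole struct is 16 bytes, so every argument occupies one 64-bit slot regardless of its own width.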

masahi · 2021-03-02T23:28:00Z

#7572

masahi added 2 commits February 15, 2021 04:12

Avoid passing int64 scalar to vulkan runtime

104cc54

get_valid_count works on vulkan but log2 is done by spirv not host

00e503a

masahi marked this pull request as draft February 14, 2021 20:01

masahi closed this Mar 2, 2021

masahi mentioned this pull request Mar 2, 2021

[Vulkan] Support passing 64 bit scalar #7572

Merged
