Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TOPI][Vulkan, Metal] Avoid passing int64 scalar arg to VK/Metal runtime #7457

Closed
wants to merge 2 commits into from

Conversation

masahi
Copy link
Member

@masahi masahi commented Feb 14, 2021

I hit the error below when running TIR sort/scan on Vulkan backend:

LOG(FATAL) << "Do not support 64bit argument to device function";

This is because unlike most other kernels, TIR sort/scan needs to pass an integer scalar from host to GPU, to realize multipass kernel launches:

with ib.for_range(0, lim, dtype="int64") as l2_width:
width = 2 << l2_width
# Define and launch the cuda kernel
with ib.new_scope():

Currently, width argument, which is int64 scalar, is passed to GPU backend runtime. But VK/Metal runtime use the calling convention that is different from the one used in CUDA/OpenCL (search for PackFuncNonBufferArg) and VK/Metal runtime don't support passing 64 bit scalar, see:

/*!
* \brief argument union type of 32bit.
* Choose 32 bit because most GPU API do not work well with 64 bit.
*/
union ArgUnion {
int32_t v_int32;
uint32_t v_uint32;
float v_float32;
};

void VulkanWrappedFunc::operator()(TVMArgs args, TVMRetValue* rv, const ArgUnion* pack_args) const {

The fix to this problem is simply to pass int32 scalar instead, and does cast to int64 inside GPU kernel. This enabled TIR scan tests to pass on Vulkan. It also fixed the runtime error that happened while running TIR sort, but the sort result is still not correct on Vulkan. I suspect there is an issue in our SPIR-V codegen.

@masahi masahi marked this pull request as draft February 14, 2021 20:01
@tqchen
Copy link
Member

tqchen commented Feb 14, 2021

Thanks masa, perhaps it is a good time to revisit whether Vulkan metal could work with i64.

@masahi
Copy link
Member Author

masahi commented Feb 16, 2021

Does it make sense to use vkCmdPushConstants twice, one for 32 bit and another for 64 bit? If that is not possible, I think we need this patch for a workaround.

@masahi
Copy link
Member Author

masahi commented Feb 16, 2021

Maybe a more straightforward approach is to send a scalar as a buffer of size 1, just like CUDA/OpenCL backend does.

@tqchen
Copy link
Member

tqchen commented Feb 16, 2021

I take another look and it seems that both vulkan and metal(after metal 2.2) now support i64.

Perhaps we could update the ArgUnion solution of these APIs to allow pass 64 bit integer. For example

 union ArgUnion64 { 
   int32_t v_int32; 
   uint32_t v_uint32; 
   float v_float32;
   int64_t v_int64;
}'

On the device end, always create array of size two int32_t v_int32[2], and read the value from the first element. This would require all arguments to be aligned to 64 bit, given not a lot of arguments are being passed in this way, we should be good.

@masahi do you mind to make that change and test out instead?

@masahi
Copy link
Member Author

masahi commented Feb 16, 2021

ok I'll try that

@tqchen
Copy link
Member

tqchen commented Feb 24, 2021

The way device side works is through creating a struct.

say the original function is fn(int32 arg0, int64 arg1)

We generate the following device code:

struct ArgBuffer {
   // always generate two int values to pad things to the same alignment
   int32 arg0[2];
   int64 arg1
};

fn (ArgBuffer* arg_buffer) {
   int32 arg0 = arg_buffer->arg0[0];
   int64 arg1 = arg_buffer->arg1;
}

@masahi
Copy link
Member Author

masahi commented Mar 2, 2021

#7572

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants