[TOPI][SPIRV] Cast to float32 not float64 before log2 in sort/scan #7669

masahi · 2021-03-16T06:36:48Z

This fixes TIR scan with dynamic input shapes on VK / SPIRV. I debugged the problem by running scan on 2 elements, so that the upsweep and downsweep each run only one iteration.

I found that when scan_axis_size is dynamic and its runtime value is 2, the value of lim is 0 instead of the expected 1. Surprisingly, the issue was fixed by casting scan_axis_size to float32 instead of float64. GPUs in general (especially low-end ones) don't have great support for fp64, so I think float32 is the better choice anyway.
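To illustrate why lim = 0 breaks a 2-element scan, here is a sequential Python sketch of an upsweep/downsweep exclusive scan. It is a simplified stand-in, not the TIR kernel from this PR: the function name exclusive_scan and the lim override parameter are hypothetical, and lim is assumed to be ceil(log2(n)), the number of sweep iterations.

```python
import math


def exclusive_scan(data, lim=None):
    """Sequential sketch of an upsweep/downsweep exclusive scan (sum)."""
    out = list(data)
    n = len(out)
    if n == 0:
        return out
    if lim is None:
        # Number of sweep iterations; this is the value that came out as 0 on VK.
        lim = math.ceil(math.log2(n))

    # Upsweep: build partial sums over blocks of width 2, 4, 8, ...
    for level in range(lim):
        width = 2 << level
        for start in range(0, n, width):
            middle = start + width // 2
            end = min(start + width, n)
            if middle < n:
                out[end - 1] += out[middle - 1]

    # Clear the last element, then push prefixes back down the tree.
    out[n - 1] = 0
    for level in range(lim):
        width = 2 << (lim - level - 1)
        for start in range(0, n, width):
            middle = start + width // 2
            end = min(start + width, n)
            if middle < n:
                tmp = out[middle - 1]
                out[middle - 1] = out[end - 1]
                out[end - 1] += tmp
    return out
```

With two elements, exclusive_scan([3, 5]) gives [0, 3], while forcing lim=0 skips both sweeps and leaves [3, 0].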

Now dynamic cumsum, argwhere, etc. are working with VK.

@mbrookhart

mbrookhart

I'm kind of stunned by this. I'm not sure how it's possible that ceil(log2(x)) == 0 for x > 1 in any datatype. That feels like a kind of fundamental issue with the intrinsic...

I worry ever so slightly about rounding: a large input size that's just over a power of 2 could be cast to a float32 value at or below that power of 2 (due to the lack of precision), and then ceil(log2(x)) would return the wrong number.

I wrote a little script to test this:

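import numpy as np

# For each n = 2**i + 1, round-trip through float32 and check the value did not shrink.
for i in range(30):
    n = np.array(2**i + 1).astype("int64")
    f = n.astype("float32")  # float32 has a 24-bit significand
    n2 = f.astype("int64")
    print(i, n, n2)
    # Fails once float32 rounds n down to 2**i
    assert n2 >= n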

and the assertion fails at n = 2**24 + 1 = 16,777,217

That's larger than anything we can currently fit in GPU memory, so I don't think it's an issue at the moment, but it's a little uncomfortably close for my tastes.
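The consequence for ceil(log2(x)) can be checked on the host. The sketch below uses only the Python standard library: to_float32 and ceil_log2_int are hypothetical helper names, the first rounding a value to float32 via struct and the second computing the exact integer result for comparison.

```python
import struct


def to_float32(n: int) -> float:
    """Round n to the nearest float32 value, returned as a Python float."""
    return struct.unpack("f", struct.pack("f", float(n)))[0]


def ceil_log2_int(n: int) -> int:
    """Exact ceil(log2(n)) for n >= 1, using integer arithmetic only."""
    return (n - 1).bit_length()


n = 2**24 + 1
rounded = int(to_float32(n))          # value seen after the float32 cast
exact = ceil_log2_int(n)              # iteration count for the true size
after_cast = ceil_log2_int(rounded)   # iteration count for the rounded size
print(n, rounded, exact, after_cast)
```

Here the cast turns 2**24 + 1 into 2**24, so the computed iteration count drops from 25 to 24.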

Maybe we should add a warning/assert if the input size is too big?

tqchen · 2021-03-16T17:33:11Z

Perhaps we should think about alternatives to such an intrinsic.

see

https://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic
__builtin_clz in c++ code
https://stackoverflow.com/questions/39046194/is-there-a-way-to-use-clz-in-a-vulkan-compute-shader

masahi · 2021-03-16T19:28:06Z

Ok, updated to cast to float32 only in the problematic case, which is VK + dynamic input on TIR scan. I think this is an acceptable solution for now. Of course, the best solution is to implement TIR-level CSE, since the host is doing the same computation anyway and there is no point computing log2 etc. on the device.

Interestingly, the TIR mergepath kernel used in sort, which is also littered with glsl log2 and ceil, doesn't cast to float64 before log2 in the GPU IR. If you look at the IR dump https://gist.github.com/masahi/c0979c61907af15f9924b3b3d72fe6a7, there is no float64 anywhere. But for the TIR scan downsweep kernel, there is a cast to float64. So I removed the cast to float32 in TIR sort.

It could also be the case that our SPIRV codegen for int64 to float64 cast is busted, but I haven't checked. Another weird thing is that glsl log2 on fp64 works correctly if the input size is static.

mbrookhart

LGTM

masahi · 2021-03-16T20:07:20Z

Made it a draft while I am reading about clz bit hacks

masahi · 2021-03-16T20:32:52Z

@mbrookhart @tqchen The SPIR-V GLSL.std.450 spec says its log2 intrinsic only supports 16- or 32-bit floating point: https://www.khronos.org/registry/spir-v/specs/1.0/GLSL.std.450.html

The operand x must be a scalar or vector whose component type is 16-bit or 32-bit floating-point.

masahi · 2021-03-16T21:15:04Z

Looks like one reasonable way to implement ceil(log2(x)) is 31 - clz(x) + ((x & (x - 1)) ? 1 : 0) for 32-bit integers with x >= 1, since floor(log2(x)) is 31 - clz(x) and we add 1 only when x is not a power of 2. We need to be careful with 32-bit vs 64-bit and signed vs unsigned.

We need to add intrinsic lowering of tvm.tir.clz for llvm and spirv. I'll do that next week.
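A host-side Python sketch of this bit trick, for checking the arithmetic only. The helpers clz and ceil_log2 here are hypothetical pure-Python stand-ins (emulating a fixed-width clz with int.bit_length), not the TVM intrinsic or its lowering. Note that floor(log2(x)) is bits - 1 - clz(x), so 1 is added only for non-powers of 2.

```python
def clz(x: int, bits: int = 32) -> int:
    """Count leading zeros of x viewed as an unsigned `bits`-wide integer."""
    assert 0 <= x < (1 << bits)
    return bits - x.bit_length()


def ceil_log2(x: int, bits: int = 32) -> int:
    """ceil(log2(x)) for 1 <= x < 2**bits, using clz instead of floating point."""
    assert x >= 1
    floor_log2 = bits - 1 - clz(x, bits)
    is_pow2 = (x & (x - 1)) == 0
    # Exact powers of 2 need no rounding up; everything else rounds up by one.
    return floor_log2 if is_pow2 else floor_log2 + 1
```

For example, ceil_log2(2) is 1 and ceil_log2(2**24 + 1) is 25, with no float cast involved; passing bits=64 covers int64 sizes.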

Co-authored-by: Masahiro Masuda <masahi129@gmail.com>

masahi · 2021-04-16T13:32:35Z

@mbrookhart I'm finally back with this. We can now do integer ceil(log2(x)) without a cast to float for Vulkan.

mbrookhart

Thanks, Masa! This looks great. I'll merge when it passes CI

masahi · 2021-04-17T03:03:43Z

Thanks @mbrookhart @tqchen

…pache#7669)

* [TOPI] Cast to float32 before log2 in sort/scan
* revert sort change since this seems unnecessary
* only does cast to float32 on vk + dynamic input case
* check against IntImm instead of Var
* revert change
* use clz for ceil_log2 when compiling for vk
* add doc on ceil_log2
* fix pylint

Co-authored-by: Masahiro Masuda <masahi129@gmail.com>

mbrookhart reviewed Mar 16, 2021


mbrookhart approved these changes Mar 16, 2021


masahi marked this pull request as draft March 16, 2021 20:06

tmoreau89 added a commit to tmoreau89/tvm that referenced this pull request Mar 16, 2021

adding masa's fix from apache#7669

44d6e50

Co-authored-by: Masahiro Masuda <masahi129@gmail.com>

masahi force-pushed the spirv-scan-dyn branch from 8345b26 to 0bfc1cc Compare April 3, 2021 08:44

masahi mentioned this pull request Apr 12, 2021

[TIR] Add a new intrinsic count leading zeros for LLVM and SPIR-V #7825

Merged

masahi and others added 6 commits April 16, 2021 22:12

[TOPI] Cast to float32 before log2 in sort/scan

e5d196a

revert sort change since this seems unnecessary

c701595

only does cast to float32 on vk + dynamic input case

1bdd892

check against IntImm instead of Var

10a4078

revert change

aabc763

use clz for ceil_log2 when compiling for vk

a70dd1d

masahi force-pushed the spirv-scan-dyn branch from 0bfc1cc to a70dd1d Compare April 16, 2021 13:15

add doc on ceil_log2

cc2c3f9

masahi marked this pull request as ready for review April 16, 2021 13:31

mbrookhart approved these changes Apr 16, 2021


fix pylint

53b78df

masahi merged commit e082ef5 into apache:main Apr 17, 2021

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
