Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Float16 Images? #1140

Open
krOoze opened this issue Dec 17, 2019 · 12 comments
Open

Support for Float16 Images? #1140

krOoze opened this issue Dec 17, 2019 · 12 comments

Comments

@krOoze
Copy link
Contributor

krOoze commented Dec 17, 2019

There seems to be lacking support for Float16 (same for the other extra types, really) Samplers, and Storage Images. Storage Images feel like they should be supported without reservations.

The VK_KHR_16bit_storage extension does not support this.

GL_AMD_gpu_shader_half_float_fetch can do this, but there seems to be missing Vulkan extension accepting SPV_AMD_gpu_shader_half_float_fetch shader.

Migrated from https://community.khronos.org/t/why-vulkan-spir-v-opimageread-opimagefetch-opimagewrite-always-use-v4float/105009

@NicolBolas
Copy link

NicolBolas commented Dec 27, 2019

I'm not sure how useful it would be to have 16-bit float return values from such samplers. There could be some use case for storage images (particularly writing values), but I don't understand why samplers, complete with filtering, would need this.

The thread suggested register pressure as a reason to want 16-bit return values directly. But it seems to me that a platform where both register pressure from 32-bit floats is an issue and where the hardware can actually fetch 16-bit filtered values from a texture is probably smart enough to see the conversion to 16-bit values immediately after the fetch operation in SPIR-V and simply perform the more efficient operation.

That is, if there is some performance benefit, I would think that f16vec4(texelFetch(A, ivec2(0, 0), 0)) can be made just as effective at achieving that benefit as adding a specialized sampler type.

@krOoze
Copy link
Contributor Author

krOoze commented Dec 30, 2019

There could be some use case for storage images

Feels like it should just be there.
Unlike Samplers there are no extra moving parts to consider, and it is simply in nature of a Storage Image to just give the appropriate value.

but I don't understand why samplers, complete with filtering, would need this.

Would the case be different if we were speaking about Float64?
And what about texelFetch()?

is probably smart enough to see the conversion to 16-bit values immediately after the fetch operation in SPIR-V and simply perform the more efficient operation.

I am not entirely convinced that is in the spirit of explicitness.

@NicolBolas
Copy link

Would the case be different if we were speaking about Float64?

Yes. Float64 implies the presence of an image format that supports 64-bit floating-point values. And the only reason to have 64-bit values in your image is because you want to have the extra precision that such values allow. So having that precision reduced down to 32-bits makes the entire concept pointless.

Having a format encoded into the sampler type directly like this is only useful when the default case can't provide equivalent results.

And what about texelFetch()?

Are you suggesting we create a whole new set of sampler types just for texelFetch to use a specific format? Rather than overloading existing tools, it'd be better to create a whole new concept for fetching data from a ImageView directly as is. A formattedTexelFetch function, one which takes a sampler type and a format type.

I am not entirely convinced that is in the spirit of explicitness.

Neither is the fact that shader writers aren't required to qualify branches as to whether they will introduce divergence. Yet that's how SPIR-V works. This is treated as an implementation detail that compilers have to work out on their own, for their own hardware. I don't see why this would be different.

@krOoze
Copy link
Contributor Author

krOoze commented Dec 30, 2019

Having a format encoded into the sampler type directly like this is only useful when the default case can't provide equivalent results.

There's something to be said about API consistency though.
If it is typed entity in the first place, then allowing some types there and not others seems weird.

Are you suggesting

Never! Just prodding for holes.
The original AMD extension indeed had gone the way of creating the whole range of sampler and storage image types though. It is the obvious (but perhaps lazy) way of fixing this...

create a whole new concept

By all means.

shader writers aren't required to qualify branches as to whether they will introduce divergence.

Um, branches on variables are divergent. Why would anyone need to qualify a naturally divergent thing as divergent.
We do not write shaders in SSA form neither.

Anyway, I accept your point even without dangerous analogies.
I think you are just trying to be practical about it. You can technically do it with a workaround, and if the driver is not completely dumb it should work. But still, it is at least noteworthy quirk of the API type system, and I want to hear the Khronos' stance. I do not insist it needs to be done. Just feels the topic was not explored or forgotten about when the KHR extensions were made, since all this was already part of previous vendor extension.

@Tobski
Copy link
Contributor

Tobski commented Jan 6, 2020

FWIW we've (AMD) adopted roughly the approach @NicolBolas outlined above, which is why we've not pushed for it to go into core Vulkan.

However we do allow RelaxedPrecision decorations on samplers which some vendors probably do use, so maybe there's an orthogonality issue here? There are probably some weird corner cases where being explicit could be useful as well. The WG hasn't actively considered this request before but we'll take a look at it and let you know what happens.

Thanks!

@alecazam
Copy link

There's currently no way I've found to get transpiled texture2d from GLSL without this extension implemented. MSL supports f16sampler2d and ftexture2d, but if it can't be expressed, then repeated casts are needed, and it's unclear that the optimizer simplfies this casting logic. I don't see this extension on any platform, or even on AMD which defined the extension on gpuinfo.org.

The point is not for the sampler to do work in f16, but for the type and end result to be correct. Here's what I get now in the MSL. mediump doesn't work in transpiled code, with all expressions using full float. I'm switching to float16_t, but that has all sorts of missing storage extensions for inputOuput16 interpolation on Adreno and Nvidia.

texture2d<float> u_lightPRTexture [[texture(4)]]
half4 colSharp = half4(u_lightCSTexture.read(uint2(int2(lightIndex, tileIndex)), 0));

@darksylinc
Copy link

darksylinc commented Jan 19, 2022

I'm here because I was appalled to see there is no f16texture2D on Vulkan, but there is on OpenGL.

Metal has it. It's expressed with texture<half>

That is, if there is some performance benefit, I would think that f16vec4(texelFetch(A, ivec2(0, 0), 0)) can be made just as effective at achieving that benefit as adding a specialized sampler type.

Because developer sanity is an issue. I'm porting 32-bit code to work in 16-bit and I prefer to do the following:

uniform f16texture2D myTexture;

half4 a = texture( myTexture, uv0 );
half4 b = texture( myTexture, uv1 );
half3 c = texture( myTexture, uv2 ).xyz;
half3 d = texture( myTexture, uv3 ).xyz;

x = a + b;
y = c + d;

Over the following:

uniform texture2D myTexture;

half4 a = half4( texture( myTexture, uv0 ) );
half4 b = half4( texture( myTexture, uv1 ) );
half3 c = half3( texture( myTexture, uv2 ).xyz;
half3 d = half3( texture( myTexture, uv3 ).xyz );

x = a + b;
y = c + d;

Readability suffers. Porting efforts are larger. It's also error prone e.g. typing half3 d = half4( texture( myTexture, uv ).xyz ); will complain because I accidentally copy-pasted half4 instead of 3.

The first bit of code works if I compile with GL_AMD_gpu_shader_half_float_fetch in glslang but then the validation layer will complain and the driver will crash:

ERROR: [Validation] Code 0 : Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x1f53a20, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: Expected Sampled Type to be a 32-bit int or float scalar type for Vulkan environment
  %13 = OpTypeImage %half 2D 0 1 0 1 Unknown

Are you suggesting we create a whole new set of sampler types just for texelFetch to use a specific format? Rather than overloading existing tools, it'd be better to create a whole new concept for fetching data from a ImageView directly as is. A formattedTexelFetch function, one which takes a sampler type and a format type.

I disagree. We want to support GPUs with and without 16-bit support. And we're extremely likely to use texelFetch on both paths (i.e. share the same line of code). If f16texelFetch appears, it's just going to be an unnecessary PITA,

texelFetch is merely the syntactic equivalent of myBuffer[idx]; which works regardless of myBuffer's data type.

@solidpixel
Copy link

solidpixel commented Jan 20, 2022

In terms of current behavior we (Arm) will use fp16 either if the sampler is tagged as RelaxedPrecision, or if the compiler sees the result of the texture() call is immediately cast down to RelaxedPrecision storage. I.e. both of these will work as expected:

layout(set = 0, binding = 0) uniform mediump sampler2D samp;
f16vec4 tmp = texture(samp, coords) ...

... or ...

layout(set = 0, binding = 0) uniform highp sampler2D samp;
f16vec4 tmp = f16vec4(texture(samp, coords)) ...

@Tobski
Copy link
Contributor

Tobski commented Jan 20, 2022

@darksylinc if GLSL is the issue then we could just get GLSLang to generate the right code under the covers? Vendors all do the half float optimisation where appropriate already, and there's nothing stopping us just making the high level language simpler here. If we do this as a Vulkan/SPIR-V change then it'd rely on driver updates which aren't always possible (e.g. on some mobile devices). If the high level code is the problem, perhaps we should fix the high level language. If that's sufficient for you, then that's a much more solvable problem?

@darksylinc
Copy link

If that's sufficient for you, then that's a much more solvable problem?

Yes. Each problem in each domain: if spirv wants stay its current way for technical reasons (e.g. easier to support, maps better to hw capabilities) then it should stay tht way

But as for GLSL, the current status in insanity in terms of readability and maintenance and could use quality of life improvements

@alecazam
Copy link

alecazam commented Jan 20, 2022

Again, relaxed precision designators have no bearing on transpiled codegen. This needs to be formalized as part of the glsl syntax which it already is with the AMD code extension, or spirv-cross needs to gen the correct use of texture. Also casting to mediump isn’t possible with macros to half since it’s not a type.

@darksylinc
Copy link

darksylinc commented Jan 20, 2022

Also casting to mediump isn’t possible with macros to half since it’s not a type.

Ditto on this.

The following GLSL code won't work:

#define half2 mediump vec2

half2 a = half2( 0, 0 ); // error (undesired)
half2 a = vec2( 0, 0 ); // ok (undesired)

But it does with explicit float16_t

#define half2 f16vec2

half2 a = half2( 0, 0 ); // ok
half2 a = vec2( 0, 0 ); // cast error (desired)

Again, relaxed precision designators have no bearing on transpiled codegen

Ah yeah, although that's outside of what I need, he makes a good point: the information is lost in SPIRV, thus it cannot be properly translated when transpiling to Metal. Valve talks about this in https://www.lunarg.com/wp-content/uploads/2019/09/Automatic-RelaxedPrecision-Decoration-and-Conversion-in-Spirv-Opt_r1.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants