-
Notifications
You must be signed in to change notification settings - Fork 937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[meta] f16 support #4384
Comments
enable f16;
f16 extension support
Thanks for filing! There is some more info here: gpuweb/gpuweb#2512 I suspect this might not be too hard to implement in the Metal and Vulkan back-ends; however for D3D it will require us to generate DXIL (i.e. requires a new back-end; see #4302). |
Actually according to this table SM6.2 and |
This one is interesting to prioritize, because it's in the v1 specs, but we don't need to support it in order to be compliant and usable for our v1 release. |
Would just like to add some weight to the prioritization of this: I am currently trying to do some machine learning and vector search stuff with WGSL, and this is currently a huge blocker to almost everything I'm trying to do. This is basically the only significant thing keeping the Another big reason for having this is that even though Rust has the Sorry if this sounds a bit pushy, that's not how I want to come across, just want to emphasize that this really is an important feature, and that this is currently a total blocker for several important applications, especially in ML. |
I concur. While the current "unpack2x16float" function is currently usable, I have encountered some difficulties in using it efficiently in some use cases. This forces me to use less optimal approaches, resulting in slower speed and higher memory usage. I understand that you may have multiple priorities to balance, but if possible, I would greatly appreciate your consideration in prioritizing the f16 |
Notes from digging into this today:
Would love to get more involved in shipping this feature. |
After we are done implementing support for const-expressions, we should be able to propagate abstract types when we do the evaluation of const-epressions. |
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d) # Objective - Add Screen space ambient occlusion (SSAO). SSAO approximates small-scale, local occlusion of _indirect_ diffuse light between objects. SSAO does not apply to direct lighting, such as point or directional lights. - This darkens creases, e.g. on staircases, and gives nice contact shadows where objects meet, giving entities a more "grounded" feel. - Closes #3632. ## Solution - Implement the GTAO algorithm. - https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf - https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf - Source code heavily based on [Intel's XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli). - Add an SSAO bevy example. ## Algorithm Overview * Run a depth and normal prepass * Create downscaled mips of the depth texture (preprocess_depths pass) * GTAO pass - for each pixel, take several random samples from the depth+normal buffers, reconstruct world position, raytrace in screen space to estimate occlusion. Rather then doing completely random samples on a hemisphere, you choose random _slices_ of the hemisphere, and then can analytically compute the full occlusion of that slice. Also compute edges based on depth differences here. * Spatial denoise pass - bilateral blur, using edge detection to not blur over edges. This is the final SSAO result. * Main pass - if SSAO exists, sample the SSAO texture, and set occlusion to be the minimum of ssao/material occlusion. This then feeds into the rest of the PBR shader as normal. --- ## Future Improvements - Maybe remove the low quality preset for now (too noisy) - WebGPU fallback (see below) - Faster depth->world position (see reverted code) - Bent normals - Try interleaved gradient noise or spatiotemporal blue noise - Replace the spatial denoiser with a combined spatial+temporal denoiser - Render at half resolution and use a bilateral upsample - Better multibounce approximation (https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view) ## Far-Future Performance Improvements - F16 math (missing naga-wgsl support https://github.com/gfx-rs/naga/issues/1884) - Faster coordinate space conversion for normals - Faster depth mipchain creation (https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not currently support subgroup ops) - Deinterleaved SSAO for better cache efficiency (https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf) ## Other Interesting Papers - Visibility bitmask (https://link.springer.com/article/10.1007/s00371-022-02703-y, https://cdrinmatane.github.io/posts/cgspotlight-slides/) - Screen space diffuse lighting (https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf) ## Platform Support * SSAO currently does not work on DirectX12 due to issues with wgpu and naga: * gfx-rs/wgpu#3798 * gfx-rs/naga#2353 * SSAO currently does not work on WebGPU because r16float is not a valid storage texture format https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix this with a fallback to r32float. --- ## Changelog - Added ScreenSpaceAmbientOcclusionSettings, ScreenSpaceAmbientOcclusionQualityLevel, and ScreenSpaceAmbientOcclusionBundle --------- Co-authored-by: IceSentry <c.giguere42@gmail.com> Co-authored-by: IceSentry <IceSentry@users.noreply.github.com> Co-authored-by: Daniel Chia <danstryder@gmail.com> Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: Robert Swain <robert.swain@gmail.com> Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Brandon Dyer <brandondyer64@gmail.com> Co-authored-by: Edgar Geier <geieredgar@gmail.com> Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d) # Objective - Add Screen space ambient occlusion (SSAO). SSAO approximates small-scale, local occlusion of _indirect_ diffuse light between objects. SSAO does not apply to direct lighting, such as point or directional lights. - This darkens creases, e.g. on staircases, and gives nice contact shadows where objects meet, giving entities a more "grounded" feel. - Closes bevyengine#3632. ## Solution - Implement the GTAO algorithm. - https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf - https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf - Source code heavily based on [Intel's XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli). - Add an SSAO bevy example. ## Algorithm Overview * Run a depth and normal prepass * Create downscaled mips of the depth texture (preprocess_depths pass) * GTAO pass - for each pixel, take several random samples from the depth+normal buffers, reconstruct world position, raytrace in screen space to estimate occlusion. Rather then doing completely random samples on a hemisphere, you choose random _slices_ of the hemisphere, and then can analytically compute the full occlusion of that slice. Also compute edges based on depth differences here. * Spatial denoise pass - bilateral blur, using edge detection to not blur over edges. This is the final SSAO result. * Main pass - if SSAO exists, sample the SSAO texture, and set occlusion to be the minimum of ssao/material occlusion. This then feeds into the rest of the PBR shader as normal. --- ## Future Improvements - Maybe remove the low quality preset for now (too noisy) - WebGPU fallback (see below) - Faster depth->world position (see reverted code) - Bent normals - Try interleaved gradient noise or spatiotemporal blue noise - Replace the spatial denoiser with a combined spatial+temporal denoiser - Render at half resolution and use a bilateral upsample - Better multibounce approximation (https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view) ## Far-Future Performance Improvements - F16 math (missing naga-wgsl support https://github.com/gfx-rs/naga/issues/1884) - Faster coordinate space conversion for normals - Faster depth mipchain creation (https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not currently support subgroup ops) - Deinterleaved SSAO for better cache efficiency (https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf) ## Other Interesting Papers - Visibility bitmask (https://link.springer.com/article/10.1007/s00371-022-02703-y, https://cdrinmatane.github.io/posts/cgspotlight-slides/) - Screen space diffuse lighting (https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf) ## Platform Support * SSAO currently does not work on DirectX12 due to issues with wgpu and naga: * gfx-rs/wgpu#3798 * gfx-rs/naga#2353 * SSAO currently does not work on WebGPU because r16float is not a valid storage texture format https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix this with a fallback to r32float. --- ## Changelog - Added ScreenSpaceAmbientOcclusionSettings, ScreenSpaceAmbientOcclusionQualityLevel, and ScreenSpaceAmbientOcclusionBundle --------- Co-authored-by: IceSentry <c.giguere42@gmail.com> Co-authored-by: IceSentry <IceSentry@users.noreply.github.com> Co-authored-by: Daniel Chia <danstryder@gmail.com> Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: Robert Swain <robert.swain@gmail.com> Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Brandon Dyer <brandondyer64@gmail.com> Co-authored-by: Edgar Geier <geieredgar@gmail.com> Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
![image](https://github.com/bevyengine/bevy/assets/47158642/dbb62645-f639-4f2b-b84b-26fd915c186d) # Objective - Add Screen space ambient occlusion (SSAO). SSAO approximates small-scale, local occlusion of _indirect_ diffuse light between objects. SSAO does not apply to direct lighting, such as point or directional lights. - This darkens creases, e.g. on staircases, and gives nice contact shadows where objects meet, giving entities a more "grounded" feel. - Closes bevyengine#3632. ## Solution - Implement the GTAO algorithm. - https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf - https://blog.selfshadow.com/publications/s2016-shading-course/activision/s2016_pbs_activision_occlusion.pdf - Source code heavily based on [Intel's XeGTAO](https://github.com/GameTechDev/XeGTAO/blob/0d177ce06bfa642f64d8af4de1197ad1bcb862d4/Source/Rendering/Shaders/XeGTAO.hlsli). - Add an SSAO bevy example. ## Algorithm Overview * Run a depth and normal prepass * Create downscaled mips of the depth texture (preprocess_depths pass) * GTAO pass - for each pixel, take several random samples from the depth+normal buffers, reconstruct world position, raytrace in screen space to estimate occlusion. Rather then doing completely random samples on a hemisphere, you choose random _slices_ of the hemisphere, and then can analytically compute the full occlusion of that slice. Also compute edges based on depth differences here. * Spatial denoise pass - bilateral blur, using edge detection to not blur over edges. This is the final SSAO result. * Main pass - if SSAO exists, sample the SSAO texture, and set occlusion to be the minimum of ssao/material occlusion. This then feeds into the rest of the PBR shader as normal. --- ## Future Improvements - Maybe remove the low quality preset for now (too noisy) - WebGPU fallback (see below) - Faster depth->world position (see reverted code) - Bent normals - Try interleaved gradient noise or spatiotemporal blue noise - Replace the spatial denoiser with a combined spatial+temporal denoiser - Render at half resolution and use a bilateral upsample - Better multibounce approximation (https://drive.google.com/file/d/1SyagcEVplIm2KkRD3WQYSO9O0Iyi1hfy/view) ## Far-Future Performance Improvements - F16 math (missing naga-wgsl support https://github.com/gfx-rs/naga/issues/1884) - Faster coordinate space conversion for normals - Faster depth mipchain creation (https://github.com/GPUOpen-Effects/FidelityFX-SPD) (wgpu/naga does not currently support subgroup ops) - Deinterleaved SSAO for better cache efficiency (https://developer.nvidia.com/sites/default/files/akamai/gameworks/samples/DeinterleavedTexturing.pdf) ## Other Interesting Papers - Visibility bitmask (https://link.springer.com/article/10.1007/s00371-022-02703-y, https://cdrinmatane.github.io/posts/cgspotlight-slides/) - Screen space diffuse lighting (https://github.com/Patapom/GodComplex/blob/master/Tests/TestHBIL/2018%20Mayaux%20-%20Horizon-Based%20Indirect%20Lighting%20(HBIL).pdf) ## Platform Support * SSAO currently does not work on DirectX12 due to issues with wgpu and naga: * gfx-rs/wgpu#3798 * gfx-rs/naga#2353 * SSAO currently does not work on WebGPU because r16float is not a valid storage texture format https://gpuweb.github.io/gpuweb/wgsl/#storage-texel-formats. We can fix this with a fallback to r32float. --- ## Changelog - Added ScreenSpaceAmbientOcclusionSettings, ScreenSpaceAmbientOcclusionQualityLevel, and ScreenSpaceAmbientOcclusionBundle --------- Co-authored-by: IceSentry <c.giguere42@gmail.com> Co-authored-by: IceSentry <IceSentry@users.noreply.github.com> Co-authored-by: Daniel Chia <danstryder@gmail.com> Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> Co-authored-by: Robert Swain <robert.swain@gmail.com> Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Brandon Dyer <brandondyer64@gmail.com> Co-authored-by: Edgar Geier <geieredgar@gmail.com> Co-authored-by: Nicola Papale <nicopap@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Hi! I am currently developing |
Just to crosslink, Rust now has a nightly-only binary16 |
From WGSL spec: https://gpuweb.github.io/gpuweb/wgsl/#floating-point-types
TODO
enable
directive #5476The text was updated successfully, but these errors were encountered: