KTX2Loader: Support transcoding UASTC HDR to BC6H and RGBA16 #29730

Merged: 5 commits, Oct 24, 2024

donmccurdy
Collaborator

@donmccurdy donmccurdy commented Oct 23, 2024

Related:

Adds support for transcoding UASTC HDR (unsigned half float), in the following order of preference:

  1. ASTC (requires WEBGL_compressed_texture_astc)
  2. BC6H (requires EXT_texture_compression_bptc, requires transcoder)
  3. RGBA16 (requires transcoder)
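The preference order above can be sketched as a small selection function. This is an illustration only: the `caps` flags and the function name are hypothetical and do not reflect KTX2Loader's actual configuration object.

```javascript
// Hypothetical capability flags, invented for this sketch:
//   astcHdr    - WEBGL_compressed_texture_astc is usable for HDR data
//   bptc       - EXT_texture_compression_bptc is available
//   transcoder - the Basis transcoder is loaded
function pickUastcHdrTarget( caps ) {

	// 1. ASTC: listed first, with no transcoder requirement.
	if ( caps.astcHdr ) return 'ASTC';

	// 2. BC6H: needs both the BPTC extension and the transcoder.
	if ( caps.bptc && caps.transcoder ) return 'BC6H';

	// 3. RGBA16: uncompressed half-float fallback, needs only the transcoder.
	if ( caps.transcoder ) return 'RGBA16';

	return null; // nothing usable

}

console.log( pickUastcHdrTarget( { astcHdr: false, bptc: true, transcoder: true } ) );
```

With these flags, a desktop GPU without ASTC support but with BPTC would fall through to BC6H, which is the outcome expected for a Windows device above.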

I don't have a Windows desktop device available for testing. If someone would be able to open the webgl_loader_texture_ktx2.html example, select the "RGBA16 Linear (UASTC HDR)" sample in the dropdown, and then report the output printed to the JS Console, that would be much appreciated! On my macOS laptop, the output is...

format: RGBA_ASTC_4x4
type: HalfFloatType
colorSpace: srgb-linear

... but I'd expect a Windows device to select BC6H, instead.

Unrelated: I found a bug while working on the PR, which caused KTX2Loader to often select a non-optimal transcoding format, e.g. selecting BCn while ETC1/2 was also available. A fix for that issue was originally included in this PR to improve transcoding time and quality, but it has been reverted for now because it caused other issues.

To create Basis HDR textures you'll need the latest release of the basisu CLI, and a .exr input file:

basisu sample.exr

@donmccurdy donmccurdy marked this pull request as ready for review October 23, 2024 04:15
@zeux
Contributor

zeux commented Oct 23, 2024

So on Linux with Firefox & AMD GPU (radeonsi), WebGL 2 exposes both EXT_texture_compression_bptc and WEBGL_compressed_texture_astc. However, ASTC isn't actually supported in hardware; radv exposes ASTC for... some reason (?).. and does a software decompress on upload from what I saw. Maybe this will change if Firefox switches to ANGLE, unsure if they have plans.

As a consequence, on Linux this PR actually prints "format: RGBA_ASTC_4x4" when selecting the HDR format; additionally, the texture doesn't render correctly:

image

If I force disable ASTC by patching KTX2Loader.js to pretend it isn't supported, then I get proper result with "RGB_BPTC_UNSIGNED" printed in the log:

image

@zeux
Contributor

zeux commented Oct 23, 2024

found a bug while working on the PR, which caused KTX2Loader to often select a non-optimal transcoding format, e.g. selecting BCn while ETC1/2 was also available

Is this part of the PR? I thought it's a separate commit but neither commits nor changes make it obvious as to where it is.

@zeux
Contributor

zeux commented Oct 23, 2024

Oh, re: radeonsi, correction: the underlying GL driver only exposes GL_KHR_texture_compression_astc_ldr and doesn't expose GL_KHR_texture_compression_astc_hdr. This is common on mobile hardware as well, from what I recall: some devices opt out of HDR support to conserve die area for decoding. Looks like WebGL exposes both via the same extension but requires calling getSupportedProfiles() to check for support, which three.js does not do.
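A minimal sketch of such a profile check, assuming only the standard `WEBGL_compressed_texture_astc` extension object and its `getSupportedProfiles()` method. The helper name and the mock context are made up for illustration, so the snippet runs outside a browser; this is not the code three.js uses.

```javascript
// `gl` would be a WebGL rendering context in a browser.
function getAstcSupport( gl ) {

	const ext = gl.getExtension( 'WEBGL_compressed_texture_astc' );

	// Extension missing entirely: no ASTC at all.
	if ( ext === null ) return { ldr: false, hdr: false };

	// The extension alone does not imply HDR; the profile list does.
	const profiles = ext.getSupportedProfiles();

	return {
		ldr: profiles.includes( 'ldr' ),
		hdr: profiles.includes( 'hdr' ),
	};

}

// Mock context imitating a driver that exposes the LDR profile only.
const mockGl = {
	getExtension: ( name ) => name === 'WEBGL_compressed_texture_astc'
		? { getSupportedProfiles: () => [ 'ldr' ] }
		: null,
};

console.log( getAstcSupport( mockGl ) );
```

On a configuration like the one described here, the `hdr` flag would be false even though the extension itself is present, so a loader could skip ASTC for UASTC HDR inputs.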

@donmccurdy
Collaborator Author

@zeux thanks so much, this is really helpful!

I'll update the PR to use getSupportedProfiles() in WebGL. The WebGPU spec doesn't mention ldr/hdr profiles, and as all listed formats are unorm, it probably doesn't support the HDR format, so I'll test that and disable it in WebGPU if needed.

found a bug while working on the PR, which caused KTX2Loader to often select a non-optimal transcoding format, e.g. selecting BCn while ETC1/2 was also available

Is this part of the PR? I thought it's a separate commit but neither commits nor changes make it obvious...

These lines were the problem...

const ETC1S_OPTIONS = FORMAT_OPTIONS.sort( ( a, b ) => a.priorityETC1S - b.priorityETC1S );
const UASTC_OPTIONS = FORMAT_OPTIONS.sort( ( a, b ) => a.priorityUASTC - b.priorityUASTC );

... because .sort() modifies the array in-place, the ETC1S options were sorted by UASTC priority. I've added a .filter() call in this PR, creating a new array for each format.
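The aliasing problem can be reproduced in isolation. In the sketch below, the option entries, the priority values, and the `sources` field used as the filter predicate are all invented for illustration; the source only says that a .filter() call was added, not what it tests.

```javascript
// Simplified stand-in for FORMAT_OPTIONS; all values here are made up.
const FORMAT_OPTIONS = [
	{ name: 'ASTC', sources: [ 'UASTC' ], priorityETC1S: 99, priorityUASTC: 1 },
	{ name: 'BPTC', sources: [ 'ETC1S', 'UASTC' ], priorityETC1S: 3, priorityUASTC: 2 },
	{ name: 'ETC2', sources: [ 'ETC1S', 'UASTC' ], priorityETC1S: 1, priorityUASTC: 3 },
];

// Buggy pattern: .sort() reorders `shared` in place and returns the same array,
// so both names point at one array left in the order of the last sort.
const shared = FORMAT_OPTIONS.slice();
const buggyETC1S = shared.sort( ( a, b ) => a.priorityETC1S - b.priorityETC1S );
const buggyUASTC = shared.sort( ( a, b ) => a.priorityUASTC - b.priorityUASTC );

console.log( buggyETC1S === buggyUASTC ); // same array object
console.log( buggyETC1S.map( ( o ) => o.name ) ); // UASTC order, not ETC1S order

// Fixed pattern: .filter() returns a new array, so each sort has its own copy.
const ETC1S_OPTIONS = FORMAT_OPTIONS
	.filter( ( o ) => o.sources.includes( 'ETC1S' ) )
	.sort( ( a, b ) => a.priorityETC1S - b.priorityETC1S );
const UASTC_OPTIONS = FORMAT_OPTIONS
	.filter( ( o ) => o.sources.includes( 'UASTC' ) )
	.sort( ( a, b ) => a.priorityUASTC - b.priorityUASTC );

console.log( ETC1S_OPTIONS.map( ( o ) => o.name ) );
console.log( UASTC_OPTIONS.map( ( o ) => o.name ) );
```

Any operation that copies the array before sorting (such as .slice() or the spread operator) would avoid the aliasing equally well; .filter() has the side benefit of dropping targets that do not apply to a given source format.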

@zeux
Contributor

zeux commented Oct 23, 2024

These lines were the problem...

Got it, thanks. After looking at this again, the behavior before this PR in the sample was: ETC1S selects BPTC, UASTC selects ASTC; the behavior after this PR is: ETC1S selects ETC2, UASTC selects ASTC.

I feel like it might make sense to revisit the priorities separately given the hybrid behavior noted above; even before this change, I'm not sure that ETC1S selecting BPTC was the correct course of action (this results in 1 byte per pixel and uses BC7 which can reach a very high quality; I'd think that the quality of ETC1S encoding is sufficiently representable via DXT1/5 based on presence of alpha, which would save memory when textures don't have alpha, but I haven't checked this); after this change though, on Linux effectively I'm getting driver-uncompressed formats everywhere.
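The memory argument can be made concrete with block sizes: BC1 (DXT1) stores each 4x4 block in 8 bytes, while BC3 (DXT5) and BC7 use 16 bytes per block. The helper below is a rough sketch that counts the base mip level only; the function name and the 2048x2048 example size are arbitrary.

```javascript
// Bytes per 4x4 block for the formats discussed above.
const BYTES_PER_BLOCK = { BC1: 8, BC3: 16, BC7: 16 };

// Size in bytes of a single mip level; dimensions round up to whole blocks.
function compressedSize( width, height, format ) {

	const blocksX = Math.ceil( width / 4 );
	const blocksY = Math.ceil( height / 4 );

	return blocksX * blocksY * BYTES_PER_BLOCK[ format ];

}

const bc7 = compressedSize( 2048, 2048, 'BC7' ); // 1 byte per pixel
const bc1 = compressedSize( 2048, 2048, 'BC1' ); // 0.5 bytes per pixel

console.log( bc7 / ( 1024 * 1024 ), 'MiB for BC7' );
console.log( bc1 / ( 1024 * 1024 ), 'MiB for BC1' );
```

For an opaque texture, choosing BC1 over BC7 halves the footprint; whether the quality is acceptable for ETC1S sources is the open question raised above.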

@zeux
Contributor

zeux commented Oct 23, 2024

WebGPU doesn't support ASTC HDR yet; there's a pending proposal for this via a separate feature string: gpuweb/gpuweb#3856

@donmccurdy
Collaborator Author

donmccurdy commented Oct 23, 2024

The target format priorities for ETC1S are based on this decision tree:

https://github.com/KhronosGroup/3D-Formats-Guidelines/blob/main/KTXDeveloperGuide.md#transcode-target-selection-rgb-and-rgba

Preferring BPTC over ETC2 was an accident, but I see your point that preferring BC1/3 over BC7 could be a better choice.

... after this change though, on Linux effectively I'm getting driver-uncompressed formats everywhere.

Ouch, do you mean that the ETC1S→ETC2 transcode path is being decompressed by drivers on Linux too? From your earlier comment I assumed this software decompress was only a concern when using ASTC HDR without checking the supported profiles, and wouldn't affect the non-HDR ETC1S and UASTC formats.

@zeux
Contributor

zeux commented Oct 23, 2024

Let's ignore the LDR behavior for the purpose of this PR, I need to look at the implications of what the driver is doing here wrt speed and quality... but Mesa actually has transcode paths internally now, so the texture should be recompressed into the relevant format. So ETC1S would go to the driver as ETC2 (and UASTC as ASTC) with minimal amount of transcoding via Basis, but the driver will then transcode from ETC2 to DXT1/5 (based on presence of alpha) or from ASTC to DXT5. (in the previous comments I was assuming the driver simply decompresses, but that's no longer the case as of a few years ago, as it now implements transcoding)

The transcoding for ETC2 is basically a block recompression: it decompresses each block into 4x4 RGBA8 and then compresses it back using a custom compression code that's probably not super high quality but since the starting point is ETC1S which is much weaker than ETC2 it might be enough.

The transcoding for ASTC likely loses a fair bit of quality, as it chooses DXT5 as the target format; that format is weaker than UASTC, and since the transcoder is also doing a decompress-recompress with custom code it might have weaknesses of its own.

From this perspective, your change would actually be beneficial for memory size on Linux... (as before the driver was just handed a BPTC texture, and now it's handed an ETC2 texture which it transcodes to DXT1) -- but I'd need to understand the quality implications here to see if we actually need priority tuning, aside from the BPTC-DXT1 question that I'd also need to test.

@zeux
Contributor

zeux commented Oct 23, 2024

Let's ignore the LDR behavior for the purpose of this PR

As I mentioned, before this PR, ETC1S textures would select BPTC on Linux/AMD configuration, and now they select ETC2 and they trigger the ETC2->DXT transcode in the driver.

This is actually very problematic: the driver transcode is very slow, and synchronous as it happens during WebGL texture upload on the CPU. It takes 8 seconds (!!) and during this the web page is completely unresponsive, using webgl_loader_gltf_compressed.html as is on this PR. On master, the transcode happens in Basis background worker and takes at most ~160ms on one of four workers. Obviously on master the sorting issue merely masks the problem by favoring BPTC over ETC2 (and excluding ASTC as the source is ETC1S).

The "green" area on the timeline is the driver work to transcode the textures.
image

The behavior with ASTC is a little better: the driver transcode is still synchronous, but faster, taking ~2 seconds; the Basis transcode is asynchronous and reasonably fast. Note that this PR does not change the effective ASTC LDR behavior, so this problem exists on master for ASTC, but not for ETC1S:

image

(the test above is the same asset, coffeemat.glb, but converted from ETC1S to UASTC)

If I force-disable ASTC (which makes the transcoder choose BPTC for UASTC inputs), the stall disappears; Wasm transcoding takes a little longer (~300ms per worker instead of ~200ms - note that UASTC -> ASTC transcoding is still decidedly not free).

image

None of these problems affect Chrome on Linux: it uses ANGLE, and it restricts ASTC and ETC support with vendor checks by only exposing them on specific Intel GPUs:

    // Although "Sandy Bridge", "Ivy Bridge", and "Haswell" may support GL_ARB_ES3_compatibility
    // extension, ETC2/EAC formats are emulated there. Newer Intel GPUs support them natively.
    ANGLE_FEATURE_CONDITION(
        features, allowETCFormats,
        isIntel && !IsSandyBridge(device) && !IsIvyBridge(device) && !IsHaswell(device));

    // Mesa always exposes ASTC extension but only Intel Gen9, Gen11, and Gen12 have hardware
    // support for it. Newer Intel GPUs (Gen12.5+) do not support ASTC.
    ANGLE_FEATURE_CONDITION(features, allowAstcFormats,
                            !isMesa || isIntel && (Is9thGenIntel(device) || IsGeminiLake(device) ||
                                                   IsCoffeeLake(device) || Is11thGenIntel(device) ||
                                                   Is12thGenIntel(device)));

So, conclusion:

  • The fix to sorting behavior is unfortunately creating a very significant problem wrt upload behavior for ETC1S on Linux/AMD/Firefox.
  • The same problem already existed for UASTC LDR inputs, and this PR doesn't change that. It is less severe than with ETC because the transcode is multiple times faster, but it's still not great and should probably be addressed.
  • Neither of these problems exists on Chromium, as it uses ANGLE, which disables the emulated extensions.

I'm not sure if this should be classified as a Firefox issue that should be fixed on their end, or something that KTX2Loader should work around by adjusting the priorities; I would probably argue for the latter as the performance impact is severe, but this can probably still be done separately from this PR?

@donmccurdy
Collaborator Author

donmccurdy commented Oct 24, 2024

Thanks again @zeux! Could I ask how you're producing the graphs above? Do you know whether that's possible on macOS? I've reverted the would-be fix for ETC1S, so the selected transcoding target should be the same before and after this PR. I also added an inline comment about the situation. Finally, I added the required check for getSupportedProfiles().

As followups after this PR, we could:

  1. Consider detecting the Firefox user agent and adjusting sort-order as a special case
  2. Apply a similar fix for UASTC in Firefox
  3. File an issue with Firefox explaining the situation
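The first followup could look roughly like the sketch below. Everything in it is hypothetical: the helper name, the target names, and the choice to treat ETC2 and ASTC as "possibly emulated" on Firefox for Linux are assumptions drawn from the discussion above, not an agreed design. User-agent sniffing is also fragile and cannot tell an AMD GPU from hardware that supports these formats natively.

```javascript
// Hypothetical helper: move targets that the driver may emulate to the end
// of the preference list when the page runs in Firefox on Linux.
function orderTargets( targets, userAgent ) {

	const maybeEmulated = /Firefox/.test( userAgent ) && /Linux/.test( userAgent );

	// Other browsers keep the original order (returned as a copy).
	if ( ! maybeEmulated ) return targets.slice();

	// Assumption for this sketch: these are the formats at risk of emulation.
	const suspect = new Set( [ 'ETC2', 'ASTC' ] );

	// Stable partition: unaffected targets first, suspect targets last.
	return [
		...targets.filter( ( t ) => ! suspect.has( t ) ),
		...targets.filter( ( t ) => suspect.has( t ) ),
	];

}

const firefoxLinux = 'Mozilla/5.0 (X11; Linux x86_64; rv:131.0) Gecko/20100101 Firefox/131.0';

console.log( orderTargets( [ 'ETC2', 'BPTC', 'DXT' ], firefoxLinux ) );
```

Demoting rather than removing the suspect targets keeps them available as a last resort on devices where nothing else is supported.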

Note that in WebGPU we don't currently have access to BPTC, so if similar emulation occurs there may be more issues to work through at some point.

@zeux
Contributor

zeux commented Oct 24, 2024

The graphs are captured from Firefox’s built-in profiler. I assume this is Linux-specific: on macOS I wouldn’t think the GL driver is exposing any extensions the HW doesn’t support natively, but I'm not sure. Apple Silicon based Macs do support the full gamut of formats (so I wouldn’t be surprised to see all of ETC/ASTC/BPTC/DXT exposed), but none of that is emulated, so there shouldn’t be a penalty for choosing “incorrectly”.

re: WebGPU, unless I’m mistaken, BPTC should be exposed under the -bc feature along with other BCn formats? That would be the bc7-rgba-unorm format in this case (LDR) and bc6h-rgb-ufloat for HDR.

@mrdoob mrdoob added this to the r170 milestone Oct 24, 2024
@donmccurdy
Collaborator Author

donmccurdy commented Oct 24, 2024

re: WebGPU, unless I’m mistaken, BPTC should be exposed under -bc feature along with other BCn formats?

Oops I see – there is no 'texture-compression-bptc', just 'texture-compression-bc'. I'm not sure what happened here, we'll need to make some corrections in our WebGPU feature detection. I'll follow up on that in a separate PR.
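A minimal sketch of the corrected check, assuming only the standard WebGPU feature name 'texture-compression-bc' and the format names quoted above. The helper name and the mock adapter are invented so the snippet runs without a GPU; this is not the three.js feature-detection code.

```javascript
// `adapter` would be a GPUAdapter in a browser; its `features` is set-like.
function getBcFormats( adapter ) {

	// There is no separate BPTC feature: BC6H and BC7 are part of the BC family.
	if ( ! adapter.features.has( 'texture-compression-bc' ) ) return null;

	return { ldr: 'bc7-rgba-unorm', hdr: 'bc6h-rgb-ufloat' };

}

// Mock adapter for illustration; a Set offers the same .has() method.
const mockAdapter = { features: new Set( [ 'texture-compression-bc' ] ) };

console.log( getBcFormats( mockAdapter ) );
```

In a browser, the adapter would come from `navigator.gpu.requestAdapter()`, and the feature would also need to be listed in `requiredFeatures` when requesting the device.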

@donmccurdy donmccurdy merged commit 35adad2 into mrdoob:dev Oct 24, 2024
11 checks passed
@donmccurdy donmccurdy deleted the feat/ktx2loader-uastc-hdr-bc6h branch October 24, 2024 21:27
@donmccurdy
Collaborator Author

Opened a new issue to track the remaining tasks:
