
Conversation

eqrion (Contributor) commented Aug 29, 2025

Partially fixes #154. A full fix would also target some JS files. Opening this to get early thoughts on it; there are some integration questions.

  1. JetStreamDriver.js is extended to decompress .z files using zlib during prefetch. If prefetching is disabled, these files are still prefetched so that the decompression time stays outside of the score. In the browser this uses DecompressionStream; in the shell it uses the zlib-wasm code to decompress the file (see the sketch after this list).
  2. A compress.py script is added that finds all wasm files, compresses them with zlib, and then removes the originals.
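
For the browser side, this can be done with the built-in DecompressionStream, since the .z files are zlib streams and the "deflate" format covers exactly that. A minimal sketch of the prefetch-time decompression step, assuming a hypothetical isCompressed() helper rather than the actual driver code:

async function fetchAndMaybeDecompress(url) {
    const response = await fetch(url);
    if (!isCompressed(url))
        return response.arrayBuffer();
    // Pipe the compressed body through the browser's built-in inflater ("deflate" means zlib here).
    const stream = response.body.pipeThrough(new DecompressionStream("deflate"));
    return new Response(stream).arrayBuffer();
}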

Open questions:

  1. Is it worth using something other than zlib? We need to support the shell, and I didn't want to vendor in a new library just for this.
  2. Should we keep all the original uncompressed files? This patch doesn't, but instead the compress.py script can automatically decompress all the files in the tree for anyone who wants to read the build artifacts.
  3. Should the compression happen in each individual build script or one central file for the repo? I sort of liked having it in one file because then I could implement automatic decompression for all builds easily. But it adds an extra step when building that might not be obvious.


netlify bot commented Aug 29, 2025

Deploy Preview for webkit-jetstream-preview ready!

🔨 Latest commit: c3dce11
🔍 Latest deploy log: https://app.netlify.com/projects/webkit-jetstream-preview/deploys/68b1d99af7234800085a66dd
😎 Deploy Preview: https://deploy-preview-170--webkit-jetstream-preview.netlify.app

camillobruni (Contributor) commented:

  • We should go with an npm run script for decompression, just for consistency with the rest.
  • I'd be fine with the .z files, given the easy way to get to the decompressed .wasm files.
  • Not sure how folks feel about --no-prefetch and wasm in this case (at least for JS I'd want to have the uncompressed source files there so I can easily see the source file path in the raw profile); maybe we need to warn about this and just require manually running npm run decompress.
  • I think you altered the code for &prefetchResources=false for JS blobs in the browser; we should keep using the raw sources there (see #149, "New Workload: prismjs source code highlighting", for an example).

danleh (Contributor) commented Sep 1, 2025

Very cool! I'll leave some detailed comments on the PR next, but responding first to your question 1:

Is it worth using something other than zlib?

No, I don't think that's worth the hassle. Some quick data / experiment: I copied all .wasm files plus this list of "large input files" (including some model files from #148)

./transformersjs/build/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/onnx/model_uint8.onnx
./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/decoder_model_merged_quantized.onnx
./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/encoder_model_quantized.onnx
./transformersjs/build/models/Xenova/whisper-tiny.en/tokenizer.json
./transformersjs/build/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/tokenizer.json
./wasm/tfjs-model-coco-ssd.js
./wasm/tfjs-model-mobilenet-v1.js
./wasm/tfjs-model-mobilenet-v3.js
./wasm/tfjs-bundle.js
./wasm/tfjs-model-use.js
./wasm/dotnet/build-interp/wwwroot/_framework/icudt_no_CJK.dat
./wasm/dotnet/build-aot/wwwroot/_framework/icudt_no_CJK.dat
./wasm/dotnet/build-interp/wwwroot/_framework/icudt_CJK.dat
./wasm/dotnet/build-aot/wwwroot/_framework/icudt_CJK.dat
./wasm/dotnet/build-interp/wwwroot/_framework/icudt_EFIGS.dat
./wasm/dotnet/build-aot/wwwroot/_framework/icudt_EFIGS.dat
./SeaMonster/inspector-json-payload.js
./code-load/inspector-payload-minified.js

and compared different compression methods:

| method                                              | size   | relative to uncompressed | relative to zlib |
|-----------------------------------------------------|--------|--------------------------|------------------|
| uncompressed                                        | 243MiB | 100%                     | 181%             |
| zlib (script from this PR, uses -6 by default IIUC) | 134MiB | 55%                      | 100%             |
| gzip -9                                             | 134MiB | 55%                      | 99.6%            |
| zstd -19                                            | 127MiB | 52%                      | 94.7%            |

I don't think those small savings from a better algorithm / library are worth adding another dependency for. (And we could also no longer use DecompressionStream in the browser, since that only seems to support DEFLATE and gzip [spec].)
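
(For reference, those levels map directly onto Node's built-in zlib module; the snippet below only illustrates the default level 6 vs. level 9 trade-off and is not the PR's actual script.)

import { deflateSync, constants } from "node:zlib";
import { readFileSync } from "node:fs";

// Compare zlib's default level (6) against the best setting (9) for a single file.
const input = readFileSync(process.argv[2]);
const atDefault = deflateSync(input);                                        // zlib default, level 6
const atBest = deflateSync(input, { level: constants.Z_BEST_COMPRESSION });  // level 9
console.log(`uncompressed: ${input.length}, level 6: ${atDefault.length}, level 9: ${atBest.length}`);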

danleh (Contributor) commented Sep 1, 2025

Regarding the other points (including Camillo's):

We should go with an npm run script for decompression, just for consistency with the rest.

+1 to staying in the JavaScript/npm ecosystem. Would be happy to provide a port / alternative to compress.py in JavaScript in a PR later today.

  1. Should we keep all the original uncompressed files?

One reason for this change was to make the repository smaller on disk (excluding .git/) for vendoring JetStream, so let's not keep the uncompressed files checked in. Also in particular for Wasm files or machine learning model weights, one cannot diff them conveniently anyway, e.g., when reviewing PRs here, so I don't see much value in keeping them. Having a simple script to uncompress sounds good enough.

  1. Should the compression happen in each individual build script or one central file for the repo?

I agree that it's convenient to have a single script to decompress everything (in particular given the next point by Camillo). But I would like the build scripts to be self-contained / a single step; otherwise I think it's easy to forget or at least annoying having to run another python3 compress.py (or npm run compress) command after each build, e.g., when updating a workload. The compress command could take a list of files as input (including glob patterns), e.g., npm compress **/*.{wasm,dat} in the build script, and npm decompress could use **/*.z as the pattern by default.

Not sure how folks feel about --no-prefetch and wasm in this case (at least for JS I'd want to have the uncompressed source files there so I can easily see the source file path in the raw profile); maybe we need to warn about this and just require manually running npm run decompress.

Agreed; right now compression always forces blob URLs. How about disabling decompression and stripping .z from each path when prefetchResources=false / --no-prefetch is given (i.e., make compression and no-preload mode mutually exclusive), and then adding something like "Disabling resource prefetching! Also, please run 'npm decompress' to provide all the uncompressed resources in case you see failing requests or missing files." to the warning in

console.warn("Disabling resource prefetching!");
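
A rough sketch of that behavior (the option and variable names here are assumptions, not the actual driver code):

if (!prefetchResources) {
    console.warn("Disabling resource prefetching! Also, please run 'npm run decompress' to provide "
        + "all the uncompressed resources in case you see failing requests or missing files.");
    // With prefetching off, point at the raw file instead of the compressed .z blob.
    url = url.replace(/\.z$/, "");
}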

danleh added a commit to danleh/JetStream that referenced this pull request Sep 1, 2025
Based on `compress.py` from WebKit#170, with some modifications:
- Can be run as `npm run compress` or simply `node compress.mjs`
- Uses best zlib compression setting.
- Takes arbitrary glob patterns for input files, defaulting to all .z files for decompression.
- Copies the file mode over to avoid spurious git diffs.
danleh added a commit to danleh/JetStream that referenced this pull request Sep 2, 2025

// If we aren't supposed to prefetch this and don't need to decompress it,
// then return code snippet that will load the url on-demand.
let compressed = isCompressed(url);

nit: const compressed = ...

return `load("${url}");`

if (this.requests.has(url)) {
return this.requests.get(url);
}

const contents = readFile(url);
let contents;
if (isCompressed(url)) {

nit: use compressed from above.

help='Decompress all .z files in current directory and subdirectories')
parser.add_argument('--keep-input', action='store_true',
help='Keep input files after processing (default: remove input files)')
parser.add_argument('--directory', default='.',

nit: for consistency with find and the like, I would have expected this to be a positional argument, i.e.,

parser.add_argument('directory', nargs='?', default='.', 
                    help='Directory to search for files (default: current directory)')


// Fallback for shell environments without TextDecoder. This only handles valid
// UTF-8, invalid buffers will lead to unexpected results.
function decodeUTF8(int8Array) {

This could use the shared polyfill in #173 instead.


Could use the node script from #172 instead (which stays in the NPM ecosystem, uses the best zlib compression ratio, and copies the file mode over).

@@ -1161,7 +1273,7 @@ class GroupedBenchmark extends Benchmark {
await benchmark.prefetchResourcesForBrowser();
}

async retryPrefetchResourcesForBrowser() {
async retryjForBrowser() {

nit: intended naming change?

eqrion (Contributor, Author) commented Sep 2, 2025

Thanks for the reviews!

I like the idea of using node for the compression script; I'll use #172 once it has merged, and also the shared polyfill for TextDecoder.

* Not sure how folks feel about `--no-prefetch` and wasm in this case (at least for JS I'd want to have the uncompressed source files there so I can easily see the source file path in the raw profile); maybe we need to warn about this and just require manually running `npm run decompress`.

Yeah that seems like a better path than just silently re-enabling prefetching for those files. I'll implement that.

    Should we keep all the original uncompressed files?

One reason for this change was to make the repository smaller on disk (excluding .git/) for vendoring JetStream, so let's not keep the uncompressed files checked in. Also in particular for Wasm files or machine learning model weights, one cannot diff them conveniently anyway, e.g., when reviewing PRs here, so I don't see much value in keeping them. Having a simple script to uncompress sounds good enough.

As long as the uncompressed files are not used by the default runner, it is fine for them to be checked in. I can exclude them when vendoring the JS3 repo into Firefox and only copy over the .z files. But it also does seem nice to only have one canonical version of things.

What might change this is if we wanted to compress JS files too (which can be diff'ed and inspected easily). From #154, there were three large JS files (excluding tfjs which is disabled) that could be good candidates for this:

12      ./web-tooling-benchmark/cli.js
12      ./web-tooling-benchmark/browser.js
12      ./RexBench/FlightPlanner/waypoints.js

How do folks feel about compressing JS too? If that's okay with folks, then we probably should keep the uncompressed versions around.

I agree that it's convenient to have a single script to decompress everything (in particular given the next point by Camillo). But I would like the build scripts to be self-contained / a single step; otherwise I think it's easy to forget or at least annoying having to run another python3 compress.py (or npm run compress) command after each build, e.g., when updating a workload. The compress command could take a list of files as input (including glob patterns), e.g., npm compress **/*.{wasm,dat} in the build script, and npm decompress could use **/*.z as the pattern by default.

That's fine with me too. I was just running out of time on Friday and wanted to have something quicker. Updating all the build scripts probably isn't too bad.

camillobruni (Contributor) commented:

Thanks for kicking this off 👍

+1 on compressing large JS files too, given that this would just work transparently with prefetching!
Some of my pending PRs do indeed have huge files.
If we add an npm run shell ... helper or similar, we could even hide the decompression transparently, so that would be fine.

danleh (Contributor) commented Sep 3, 2025

#172 landed, so feel free to use / rebase this on top of it.

Also +1 to compress large JS files.

As discussed, that could still work without keeping the original / uncompressed files in the repo. Basically the default config would do preloading and decompression during that preloading, so no uncompressed files on disk are required. And without preloading, we just rewrite the URLs/file loads to strip .z and 404 / error out if not present, thus requiring npm compress -- -d to be run beforehand (optionally integrated into a single step with npm run shell or npm run server as Camillo proposed).

Edit: Re-reading/thinking about the arguments, I am not so sure about compressing source JS files any more. (JS files that are essentially blobs, i.e., generated and never manually modified, such as the inputs for babel, are fine to compress.) Keeping uncompressed JS source files around for diff/code review/maintenance sounds like a good idea. In terms of transfer size during loading, there won't be any benefit to compressing in the repo, since a competent web server will use some compression scheme anyway. E.g., the Netlify preview uses brotli (see screenshot, ~2MB vs ~12MB uncompressed for waypoints.js).

[Screenshot: Netlify serving waypoints.js with brotli, roughly 2MB transferred vs. roughly 12MB uncompressed]

eqrion (Contributor, Author) commented Sep 4, 2025

@danleh @camillobruni

Here's an alternative idea. What if we just left all of the files in this tree uncompressed, and only added support to JetStreamDriver for decompressing? It would then be up to anyone vendoring the tree to compress whatever files they want and rewrite the paths in the driver. I can have a script that does this as part of the mozilla vendoring process.

We wouldn't need to update any build scripts, or do anything for disablePrefetching+compression (we wouldn't be doing that on the vendored copy). I could probably drop all the shell polyfilling for zlib too, because we would only be running the vendored copy in the browser. We'd also continue to get good diffs for free.
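
If we went that route, the vendoring step could be a small standalone helper; a hypothetical sketch (not part of this PR) that compresses one file with zlib and keeps only the .z copy:

import { deflateSync, constants } from "node:zlib";
import { readFileSync, writeFileSync, unlinkSync } from "node:fs";

// Replace `path` with a zlib-compressed `path.z`, as a vendoring script might do.
function compressInPlace(path) {
    const data = readFileSync(path);
    writeFileSync(path + ".z", deflateSync(data, { level: constants.Z_BEST_COMPRESSION }));
    unlinkSync(path);  // keep only the compressed artifact in the vendored tree
}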
