Cross-Origin Storage API Extension #1442
src/utils/CrossOriginStorage.js
@@ -0,0 +1,71 @@
class CrossOriginStorage {
  static isAvailable = () => "crossOriginStorage" in navigator;
We should use a `typeof` check here; otherwise we get crashes in Node.js. For example, from the unit tests:
2025-10-17T14:21:29.0544102Z FAIL tests/pipelines.test.js
2025-10-17T14:21:29.0567084Z ● Pipelines › Audio Classification › should be an instance of AudioClassificationPipeline
2025-10-17T14:21:29.0568045Z
2025-10-17T14:21:29.0568875Z The error below may be caused by using the wrong test environment, see https://jestjs.io/docs/configuration#testenvironment-string.
2025-10-17T14:21:29.0569870Z Consider using the "jsdom" test environment.
2025-10-17T14:21:29.0570210Z
2025-10-17T14:21:29.0570439Z ReferenceError: navigator is not defined
2025-10-17T14:21:29.0570742Z
2025-10-17T14:21:29.0571255Z   1 | class CrossOriginStorage {
2025-10-17T14:21:29.0572641Z > 2 |   static isAvailable = () => "crossOriginStorage" in navigator;
2025-10-17T14:21:29.0573856Z     |                                                      ^
2025-10-17T14:21:29.0574386Z   3 |
2025-10-17T14:21:29.0575040Z   4 |   match = async (request) => {
2025-10-17T14:21:29.0576260Z   5 |     const hashValue = await this._getFileHash(request);
2025-10-17T14:21:29.0577116Z
2025-10-17T14:21:29.0577475Z at Function.isAvailable (src/utils/CrossOriginStorage.js:2:54)
2025-10-17T14:21:29.0578105Z at getModelFile (src/utils/hub.js:483:27)
2025-10-17T14:21:29.0578631Z at getModelText (src/utils/hub.js:696:26)
2025-10-17T14:21:29.0579145Z at getModelJSON (src/utils/hub.js:716:24)
2025-10-17T14:21:29.0579821Z at Function.from_pretrained (src/models/auto/processing_auto.js:51:42)
2025-10-17T14:21:29.0580496Z at loadItems (src/pipelines.js:3527:27)
2025-10-17T14:21:29.0581023Z at pipeline (src/pipelines.js:3465:27)
2025-10-17T14:21:29.0581858Z at Object.<anonymous> (tests/pipelines/test_pipelines_audio_classification.js:15:20)
2025-10-17T14:21:29.0582434Z
2025-10-17T14:21:29.0583006Z ● Pipelines › Audio Classification › batch_size=1 › default (top_k=5)
See line 35 in fcf2ec9, for example:
const IS_WEBGPU_AVAILABLE = typeof navigator !== 'undefined' && 'gpu' in navigator;
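A minimal sketch of the guarded check, following the same pattern as `IS_WEBGPU_AVAILABLE`. The rest of the class is omitted; this only illustrates the guard and is not necessarily the code that ends up in the PR.

```javascript
class CrossOriginStorage {
  // `typeof` never throws on an undeclared identifier, so this is safe in
  // environments without a global `navigator` (e.g. older Node.js or Jest's
  // default "node" test environment).
  static isAvailable = () =>
    typeof navigator !== "undefined" && "crossOriginStorage" in navigator;
}
```

With this guard, `CrossOriginStorage.isAvailable()` simply returns `false` where `navigator` is missing instead of raising `ReferenceError: navigator is not defined`.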
src/utils/CrossOriginStorage.js
if (!hashValue) {
  return undefined;
}
const hash = { algorithm: "SHA-256", value: hashValue };
I know I hard-coded it, but future versions of COS may use other hashing algorithms. So to future-proof this, maybe make this stand out more by putting it in a constant at the top.
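One way this could look, as a rough sketch: `HASH_ALGORITHM` and `toHashDescriptor` are hypothetical names picked for illustration; only the `"SHA-256"` value and the `{ algorithm, value }` shape come from the diff.

```javascript
// Hypothetical constant name. Kept at the top of the file so that a future
// change of the hashing algorithm is a one-line edit.
const HASH_ALGORITHM = "SHA-256";

// Hypothetical helper: builds the hash descriptor from a hex digest string.
const toHashDescriptor = (value) => ({ algorithm: HASH_ALGORITHM, value });
```

The call site in `match` would then read `const hash = toHashDescriptor(hashValue);`.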
};

_getFileHash = async (url) => {
  if (/\/resolve\/main\/onnx\//.test(url)) {
This function is essentially scraping the website. Maybe leave the original comment from my code where this was linked to an explanation on the HF docs. Also see the comment above about future-proofing this for possible algorithm changes.
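For context, a sketch of what the parsing step might look like. It assumes that the `/raw/` URL returns a Git LFS pointer file containing an `oid sha256:<hex>` line; `parseLfsPointerHash` is a hypothetical helper name, not a function from the PR.

```javascript
// Assumption: `pointerText` is a Git LFS pointer file of the form
//   version https://git-lfs.github.com/spec/v1
//   oid sha256:<64 lowercase hex characters>
//   size <bytes>
// Hypothetical helper: returns the hex digest, or undefined if the text does
// not contain such a line (e.g. an HTML error page or a non-LFS file).
function parseLfsPointerHash(pointerText) {
  // The algorithm name is hard-coded in the pattern; if the algorithm ever
  // changes, this is the second place that needs updating.
  const match = /^oid sha256:([0-9a-f]{64})$/m.exec(pointerText);
  return match ? match[1] : undefined;
}
```

Returning `undefined` on a non-matching response fits the existing `if (!hashValue) return undefined;` check in `match`.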
  await writableStream.close();
};

_getFileHash = async (url) => {
This doesn't "see" the requests for the ORT Wasm files. Those should be 100% cached in COS for guaranteed cache hits, since every Transformers.js or ONNX Runtime Web app uses the same few files.
I think this needs to happen in ORT, or you can of course do it "by hand".
The Wasm file fetch might happen here (line 12), but not 100% sure.
Yes, we have been meaning to "control" this on the Transformers.js side by loading and caching the binary, then pointing `wasmPaths` to this buffer. Just need to get around to adding it :)
_getFileHash = async (url) => {
  if (/\/resolve\/main\/onnx\//.test(url)) {
    const rawUrl = url.replace(/\/resolve\//, "/raw/");
    const text = await fetch(rawUrl).then((response) => response.text());
This runs every time, which means you can't run fully offline. Instead, this should cache the url => hash mapping and return the cached value. I had this in my initial implementation and remember there was some trickery needed to make it work with the actual URLs (I don't remember the details, but maybe it had to do with the post-redirect URLs that point at the CDN?). Just copy what I had, this worked :-)
Yes, I did that deliberately. From my point of view, it's a question of separation of concerns/responsibilities. I don't think it is the responsibility of Transformers.js to ensure that everything works offline. It is our responsibility to do our best to keep the download payload as small as possible. But here I don't think we need to cache this request, since it is tiny.
On the other hand, caching it would risk that a new version of an ONNX file is not loaded, because the cached SHA value would not change. And it would not be obvious to the user or the app developer why.
In my opinion, if a developer wants a fully offline solution, they should solve the offline caching at the ServiceWorker level. We could help with that, but we should not abstract it away by default.
Makes sense. And stale-while-revalidate as a caching strategy for these "get SHA-256 hash" routes would work perfectly both for always being offline-capable and for never missing a new model. This should likely be added somewhere as a best practice in the docs, but for here: LGTM.
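For the docs, the stale-while-revalidate idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `staleWhileRevalidate` is a hypothetical helper, a plain `Map` stands in for whatever cache the app uses (a real setup would more likely use the Cache API inside a service worker), and `fetchText` is an injected function so that the sketch stays self-contained.

```javascript
// Hypothetical helper. `cache` is any Map-like store, `fetchText` is any
// async function that resolves to the response body for `url`.
async function staleWhileRevalidate(url, cache, fetchText) {
  const cached = cache.get(url);

  // Always start a refresh in the background so that a new hash is picked up
  // on the next call. Network failures (e.g. offline) are swallowed here and
  // leave the cached entry untouched.
  const refresh = fetchText(url)
    .then((text) => {
      cache.set(url, text);
      return text;
    })
    .catch(() => undefined);

  // Serve the cached (possibly stale) value immediately if there is one;
  // otherwise wait for the network result.
  return cached !== undefined ? cached : refresh;
}
```

The trade-off is that a changed hash is only seen one request late, in exchange for the hash lookup working offline once it has been fetched at least once.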
This should not be merged yet.
Instead, it's an experimental implementation of the Cross-Origin Storage API that the Google Chrome team is working on:
https://github.com/explainers-by-googlers/cross-origin-storage
To test it, you need to install the Cross-Origin Storage API extension in your browser:
https://github.com/web-ai-community/cross-origin-storage-extension?tab=readme-ov-file