Specify path to .wasm file in v2.0.0-beta.5 #282

Closed
ryan-codingintrigue opened this issue May 14, 2019 · 5 comments

Comments

@ryan-codingintrigue

Problem.
In the latest version, tesseract.js-core@v2.0.0-beta.5, the .wasm file is embedded in the JavaScript file as a Base64 data URI.

Any worker running under a sufficiently strict connect-src Content-Security-Policy (one that does not allow data: URIs) will therefore fail to load.
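To illustrate why the embedded binary fails: a fetch() of a data: URI is governed by connect-src (falling back to default-src), and neither 'self' nor * covers the data: scheme, so it has to be listed explicitly. The check below is a deliberately simplified sketch that assumes a single header string and ignores most real CSP matching rules; `allowsDataFetch` is a hypothetical name.

```javascript
// Simplified sketch (not a full CSP implementation): would this Content-Security-Policy
// header allow fetch() of a data: URI, such as an embedded Base64 .wasm?
function allowsDataFetch(cspHeader) {
  const directives = new Map();
  for (const part of cspHeader.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // Only the first occurrence of a directive is honored.
    if (!directives.has(name)) directives.set(name, tokens.slice(1));
  }
  // fetch() is governed by connect-src, which falls back to default-src.
  const sources = directives.has("connect-src")
    ? directives.get("connect-src")
    : directives.get("default-src");
  if (sources === undefined) return true; // neither directive present: no restriction
  // The data: scheme must be listed explicitly; 'self' and * do not cover it.
  return sources.some((s) => s.toLowerCase() === "data:");
}

console.log(allowsDataFetch("default-src 'self'"));       // false
console.log(allowsDataFetch("connect-src 'self' data:")); // true
console.log(allowsDataFetch("script-src 'self'"));        // true
```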

In addition, it would be nice to be able to cache the .wasm file separately from the worker .js.

Describe the solution you'd like
It would be great if we could optionally specify a path for the worker to fetch the .wasm file from. If no path is specified, the worker could fall back to the embedded Base64 version.
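A minimal sketch of the requested behaviour, assuming a hypothetical `wasmPath` option and a hypothetical `EMBEDDED_WASM_BASE64` constant standing in for the real embedded payload. One design point worth noting: if the fallback decodes the Base64 string directly with atob() instead of calling fetch() on a data: URI, it makes no network-style request at all, so connect-src does not apply to it.

```javascript
// Hypothetical stand-in for the embedded payload: only the 8-byte wasm header
// (magic "\0asm" + version 1), which is the smallest valid module.
const EMBEDDED_WASM_BASE64 = "AGFzbQEAAAA=";

function decodeBase64(b64) {
  // atob is available in browsers, Web Workers and Node.js 16+.
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

async function loadWasmBinary(options = {}) {
  if (options.wasmPath) {
    // External file: an ordinary URL, cacheable separately from the worker script.
    const response = await fetch(options.wasmPath);
    if (!response.ok) {
      throw new Error(`Failed to fetch ${options.wasmPath}: ${response.status}`);
    }
    return new Uint8Array(await response.arrayBuffer());
  }
  // Fallback: decode the embedded copy in memory; no data: URI is fetched.
  return decodeBase64(EMBEDDED_WASM_BASE64);
}

loadWasmBinary().then((bytes) => console.log(bytes.length, WebAssembly.validate(bytes)));
```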

@jeromewu
Member

@ryan-codingintrigue

Is there any downside to providing only the non-embedded version? I am not quite sure how to support both with a fallback.

@ryan-codingintrigue
Author

It does mean that you must specify some form of wasmPath to locate the file, which is unfortunately another parameter.

I think the usual approach is: if a path is provided, use it; otherwise, fall back to the current directory plus the .wasm filename.

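// Prefer an explicit wasmPath; otherwise fetch tesseract.wasm relative to the worker script's URL
if (options.wasmPath) {
    fetch(options.wasmPath);
} else {
    fetch("tesseract.wasm");
}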

If you want to avoid that, you could provide two distinct core bundles, either of which can be passed as corePath:

  • tesseract-core.js - Default. Contains fetch("data:application/octet-stream...")
  • tesseract.js-core-external.js - Optional. Contains fetch(options.wasmPath)
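For the external variant, if the core is an Emscripten-generated module (an assumption here; check the actual build), the conventional hook is Emscripten's `Module.locateFile(path, prefix)` option, which the generated loader calls to resolve its .wasm filename. A sketch, where `makeLocateFile` and the paths are hypothetical and not part of the tesseract.js API:

```javascript
// Sketch assuming an Emscripten-built core that consults Module.locateFile(path, prefix)
// when resolving its .wasm file. makeLocateFile is a hypothetical helper.
function makeLocateFile(wasmPath) {
  return function locateFile(path, prefix) {
    // Redirect only the .wasm binary, and only when a wasmPath was supplied.
    if (wasmPath && path.endsWith(".wasm")) return wasmPath;
    // Emscripten's default: resolve relative to the script's directory (prefix).
    return prefix + path;
  };
}

// Hypothetical usage: the hook goes in the Module object passed to the core's factory.
const Module = { locateFile: makeLocateFile("/static/tesseract-core.wasm") };
console.log(Module.locateFile("tesseract-core.wasm", "https://example.com/js/"));
// "/static/tesseract-core.wasm"
console.log(makeLocateFile(undefined)("tesseract-core.wasm", "https://example.com/js/"));
// "https://example.com/js/tesseract-core.wasm"
```

Without a wasmPath the hook reproduces the "current directory plus filename" default described above, so the extra parameter stays optional.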

@jeromewu
Member

jeromewu commented May 14, 2019

I have upgraded tesseract.js-core to 2.0.0-beta.8, which contains 3 versions of tesseract-core.

Right now the ASM version and the embedded WASM version work fine, but I cannot get the non-embedded WASM version working yet. I'll leave it as is for now; feel free to send a PR 😃
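With both ASM and WASM builds available, a caller could pick one at runtime by feature-detecting WebAssembly. A minimal sketch, where `pickCorePath`, its parameters, and the ASM file path are hypothetical (the WASM filename is borrowed from the extension example later in this thread):

```javascript
// Hypothetical helper: choose a core build depending on WebAssembly support.
// The caller supplies both URLs; nothing here is part of the tesseract.js API.
function pickCorePath({ wasmCorePath, asmCorePath }, env = globalThis) {
  const hasWasm =
    typeof env.WebAssembly === "object" &&
    typeof env.WebAssembly.instantiate === "function";
  return hasWasm ? wasmCorePath : asmCorePath;
}

const paths = {
  wasmCorePath: "/js/tesseract-core.wasm.js",
  asmCorePath: "/js/tesseract-core.asm.js", // hypothetical location of the ASM build
};
console.log(pickCorePath(paths));     // WASM build wherever WebAssembly exists
console.log(pickCorePath(paths, {})); // simulated environment without WebAssembly: ASM build
```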

BTW, you can try it in the browser with tesseract.js@2.0.0-alpha.4 using the code below:

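// Browser only: assumes tesseract.js@2.0.0-alpha.4 is loaded via a <script> tag (global `Tesseract`)
// and that `image` holds the image to recognize.
const { TesseractWorker } = Tesseract;
const worker = new TesseractWorker({
  corePath: 'https://unpkg.com/tesseract.js-core@2.0.0-beta.8/tesseract-core.js',
});

worker
  .recognize(image)
  .then(result => console.log(result.text));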

@WebReflection
Contributor

FWIW, this is how I managed to make it work in a browser extension:

const worker = createWorker({
  errorHandler: console.error,
  workerBlobURL: false,
  workerPath: extension.getURL("js/worker.js"),
  corePath: extension.getURL("js/tesseract-core.wasm.js"),
  langPath: extension.getURL("tessdata")
});

At least in Firefox, this doesn't result in loading anything embedded, and the js/tesseract-core.wasm.js path is retrieved without issues.

@Balearica
Member

Closing the issue as the existing solution (embedded js/wasm) seems to work fine. Feel free to reopen if there's a compelling reason for pursuing this further.


4 participants