
Search a specific pattern in the image #492

Closed
ZioTano opened this issue Oct 26, 2020 · 5 comments

Comments

@ZioTano

ZioTano commented Oct 26, 2020

A regex search would be a fantastic feature.
It would allow complex searches in images.
It seems that something similar already exists in the C++ version of the project:
https://github.com/tesseract-ocr/tesseract/blob/442b5b7/dict/trie.h#L192
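The pattern syntax documented in that trie.h comment is not a full regex language. It uses escapes such as \d (digit), \a and \A (lowercase and uppercase letter), \c (letter), \n (letter or digit), \p (punctuation), and \* (the previous item may repeat any number of times). To get a feel for what a pattern accepts, here is a minimal sketch that translates one into a JavaScript RegExp. The helper name patternToRegExp is hypothetical and not part of Tesseract or Tesseract.js. It also assumes ASCII ranges for each class, whereas Tesseract checks the character properties stored in its own unicharset.

```javascript
// Hypothetical helper, not part of Tesseract or Tesseract.js: translates a
// Tesseract user-pattern string into an anchored JavaScript RegExp.
// ASSUMPTION: character classes are approximated with ASCII ranges, whereas
// Tesseract checks the properties stored in its own unicharset.
const CLASS_MAP = {
  c: '[A-Za-z]',           // \c: alphabetic character
  d: '[0-9]',              // \d: digit
  n: '[A-Za-z0-9]',        // \n: alphabetic character or digit
  p: '[!-/:-@\\[-`{-~]',   // \p: ASCII punctuation
  a: '[a-z]',              // \a: lowercase letter
  A: '[A-Z]',              // \A: uppercase letter
};

// Escape characters that have a special meaning in RegExp syntax.
function escapeLiteral(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function patternToRegExp(pattern) {
  let source = '';
  let canRepeat = false; // true when the last emitted item can take a quantifier
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\' && i + 1 < pattern.length) {
      const next = pattern[++i];
      if (next === '*') {
        // \*: repeat the previous item; with nothing to repeat, match a literal '*'.
        source += canRepeat ? '*' : '\\*';
        canRepeat = false;
      } else if (Object.prototype.hasOwnProperty.call(CLASS_MAP, next)) {
        source += CLASS_MAP[next];
        canRepeat = true;
      } else {
        // ASSUMPTION: any other escaped character is matched literally.
        source += escapeLiteral(next);
        canRepeat = true;
      }
    } else {
      source += escapeLiteral(ch);
      canRepeat = true;
    }
  }
  return new RegExp('^' + source + '$');
}
```

For example, the pattern \A\A\d\d\d\d\A used later in this thread becomes /^[A-Z][A-Z][0-9][0-9][0-9][0-9][A-Z]$/, which accepts AB1234C but rejects ab1234C.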

@Balearica
Member

This is blocked by #613, as it looks like user_patterns_file is an "init only" parameter.

@Balearica
Member

Balearica commented Sep 24, 2022

Support for initialization parameters was added as part of #613 to the dev/v4 branch, and will be released with version 4. If you would like to test before then, instructions are in #662.

Regarding the user pattern feature specifically, I confirmed that this is supported using the sample image and pattern file provided in the Tesseract documentation.

Edit: This example is outdated. For an example that works with v5, see the last comment below.

Here is a minimal example:

```html
<!DOCTYPE HTML>
<html>
  <head>
    <script src="/dist/tesseract.dev.js"></script>
  </head>
  <body>
    <input type="file" id="uploader">
    <script>
      const recognize = async function(evt){
        const files = evt.target.files;
        const worker = await Tesseract.createWorker({
          corePath: '/tesseract-core-simd.wasm.js',
          workerPath: '/dist/worker.dev.js',
          logger: function(m){console.log(m);},
          cacheMethod: 'none',
        });
        await worker.loadLanguage('eng');
        await worker.writeText("/user_patterns_file", String.raw`\A\A\d\d\d\d\A`);
        await worker.initialize('eng', undefined, "user_patterns_file /user_patterns_file");
        const ret = await worker.recognize(files[0]);
        console.log(ret.data.text);
      }
      const elm = document.getElementById('uploader');
      elm.addEventListener('change', recognize);
    </script>
  </body>
</html>
```

[Note: at the time of this writing, version 4 has not been released yet. Therefore, workerPath and corePath must both be manually set to versions of those files from the dev/v4 branch of Tesseract.js and Tesseract.js-core, respectively.]

[Screenshot: user_pattern_example]
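In the v4 example, init-only parameters are passed to initialize as one string, "user_patterns_file /user_patterns_file". That string has the same name-then-value shape as a line in a Tesseract config file. If several such parameters were needed, a small serializer could build the string from a plain object, similar to the object that the v5 reinitialize call accepts. This is only a sketch: the helper name toTesseractConfig is hypothetical, and it assumes that multiple parameters are separated by newlines as in a Tesseract config file. Check the Tesseract.js v4 initialize documentation before relying on that.

```javascript
// Hypothetical helper, not part of the Tesseract.js API: serializes an object
// of Tesseract parameters into config-file text, one "name value" pair per line.
// ASSUMPTION: names contain no whitespace and values contain no newlines, so
// each line splits cleanly into a parameter name and its value.
function toTesseractConfig(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name} ${value}`)
    .join('\n');
}
```

With a single entry, toTesseractConfig({user_patterns_file: "/user_patterns_file"}) produces exactly the string used in the v4 example.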

@Balearica
Member

Closing as this was added in version 4.

@jlucaso1

How can I do this in newer versions?

@Balearica
Member

@jlucaso1 An example that works with the latest version of Tesseract.js (v5.0.5) is below.

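```html
<!DOCTYPE HTML>
<html>
  <head>
    <script src="/dist/tesseract.min.js"></script>
  </head>
  <body>
    <input type="file" id="uploader">
    <script>
      const recognize = async function(evt){
        const files = evt.target.files;
        // Create an English worker using OEM 1 (LSTM engine only).
        const worker = await Tesseract.createWorker('eng', 1);

        // Write the pattern file into the worker's virtual file system.
        await worker.writeText("/user_patterns_file", String.raw`\A\A\d\d\d\d\A`);
        await worker.setParameters({user_patterns_file: "/user_patterns_file"});

        // user_patterns_file is an init-only parameter, so the worker is
        // re-initialized with it for the pattern to take effect.
        await worker.reinitialize('eng', 1, {user_patterns_file: "/user_patterns_file"});

        const ret = await worker.recognize(files[0]);
        console.log(ret.data.text);

        // A new worker is created for every selected file, so release this one.
        await worker.terminate();
      }
      const elm = document.getElementById('uploader');
      elm.addEventListener('change', recognize);
    </script>
  </body>
</html>
```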
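A user patterns file acts as a hint to the recognizer rather than as a filter, so ret.data.text can still contain words of any shape. To actually search the output for strings like \A\A\d\d\d\d\A, one option is to post-process the recognized text in JavaScript. The sketch below is an assumption-laden illustration, not part of the Tesseract.js API. The helper name findPatternMatches is hypothetical, it expects the target strings to contain no whitespace, and the caller supplies a hand-written RegExp equivalent of the pattern.

```javascript
// Hypothetical helper, not part of the Tesseract.js API: returns every
// whitespace-separated token in `text` that fully matches `regex`.
// ASSUMPTION: the strings being searched for never contain whitespace.
function findPatternMatches(text, regex) {
  // Anchor the expression and drop the g/y flags so that test() is stateless.
  const flags = regex.flags.replace(/[gy]/g, '');
  const anchored = new RegExp(`^(?:${regex.source})$`, flags);
  return text
    .split(/\s+/)
    .filter((token) => token.length > 0 && anchored.test(token));
}
```

Inside the recognize function above, findPatternMatches(ret.data.text, /[A-Z]{2}[0-9]{4}[A-Z]/) would return only the tokens shaped like \A\A\d\d\d\d\A. Tokens with attached punctuation (for example a trailing comma) would not match, so a real search may need extra trimming.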
