Error opening data file /usr/share/tessdata/.traineddata #6

Open
jens1o opened this issue Feb 15, 2019 · 4 comments

@jens1o

jens1o commented Feb 15, 2019

Following example code:

extern crate tesseract;
use tesseract::Tesseract;

fn main() {
    let tesseract_instance = Tesseract::new();

    tesseract_instance.set_lang("deu");
    tesseract_instance.set_image("sample.jpg");
    dbg!(tesseract_instance.get_text());
}

The problem is that I do have the Tesseract training data installed, but it is named {language_name}.traineddata (for example, deu.traineddata), so this wrapper can't find it. When I manually run tesseract sample.jpg test.txt -l deu, it works without problems.

How can I manually change the file naming scheme?

@guygastineau

I have the same issue. I just read the source for this crate on crates.io, but I couldn't find where the file name is set.

@ccouzens
Collaborator

Hello,

There is an environment variable TESSDATA_PREFIX. Would that help?
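One observation on the error itself: the path it reports, /usr/share/tessdata/.traineddata, has an empty language name in front of .traineddata, so the file naming scheme may not be the culprit. Below is a minimal sketch for checking which file would be looked up for a given TESSDATA_PREFIX. The helper traineddata_path and the /usr/share/tessdata fallback (taken from the error message) are assumptions for illustration, not part of this crate. Whether TESSDATA_PREFIX should name the tessdata directory itself or its parent appears to depend on the Tesseract version.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `<tessdata_dir>/<lang>.traineddata`.
fn traineddata_path(tessdata_dir: &Path, lang: &str) -> PathBuf {
    tessdata_dir.join(format!("{}.traineddata", lang))
}

fn main() {
    // Assumption: fall back to the directory from the error message
    // when TESSDATA_PREFIX is not set.
    let dir = env::var_os("TESSDATA_PREFIX")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/usr/share/tessdata"));

    // An empty language reproduces the file name seen in the error.
    for lang in ["deu", ""] {
        let file = traineddata_path(&dir, lang);
        println!("{:?} -> {} (exists: {})", lang, file.display(), file.exists());
    }
}
```

If the "deu" line reports that the file exists while the wrapper still asks for .traineddata, the language is probably not reaching Tesseract's initialization.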

There is also a datapath parameter we could set inside set_lang. It's currently set to ptr::null(). I'm looking into exposing that as part of the API. Would that help?
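To illustrate what exposing datapath could involve, here is a rough sketch of turning an optional Rust string into the pointer a C function expects, with null meaning "use the default". As far as I understand, the C API's init functions (for example TessBaseAPIInit3) take a datapath and a language as const char pointers. The helper opt_c_ptr and the example values are hypothetical and not taken from this crate's source.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: null when the value is absent, otherwise a
/// pointer borrowed from the `CString`.
fn opt_c_ptr(value: Option<&CString>) -> *const c_char {
    value.map_or(ptr::null(), |s| s.as_ptr())
}

fn main() {
    // Assumed example values; the directory would need to contain
    // `deu.traineddata`.
    let datapath = CString::new("/usr/share/tessdata").expect("no interior NUL");
    let language = CString::new("deu").expect("no interior NUL");

    // The CStrings must stay alive for as long as the pointers are used.
    let datapath_ptr = opt_c_ptr(Some(&datapath));
    let language_ptr = opt_c_ptr(Some(&language));
    let default_ptr = opt_c_ptr(None);

    // A real wrapper would hand these pointers to the C init call here.
    println!(
        "datapath null: {}, language null: {}, default null: {}",
        datapath_ptr.is_null(),
        language_ptr.is_null(),
        default_ptr.is_null()
    );
}
```

Keeping the CString values in named bindings, rather than creating them inline, avoids handing C a dangling pointer.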

This crate is a wrapper around Tesseract's C API. Do any of the functions there look like they might help? If so, I'll expose them.

@guygastineau

Thanks for the reply, I'll mess with your suggestions and see what happens.

ccouzens added a commit that referenced this issue May 24, 2020
To recognise text, you first need to call initialize because it is the
only way of getting a TesseractInitialized object.

This avoids a situation where tesseract quits your program because you
forgot to initialize it.

Possibly the extra options will help with #6.
I do plan on supporting all the options of
https://fossies.org/dox/tesseract-4.1.1/classtesseract_1_1TessBaseAPI.html#a96899e8e5358d96752ab1cfc3bc09f3e
in time.

#16
ccouzens added a commit that referenced this issue Jun 5, 2020
This was done by initializing on the call to new.

This was suggested by the pull request #20 by @hdevalence.

This may address #6, as we now expose datapath.
ccouzens added a commit that referenced this issue Jun 5, 2020
* Move unsafe handling code to plumbing module

This has given me the opportunity to review the safety of the unsafe
code - and it was found lacking.

But it's fixed in the plumbing module.

#16

I've tried to not put naming opinions within the plumbing module.
That is, things within it are named to match the C or C++ libraries of
leptonica and tesseract as much as possible.

This addresses #17 at least within the plumbing module.

* Remove wrinkle directory

I'm not sure what purpose it served

* Use Result type in top level module

This gives users a chance to deal with errors, rather than having
panics.

* Use the builder pattern in the Tesseract struct

As pointed out in #18, it makes the API nicer.

And I'm breaking the API anyway, in order to return result types, so I may
as well make this further change.

* Make it impossible not to initialize tesseract

This was done by initializing on the call to new.

This was suggested by the pull request #20 by @hdevalence.

This may address #6, as we now expose datapath.
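The ideas in the commit message above (initializing in new, returning Result instead of panicking, and a builder-style API) can be sketched with a self-contained stand-in type. Everything here is hypothetical: Ocr, OcrError and their checks only mimic the shape of such an API and do not call Tesseract or reproduce this crate's actual signatures.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error type: each failure is a value the caller can handle.
#[derive(Debug, PartialEq)]
enum OcrError {
    MissingTrainedData(PathBuf),
    MissingImage(PathBuf),
}

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OcrError::MissingTrainedData(p) => write!(f, "no trained data at {}", p.display()),
            OcrError::MissingImage(p) => write!(f, "no image at {}", p.display()),
        }
    }
}

/// Hypothetical stand-in for an OCR handle. It can only be created through
/// `new`, so an uninitialized value cannot exist.
#[derive(Debug)]
struct Ocr {
    trained_data: PathBuf,
    image: Option<PathBuf>,
}

impl Ocr {
    /// Initialization happens here and takes the datapath explicitly.
    fn new(datapath: &Path, language: &str) -> Result<Self, OcrError> {
        let trained_data = datapath.join(format!("{}.traineddata", language));
        if !trained_data.is_file() {
            return Err(OcrError::MissingTrainedData(trained_data));
        }
        Ok(Ocr { trained_data, image: None })
    }

    /// Builder-style step: consumes `self` and returns it on success.
    fn set_image(mut self, filename: &str) -> Result<Self, OcrError> {
        let image = PathBuf::from(filename);
        if !image.is_file() {
            return Err(OcrError::MissingImage(image));
        }
        self.image = Some(image);
        Ok(self)
    }
}

fn run() -> Result<Ocr, OcrError> {
    // Assumed example paths, taken from the original report.
    Ocr::new(Path::new("/usr/share/tessdata"), "deu")?.set_image("sample.jpg")
}

fn main() {
    match run() {
        Ok(ocr) => println!("ready: {} with {:?}", ocr.trained_data.display(), ocr.image),
        Err(err) => eprintln!("error: {}", err),
    }
}
```

With this shape, a missing deu.traineddata surfaces as an Err the caller can report, rather than the library terminating the program.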
@GirkovArpa

The environment variable TESSDATA_PREFIX can be set to the executable's directory like this:

// build.rs in top level folder (not in src folder)
fn main() {
    // load tessdata file from same folder as executable
    println!("cargo:rustc-env=TESSDATA_PREFIX=");
}
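A caveat on the snippet above: as far as I understand Cargo's build script documentation, cargo:rustc-env makes a variable available at compile time (for env! and option_env!), which is not necessarily the same as setting it in the environment of the running program. The value after TESSDATA_PREFIX= is also empty as written. An alternative sketch, assuming the trained data really does sit next to the executable, resolves that directory at run time so it can be passed on as the datapath. The helper executable_dir and the deu.traineddata file name are illustrative assumptions.

```rust
use std::env;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper: the directory containing the running executable.
fn executable_dir() -> io::Result<PathBuf> {
    let exe = env::current_exe()?;
    exe.parent().map(PathBuf::from).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::NotFound,
            "executable path has no parent directory",
        )
    })
}

fn main() -> io::Result<()> {
    let dir = executable_dir()?;

    // Assumption: the language file was copied next to the executable.
    let trained_data = dir.join("deu.traineddata");
    println!(
        "would use datapath {} (deu.traineddata present: {})",
        dir.display(),
        trained_data.is_file()
    );
    Ok(())
}
```

The resulting directory could then be supplied through whatever datapath option the wrapper exposes, or exported as TESSDATA_PREFIX before the program starts.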
