process_boxes() unknown chars and misidentifies chars #33

Open
victoic opened this issue May 15, 2019 · 3 comments

@victoic

victoic commented May 15, 2019

I started training again and noticed that many characters are not found in codec_rev. The data comes from icdar2015, icdar2017 (MLT) and icdar2019 (MLT), and I'm using the provided codec.txt.

Stranger still, the same error (unknown char) also appears for data from icdar2015, which consists entirely of English characters.

[Screenshot: unknown-chars]

As the screenshot above shows, the character "थ" is not found in codec_rev, and it is reported as coming from the GT of image 277 from icdar2015. However, this is the GT for image 277:

[Screenshot: gt-277]

Is there a specific file encoding I must set for codec.txt? Can you explain why this is happening?
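One way to narrow this down is to load codec.txt with an explicit encoding and list the GT characters that are missing from it. This is a sketch under assumptions: codec.txt is UTF-8 text whose characters, in order, form the alphabet; ICDAR 2015 GT lines look like `x1,y1,...,x4,y4,transcription` and MLT lines like `x1,...,y4,script,transcription`. The names `load_codec_rev` and `unknown_chars` and the start offset of 4 are hypothetical, not the repository's code.

```python
from pathlib import Path


def load_codec_rev(codec_path, start_index=4):
    """Map each codec character to an integer index.

    Hypothetical helper. Assumes codec.txt is UTF-8 text listing one
    alphabet symbol per character; the start offset is illustrative only.
    """
    # Decode explicitly instead of relying on the platform default encoding.
    text = Path(codec_path).read_text(encoding="utf-8")
    codec_rev = {}
    for i, ch in enumerate(text):
        codec_rev[ch] = start_index + i
    return codec_rev


def unknown_chars(gt_path, codec_rev, n_leading_fields=8):
    """Return {char: count} for GT characters missing from codec_rev.

    Hypothetical helper. n_leading_fields is 8 for ICDAR 2015 lines
    (x1,y1,...,x4,y4,text) and 9 for MLT lines (x1,...,y4,script,text).
    """
    missing = {}
    # "utf-8-sig" also strips a leading byte-order mark, if the file has one.
    with open(gt_path, encoding="utf-8-sig") as f:
        for line in f:
            # Split only the leading fields; the transcription may contain commas.
            parts = line.rstrip("\r\n").split(",", n_leading_fields)
            if len(parts) <= n_leading_fields:
                continue  # malformed or empty line
            text = parts[n_leading_fields]
            if text == "###":
                continue  # "don't care" region, no transcription
            for ch in text:
                if ch not in codec_rev:
                    missing[ch] = missing.get(ch, 0) + 1
    return missing
```

If both files decode correctly, running `unknown_chars` on an English-only icdar2015 GT file would be expected to report, at most, rare punctuation; a "थ" showing up there would suggest the wrong file is being read, rather than a codec problem.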

@MichalBusta
Owner

MichalBusta commented May 16, 2019 via email

@victoic
Author

victoic commented May 16, 2019

> The codec is for the MLT 2017 version, so Hindi characters are missing.

Right, but as the first "Unknown char" message in the first screenshot shows, the error is also raised for an image from icdar2017.

> There are some naming conventions (relative gt path ...); please read data_gen.py.

OK, I've read it. From what I understand, where the GT is loaded from depends on the relative path.
My dataset directory is structured like this:

  • images/
    • trainMLT.txt
    • icdar-2015-Ch4/
      • Train/
        • (images and gt here)
    • done/
      • icdar-2017-mlt/
        • (images and gt here)
      • icdar-2019-mlt/
        • (images and gt here)

This seems consistent with the generator class and the example directory in the repository. Am I misunderstanding something?
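To rule out the path side, a quick check can walk the image list and report images whose GT file is not where the loader would look. This is only a sketch: it assumes trainMLT.txt holds one image path per line, relative to the directory containing it, and the `gt_<image stem>.txt` naming in the hypothetical helper `gt_path_for` is a guess at the convention; data_gen.py defines the real one.

```python
from pathlib import Path


def gt_path_for(image_path):
    # Hypothetical convention: img_1.jpg -> gt_img_1.txt in the same folder.
    # Replace this with whatever data_gen.py actually does.
    return image_path.with_name("gt_" + image_path.stem + ".txt")


def missing_gt(list_file):
    """Return image paths from list_file whose GT file cannot be found.

    Assumes list_file holds one image path per line, relative to the
    directory that contains list_file.
    """
    list_file = Path(list_file)
    root = list_file.parent
    missing = []
    for line in list_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image = root / line
        if not gt_path_for(image).is_file():
            missing.append(image)
    return missing
```

An empty result means every listed image has a GT file under the assumed naming; it does not prove the loader pairs each image with the right file.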

@MichalBusta
Owner

MichalBusta commented May 16, 2019 via email
