process_boxes() unknown chars and misidentifies chars #33
On Wed, 15 May 2019 10:12 Victor Lundgren, ***@***.***> wrote:
I started training again and noticed many characters not being identified
as existing in the codec_rev. The data is from icdar2015, icdar2017 (MLT)
and icdar2019 (MLT) and the provided codec.txt is used.
The code is for the MLT 2017 version, so Hindi characters are missing.
Stranger still, the same error (unknown char) is showing up for
data from icdar2015, which is composed entirely of English characters.
[image: unknown-chars]
<https://user-images.githubusercontent.com/9040771/57793581-0b0f8e00-7718-11e9-85ad-48666bb3c656.png>
As shown by the image above, the character "थ" is not found in the
codec_rev, but is reported as found in the GT for image 277 from icdar2015.
However, this is the actual GT for image 277:
[image: gt-277]
<https://user-images.githubusercontent.com/9040771/57793882-cb957180-7718-11e9-9201-23bc174f427c.png>
Is there some file formatting for the codec.txt that I must set? Can you
provide some information about why this is happening?
There are some naming conventions (relative gt path ...); please read
data_gen.py.
Right, but as can be seen in the first "Unknown char" message in the first image, the error also occurs for an image from icdar2017.
Ok, I've read it. From what I understand, the path determines how the GT is loaded.
My dataset layout seems to be consistent with the generator class and the example directory in the repository. Am I misunderstanding it?
On Wed, 15 May 2019 21:10 Victor Lundgren, ***@***.***> wrote:
The code is for the MLT 2017 version, so Hindi characters are missing.
Right, but as can be seen in the first "Unknown char" message in the first
image, the error also occurs for an image from icdar2017.
I have no other explanation than that, for icdar 2017, it reads the GT from MLT, since
the image names are the same. Sorry, I can't help more; I'm without access
to a computer.
… There are some naming conventions (relative gt path ...); please read
data_gen.py.
Ok, I've read it. From what I understand, the path determines how
the GT is loaded.
My dataset path looks like this:
- images/
- trainMLT.txt
- icdar-2015-Ch4/
- Train/
- (images and gt here)
- done/
- icdar-2017-mlt/
- (images and gt here)
- icdar-2019-mlt/
- (images and gt here)
Which seems to be consistent with the generator class and the example directory in
the repository. Am I misunderstanding it?
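One way to test the "image names are the same" explanation is to look for file names that occur in more than one dataset directory. The snippet below is only a sketch of that check: the helper `colliding_names` and the example file names are hypothetical, and the thread does not confirm that data_gen.py actually resolves the GT by file name alone.

```python
import os
from collections import defaultdict


def colliding_names(image_paths):
    """Group paths by file name and keep names that occur more than once."""
    by_name = defaultdict(list)
    for path in image_paths:
        by_name[os.path.basename(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Hypothetical entries, written relative to the images/ directory;
# the real file names in the datasets may differ.
paths = [
    "icdar-2015-Ch4/Train/img_277.jpg",
    "icdar-2017-mlt/img_277.jpg",
    "icdar-2019-mlt/tr_img_00001.jpg",
]

for name, where in colliding_names(paths).items():
    print(name, "appears in:", where)
```

If a name such as the hypothetical `img_277.jpg` shows up in two directories, a GT lookup keyed on the bare file name could pick up the annotation from the wrong dataset.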
I started training again and noticed many characters not being identified as existing in the codec_rev. The data is from icdar2015, icdar2017 (MLT) and icdar2019 (MLT) and the provided codec.txt is used.
Stranger still, the same error (unknown char) is showing up for data from icdar2015, which is composed entirely of English characters.
As shown by the image above, the character "थ" is not found in the codec_rev, but is reported as found in the GT for image 277 from icdar2015. However, this is the actual GT for image 277:
Is there some file encoding for the codec.txt that I must set? Can you provide some information about why this is happening?
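To separate an encoding problem from a genuinely missing character, it can help to read the codec explicitly as UTF-8 and list the characters of a GT transcription that have no entry. The snippet below is a minimal sketch, not the repository's code: it assumes the codec characters sit on the first line of the file, and the helpers `load_codec_rev` and `unknown_chars` as well as the `first_index` offset are hypothetical.

```python
import io


def load_codec_rev(fileobj, first_index=0):
    """Map each character on the first line of the codec to an integer index.

    Assumption: the codec is a single line of characters. For a real file,
    pass open("codec.txt", encoding="utf-8") so the platform default
    encoding is not used by accident.
    """
    codec = fileobj.readline().rstrip("\n")
    return {ch: first_index + i for i, ch in enumerate(codec)}


def unknown_chars(text, codec_rev):
    """Return the characters of text missing from codec_rev, without repeats."""
    missing = []
    for ch in text:
        if ch not in codec_rev and ch not in missing:
            missing.append(ch)
    return missing


# In-memory stand-in for codec.txt with a tiny made-up alphabet.
codec_rev = load_codec_rev(io.StringIO("abc 123\n"))
print(unknown_chars("cab थ1", codec_rev))  # ['थ']
```

If a character that is plainly present in codec.txt is still reported as missing, the file was probably decoded with the wrong encoding; if a Hindi character is reported for an English-only GT file, the GT text itself likely came from a different file.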