detect_full_text() returns incorrect results when the text contains Chinese characters #3124
Comments
Hi @HopLiu, I do not think the client library does anything differently with regard to character encodings. Could you provide me with a reproduction case? (Basically, what is an image I can use to observe this behavior?) I expect it is a problem on the Vision backend, but I would like to confirm that before passing the report on. |
Thanks @HopLiu! I can test this out real quick @lukesneeringer if you want. |
@daspecster Sure. I have faith in the reproduction case, what I really want to know is whether it is our bug or the backend API's bug. |
@lukesneeringer so I'm still looking into this, but for example:

```
(Pdb) full_text.text
u'Eitute : WIHFW PGA TOUR DRAFX HESO 9: 274400\nFBIE ST9 : 0530-3560885 400-607 1001 (AFE HOiE)\n1$US49: 0530-3560898\n'
```

One of the symbols in the response:

```
symbols {
  property {
    detected_languages {
      language_code: "cy"
    }
  }
  bounding_box {
    vertices {
      x: 141
      y: 442
    }
    vertices {
      x: 156
      y: 442
    }
    vertices {
      x: 156
      y: 462
    }
    vertices {
      x: 141
      y: 462
    }
  }
  text: "U"
}
```

Example:

```
(Pdb) full_text.pages[0].blocks[0].paragraphs[0].words[0].symbols[0].text
u'E'
(Pdb) full_text.pages[0].blocks[0].paragraphs[0].words[0].symbols[1].text
u'i'
(Pdb) full_text.pages[0].blocks[0].paragraphs[0].words[0].symbols[2].text
u't'
(Pdb) full_text.pages[0].blocks[0].paragraphs[0].words[0].symbols[3].text
u'u'
(Pdb) full_text.pages[0].blocks[0].paragraphs[0].words[0].symbols[4].text
u't'
``` |
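Editor's note: the per-symbol values above can be reassembled into words by walking the pages → blocks → paragraphs → words → symbols hierarchy of the full-text annotation. A minimal sketch over plain dicts shaped like that response (the sample data here mirrors the `Eitute` word above and is otherwise hypothetical):

```python
# Walk a TextAnnotation-shaped structure and rebuild each word from
# its symbols. The dict layout mirrors the Vision fullTextAnnotation
# hierarchy; the sample values below are illustrative only.
def words_from_annotation(annotation):
    words = []
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    text = "".join(s["text"] for s in word.get("symbols", []))
                    words.append(text)
    return words

sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "E"}, {"text": "i"}, {"text": "t"},
                                 {"text": "u"}, {"text": "t"}, {"text": "e"}]},
                ]
            }]
        }]
    }]
}

print(words_from_annotation(sample))  # ['Eitute']
```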
Also, I tried passing that. I don't get that error if I leave it out. I think we need some backend confirmation of what exactly is supported. |
@gguuss would you have any insight on this? |
This sounds like a backend issue to me. |
The API accepts an optional parameter, the image context, which needs to specify the language. I am going to see if I can determine how to specify this in our Python Cloud client. |
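Editor's note: in the REST `images:annotate` request, that optional parameter is the `imageContext` field with a `languageHints` list. A minimal sketch that builds such a request body with only the standard library (the image bytes below are a placeholder, not a real image):

```python
import base64
import json

def build_annotate_request(image_bytes, language_hints=None):
    """Build an images:annotate request body with an optional
    imageContext.languageHints entry. Field names follow the Vision
    REST API; the image payload here is a placeholder."""
    request = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language_hints:
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}

body = build_annotate_request(b"fake-image-bytes", language_hints=["zh"])
print(json.dumps(body, indent=2))
```

When no hints are supplied, the `imageContext` field is omitted entirely, so the request is identical to what existing callers already send.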
So this may be related to my #3132 issue then. @gguuss do you know if all annotation APIs support ImageContext right now? If I need to support ImageContext for all annotation types then I can do that, but if it's only one or two types (as it was in the past) then I would try to make adding that information to the API call more direct. |
The ImageContext is not used by all features. I think the context configuration is used for crop hints, language features, and landmark / entities / labels. Do we currently have a way of setting it? |
I authored a web-based proof of concept that correctly detects Chinese text, so it is definitely not a backend issue. Passing the image context does not appear to have side effects when extra parameters are passed; for example, landmark detection still works with the language set. |
@gguuss your example uses this library? |
@gguuss I think I missed adding that; I would have sworn I had it. @lukesneeringer I can get to work on this if there aren't other priorities? |
Go for it. |
@daspecster My example is Apiary on JavaScript (insert joke about me merely being a front-end developer and Python microservices frightening me). |
No jokes here! I hail from the frontend as well... but a long time ago in a galaxy far, far away. |
Btw, crop hints works well; maybe we can similarly accept an optional language parameter on detect_text and detect_full_text. |
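Editor's note: the optional-parameter design suggested here could look like the following sketch. `detect_full_text` is used as a hypothetical stand-in here, not the library's actual method signature, and the function returns the request payload instead of calling the API:

```python
def detect_full_text(image, language=None):
    """Hypothetical wrapper: forward an optional language hint as an
    imageContext only when the caller supplies one, so existing
    callers see no change in behavior."""
    payload = {
        "image": image,
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language is not None:
        payload["imageContext"] = {"languageHints": [language]}
    return payload  # a real wrapper would send this to the service

# Existing calls stay the same; new callers can opt in:
request = detect_full_text({"content": "..."}, language="zh")
print(request["imageContext"])  # {'languageHints': ['zh']}
```

Making the parameter default to `None` (rather than, say, an empty list that is always serialized) keeps the wire request byte-identical for current users.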
Now I use detect_full_text() to detect words via the SDK. There are two issues:

Since I need the info of each word and its bounds, I have to use the first API. Any ideas, or is this an existing issue with this API?