Question Regarding End Model #66
Comments
Indirectly, you could probably train a sequence recognizer with MapTextSynthesizer. You could generate a static list of captions (phrases) to sample from as if they were words (though I'm not sure whether the spaces would render properly, maybe @arthurhero knows), but the better approach would be to choose the random phrase dynamically on the fly, which would require more substantial modifications to the code. In either case, you could then use the CTCWordBeamSearch module in multi-word mode to recognize the text (or plain TensorFlow CTC beam search if you don't want a lexicon). Just remember to include a space among the output characters in
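To illustrate why the space character must be in the output charset, here is a minimal best-path CTC collapse in plain Python. This is only a toy sketch: the blank index, the charset, and the function name are illustrative assumptions, and the actual recognizer would use TensorFlow's CTC beam search or the CTCWordBeamSearch module, not this greedy decoder.

```python
# Toy best-path CTC collapse: merge repeated labels, drop blanks.
# Assumes blank index 0; charset must include ' ' so that
# multi-word phrases can be recovered from the label sequence.
BLANK = 0
CHARSET = " abcdefghijklmnopqrstuvwxyz"  # label 1 = ' ', label 2 = 'a', ...

def ctc_collapse(indices):
    """Collapse a per-frame argmax label path into an output string."""
    out = []
    prev = None
    for idx in indices:
        if idx != BLANK and idx != prev:
            out.append(CHARSET[idx - 1])
        prev = idx
    return "".join(out)

print(ctc_collapse([15, 15, 0, 16, 1, 8, 8, 0, 16]))  # prints "no go"
```

Without the space label (index 1 here), the phrase above could only ever decode as a single run of letters, which is why multi-word recognition breaks if the charset omits it.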
The spacing should be fine. But since phrases tend to be longer than words, pay attention to the hard upper limit on the image width, which can be set in mts_texthelper.cpp at line 562. Currently the hard limit is 40 times the image height; you might want to set it higher for phrases.
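As a quick sanity check on that limit, a small sketch of the arithmetic (the function name and default ratio here are illustrative, not the actual MapTextSynthesizer API; only the 40x ratio comes from the comment above):

```python
# Illustrative width cap mirroring the hard limit described above:
# the rendered image may be at most `width_ratio` times its height.
def max_caption_width(img_height, width_ratio=40):
    """Return the widest image the synthesizer would produce."""
    return img_height * width_ratio

# e.g. a 32-pixel-tall line caps out at 32 * 40 = 1280 pixels wide;
# a long phrase that needs more room than that would be truncated,
# so raise the ratio when rendering phrases instead of single words.
print(max_caption_width(32))      # prints 1280
print(max_caption_width(32, 60))  # prints 1920
```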
Perfect, that feedback was exactly what I needed. Thank you both; I will continue updating the code to cleaner TF 2.0, ideally getting rid of all "tf.compat.vX" calls. As I said before, thank you again for making this project public.
Preface: My background coming into machine learning is mainly programming. I've gained some solid knowledge about data framing and the like, but the math of neural networks is still a major weakness.
So I believe I already understand the answers to my questions, but I want to verify before I waste time on these efforts.
1.) Does the MJSYNTH dataset only teach the model how to break down/identify/process single words?
2.) Assuming yes to (1): which means it would then need to be taught how to read sentence and paragraph/spacing structures, correct?
3.) Assuming yes to (2): is that where your team's work on mapsynth came into play?
4.) Assuming yes to (3): is the finished mjsynth model then trained on the mapsynth dataset, or is it fine-tuned/scoped to the mapsynth dataset?
5.) Assuming yes to (4): what global_step/loss/learning rate, etc., is ideal for training on that dataset?
A yes/no should suffice for all five; if no, maybe a very short rationale. Thank you again for making this project public, and best wishes to your team at ICDAR 2019. I will also post a 1-million-step model trained on a single GPU if your team would like a copy on hand or to provide to the public.