often recognize 'u' wrongly #42
Comments
I've finished training the model for 2**21 steps.
It's weird though: after I changed only the CNN model to Shi et al.'s CRNN architecture version, it recognizes 'u'.
What is the training loss? Validation loss? The default training parameters can sometimes get stuck (quite early in training) in a poor local minimum. I've never investigated specific character-level confusions/probabilities, but I definitely don't see this behavior in my own experience. To avoid local minima, I have set up an alternative training schedule that starts with a small batch size, increasing it from 16 to 128 as the learning rate (no staircase) decreases from 0.0001 down to 0.000003. (See Takase et al.)
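For concreteness, here is a minimal sketch of that kind of schedule. Only the endpoints (batch 16 → 128, learning rate 1e-4 → 3e-6, no staircase) come from the comment above; the phase boundaries and batch-doubling points are assumptions for illustration:

```python
# Sketch of a staged batch-size / continuously decayed learning-rate schedule.
# Endpoints come from the comment above; step boundaries are assumed.

INITIAL_LR = 1e-4
FINAL_LR = 3e-6
TOTAL_STEPS = 2**21

def learning_rate(step):
    """Smooth (non-staircase) exponential decay from INITIAL_LR to FINAL_LR."""
    return INITIAL_LR * (FINAL_LR / INITIAL_LR) ** (step / TOTAL_STEPS)

def batch_size(step):
    """Double the batch size at (assumed) evenly spaced milestones: 16 -> 128."""
    boundaries = [TOTAL_STEPS // 4, TOTAL_STEPS // 2, 3 * TOTAL_STEPS // 4]
    sizes = [16, 32, 64, 128]
    for boundary, size in zip(boundaries, sizes):
        if step < boundary:
            return size
    return sizes[-1]

for step in (0, 2**19, 2**20, 2**21 - 1):
    print(step, batch_size(step), learning_rate(step))
```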
Thanks for your reply. test.py shows the following, although I didn't use the entire test set because it's too slow: {'total_num_labels': 144942, 'total_num_sequence_errs': 3892, 'total_num_label_errors': 6711, 'mean_label_error': 0.04630127913234259, 'loss': 1.5078024, 'total_num_sequences': 17837, 'mean_sequence_error': 0.21819812748780623, 'global_step': 2097152}. I understand that you've never seen this problem and you think it's a local minimum.
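As a sanity check, the reported mean error rates are just the ratios of the error counts in that same output:

```python
# Verify the reported means against the reported counts (numbers copied
# from the test.py output above).
label_errors, total_labels = 6711, 144942
seq_errors, total_seqs = 3892, 17837

print(label_errors / total_labels)  # 0.04630... == mean_label_error
print(seq_errors / total_seqs)      # 0.21819... == mean_sequence_error
```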
Those label error rates and sequence error rates seem pretty reasonable, so maybe it's not a local minimum. That loss seems a bit high, though. What's the smoothed training loss (i.e., as reported in TensorBoard)? (Say with a smoothing factor of something like 0.95.) My training schedule is as follows:
[The training-schedule table from the original comment was not preserved in this copy; per the description above, the batch size grows from 16 to 128 while the learning rate decays from 0.0001 down to 0.000003 with no staircase.]
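To illustrate what the "smoothed" value means, here is a minimal sketch of the exponential moving average that TensorBoard applies with a 0.95 smoothing factor (TensorBoard's own implementation also debiases the running average; `raw_losses` is a made-up series, not real training data):

```python
# Exponential-moving-average smoothing, as in TensorBoard's loss curves.
def smooth(values, factor=0.95):
    smoothed, last = [], values[0]
    for v in values:
        last = factor * last + (1 - factor) * v  # EMA update
        smoothed.append(last)
    return smoothed

raw_losses = [5.0, 4.2, 3.1, 2.5, 1.9, 1.5, 1.2, 1.1]  # hypothetical losses
print(smooth(raw_losses)[-1])
```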
Smoothed training loss is 1.072. I've trained the model for 2**21 steps.
@kojit I forgot to add: I recommend you read the recent Neural Computation paper I cited above to get a sense of why it's not the number of steps but the batch size that can have an overriding impact on performance.
Thanks. I will try that and report back later.
Same here. I've trained this for 2^21 steps, and it is not able to recognise '8' and '9'. Is there anything I have to modify in the training hyperparameter settings?
@sahilbandar Just set decay_rate=1, as well as the batch size, learning rate, and max number of steps (and tune_from), to follow the schedule noted above.
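One way that staged schedule could be driven is to run training in phases, each resuming from the previous checkpoint with a larger batch and a smaller learning rate. The flag names below (--batch_size, --learning_rate, --decay_rate, --max_num_steps, --tune_from) and the checkpoint path are assumptions inferred from this thread, not verified against the repository, and the per-phase values are illustrative only:

```python
# Hypothetical driver for a staged training schedule. Flag names, checkpoint
# path, and phase values are assumptions, not the repo's verified interface.
import subprocess

phases = [  # (batch_size, learning_rate, steps_this_phase)
    (16,  1e-4, 2**19),
    (32,  3e-5, 2**19),
    (64,  1e-5, 2**19),
    (128, 3e-6, 2**19),
]

trained = 0
for batch, lr, steps in phases:
    trained += steps
    cmd = [
        "python", "train.py",
        "--batch_size", str(batch),
        "--learning_rate", str(lr),
        "--decay_rate", "1",              # disable built-in decay, per the advice above
        "--max_num_steps", str(trained),  # cumulative step budget for this phase
    ]
    if trained > steps:                   # after the first phase, resume from a checkpoint
        cmd += ["--tune_from", "../data/model/model.ckpt"]  # hypothetical path
    subprocess.run(cmd, check=True)
```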
Hello,
I trained your model with the mjsynth dataset and the default parameter settings for 1,000,000 steps.
I found that the model often recognizes the character 'u' wrongly.
It seems as if there is no 'u' class.
Do you have any thoughts about what the cause might be?