
often recognizes 'u' wrongly #42

Closed
kojit opened this issue Oct 6, 2018 · 12 comments
kojit commented Oct 6, 2018

Hello,

I trained your model on the mjsynth dataset with the default parameter settings for over 1,000,000 steps.
I found that the model often recognizes the character 'u' wrongly.
It seems as if there is no 'u' class.
Do you have any thoughts about what the cause might be?

kojit (Author) commented Oct 9, 2018

I've now trained the model for 2^21 steps, but it still cannot recognize 'u' correctly, and I also found that it doesn't recognize 'q' at all.
I tested with several images and found that the probabilities of 'u' and 'q' before the CTC layer are always 0.
Has anyone else had this experience?

kojit (Author) commented Oct 11, 2018

It's weird, though: after I changed only the CNN model to Shi et al.'s CRNN architecture, it recognizes 'u'.

weinman (Owner) commented Oct 11, 2018

What is the training loss? Validation loss? What values does test.py report on the test data?

The default training parameters sometimes can get stuck (quite early in training) in a poor local minimum. I've never investigated specific character-level confusions/probabilities, but I definitely don't see this behavior in my own experiences.

To avoid local minima, I have set up an alternative training schedule that starts with a small batch size, increasing from 16 to 128, as the learning rate (no staircase) decreases from 1e-4 down to 3e-6. (See Takase et al.)

kojit (Author) commented Oct 11, 2018

Thanks for your reply.

test.py reports the following, although I didn't use the entire test set because it's too slow:

{'total_num_labels': 144942, 'total_num_sequence_errs': 3892, 'total_num_label_errors': 6711, 'mean_label_error': 0.04630127913234259, 'loss': 1.5078024, 'total_num_sequences': 17837, 'mean_sequence_error': 0.21819812748780623, 'global_step': 2097152}
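For what it's worth, the two mean error rates in that dictionary are just ratios of the reported totals, so they can be sanity-checked directly (this assumes test.py aggregates them as simple ratios, which the numbers bear out):

```python
# Sanity-check the aggregate error rates reported by test.py:
# mean error = total errors / total count, for labels and sequences.
total_num_labels = 144942
total_num_label_errors = 6711
total_num_sequences = 17837
total_num_sequence_errs = 3892

mean_label_error = total_num_label_errors / total_num_labels
mean_sequence_error = total_num_sequence_errs / total_num_sequences

print(mean_label_error)     # ~0.0463, matching the reported value
print(mean_sequence_error)  # ~0.2182, matching the reported value
```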

I understand that you've never seen this problem and think it's a local minimum.
I'd like to try changing the batch size.

weinman (Owner) commented Oct 11, 2018

Those label error rates and sequence error rates seem pretty reasonable. Maybe it's not a local minimum.

That loss seems a bit high, but I just realized that test.py probably reports only the last test batch's loss (rather than the cumulative average it should report).

What's the smoothed training loss (i.e., as reported in tensorboard)? (Say with a smoothing factor of something like 0.95.)
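For anyone unfamiliar with it, TensorBoard's smoothing slider applies an exponential moving average to the logged scalar values; a minimal sketch of that computation (a simplified version, without TensorBoard's debiasing of early values):

```python
def smooth(values, weight=0.95):
    """Exponential moving average over a scalar series,
    in the spirit of TensorBoard's smoothing slider."""
    smoothed = []
    last = values[0]  # seed with the first value
    for v in values:
        last = last * weight + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

# A noisy loss curve is pulled toward its underlying level.
print(smooth([2.0, 1.0, 2.0, 1.0], weight=0.95))
```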

My training schedule is as follows:

Batch Size | Learning Rate | Steps (Cumulative)
16         | 1e-4          | 2^16
32         | 3e-5          | 2^18
64         | 3e-5          | 2^19
128        | 1e-5          | 2^19 + 2^18
128        | 3e-6          | 2^20
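One way to read that schedule: each row gives the batch size and (fixed) learning rate in effect up to its cumulative step boundary. A hypothetical lookup helper, not the repository's actual training loop:

```python
# Hypothetical helper mapping a global step to the active stage's
# batch size and learning rate (cumulative step boundaries from the table).
SCHEDULE = [  # (last step of stage, batch size, learning rate)
    (2**16,         16,  1e-4),
    (2**18,         32,  3e-5),
    (2**19,         64,  3e-5),
    (2**19 + 2**18, 128, 1e-5),
    (2**20,         128, 3e-6),
]

def stage_for_step(step):
    for last_step, batch_size, lr in SCHEDULE:
        if step <= last_step:
            return batch_size, lr
    # Past the final boundary, keep the last stage's settings.
    return SCHEDULE[-1][1], SCHEDULE[-1][2]

print(stage_for_step(2**17))  # second stage: (32, 3e-05)
```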

kojit (Author) commented Oct 11, 2018

The smoothed training loss is 1.072.

weinman (Owner) commented Oct 11, 2018

Oh yeah, that's probably not very good. You want it down around 0.4–0.5.

The colors below indicate the training sessions in the table above.

[image: training loss curves, colored by training session]

kojit (Author) commented Oct 11, 2018

I've already trained the model for 2^21 steps.
What could have been wrong...?

weinman (Owner) commented Oct 11, 2018

@kojit I forgot to add, I set --decay_rate=1.0 so the learning rate was fixed at each stage of training.
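With TensorFlow-style exponential decay, the rate follows initial_rate * decay_rate^(step / decay_steps), so decay_rate=1.0 keeps it constant within each stage. A quick plain-Python illustration of that formula (not the repository's code):

```python
def decayed_lr(initial_rate, decay_rate, step, decay_steps):
    # Mirrors exponential learning-rate decay without staircasing:
    # lr = initial_rate * decay_rate ** (step / decay_steps)
    return initial_rate * decay_rate ** (step / decay_steps)

# With decay_rate=1.0 the learning rate never changes, whatever the step.
print(decayed_lr(1e-4, 1.0, 500000, 2**16))  # 0.0001
print(decayed_lr(1e-4, 0.9, 500000, 2**16))  # decays below 0.0001
```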

I recommend you read the recent Neural Computation paper I cited above to get a sense of why it's not the number of steps but the batch size that can have an overriding performance impact.

kojit (Author) commented Oct 12, 2018

Thanks. I will try that and report back later.

sahilbandar (Contributor) commented Oct 31, 2018

Same here: I've trained this for 2^21 epochs, and it is not able to recognize '8' and '9'. Is there anything I have to modify in the training hyperparameter settings?

@weinman
Copy link
Owner

weinman commented Oct 31, 2018

@sahilbandar Just set decay_rate=1, along with the batch size, learning rate, and max number of steps (and the checkpoint to tune from), to follow the schedule noted above.
