
Help Needed #10

Closed
SibtainRazaJamali opened this issue May 2, 2019 · 7 comments
Labels
question Further information is requested

Comments

@SibtainRazaJamali

I am training a CRNN model in PyTorch with:
max_seq_length = 99
number_of_alphabets = 96
batch_size = 16
output = CRNN(image)
What should the expected shape of output be?
Secondly, should we apply softmax in the CRNN after the fully connected layer?
Any help would be appreciated. Thanks.

@zhiqwang
Owner

zhiqwang commented May 2, 2019

It seems that your images are a different size. The network assumes the image height is 32 and that the width is a multiple of 8 by default. The CNN backbone compresses the image width by 1/4 with arch densenet121 and by 1/8 with arch densenet_cifar.

So if your image width is 80, the output size is (20, 16, 97) with --arch densenet121 (97 = 96 + 1, since encoder code 0 is reserved for the CTC blank token).
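The width arithmetic can be sketched as a small helper. This is only a sketch: the `expected_seq_length` name is made up, and the compression factors are taken from the numbers quoted in this thread, not from the repo's code.

```python
# Hypothetical helper: compute the time dimension T of the CRNN output
# from the image width. Compression factors follow the numbers above
# (assumptions, not the repo's actual constants).
COMPRESSION = {"densenet121": 4, "densenet_cifar": 8}

def expected_seq_length(image_width: int, arch: str = "densenet121") -> int:
    if image_width % 8 != 0:
        raise ValueError("image width should be a multiple of 8 by default")
    return image_width // COMPRESSION[arch]

# Width 80 with densenet121 gives T = 20, matching the (20, 16, 97)
# output shape for batch_size=16 and 96 + 1 classes.
```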

Alternatively, you can train the network with the --keep-ratio option to keep the image's aspect ratio.

The nn.CTCLoss implementation assumes log_softmax has already been applied to its input; you can refer to the code snippet in the CRNN network here and the CTC loss implementation here.
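In code, pairing log_softmax with nn.CTCLoss might look like this minimal sketch. The shapes follow the example above; the random tensors are stand-ins, not the repo's actual training loop.

```python
import torch
import torch.nn.functional as F

# Sketch: nn.CTCLoss expects log-probabilities of shape (T, N, C), so
# apply log_softmax over the class dimension before computing the loss.
T, N, C = 20, 16, 97                      # seq length, batch, 96 chars + blank
logits = torch.randn(T, N, C)             # stand-in for the CRNN output
log_probs = F.log_softmax(logits, dim=2)

ctc_loss = torch.nn.CTCLoss(blank=0)      # code 0 is the CTC blank token
targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # dummy labels
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
```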

@SibtainRazaJamali
Author

Thanks for your quick response.
I am confused, because we pass the input to CTC loss as
Width x BatchSize x NumberOfClasses.
Am I right?
My output size is
99 x 16 x 97.

If I decode this prediction,
how many characters should be predicted?
My decoded output is always 16 characters, equal to the batch size.
Why?
The sequence length can be up to 97, but it always predicts 16 characters.
Am I doing something wrong?

@zhiqwang
Owner

zhiqwang commented May 3, 2019

Do you mean the sequence length can be up to 99?

CTC loss introduces a blank token ϵ to get around not knowing the alignment between the input and the output. When you infer an image's content, you should collapse repeats and remove the ϵ tokens, so the decoded output size is not fixed; it depends on your input image and your trained network. You can refer to Awni's article here.
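A greedy (best-path) decode that collapses repeats and drops blanks, as described above, might look like this sketch (`greedy_ctc_decode` is a hypothetical helper, not a function from the repo). Note that the argmax must be taken over the class dimension, dim=2 of a (T, N, C) tensor; iterating over the wrong dimension would always yield batch_size values, which matches the symptom described above.

```python
import torch

# Sketch of greedy (best-path) CTC decoding: argmax over classes per
# time step, then collapse repeated labels and drop the blank (index 0).
def greedy_ctc_decode(log_probs: torch.Tensor, blank: int = 0):
    """log_probs: (T, N, C). Returns a list of N decoded label sequences."""
    best = log_probs.argmax(dim=2)                # (T, N): best class per step
    sequences = []
    for n in range(best.size(1)):                 # one sequence per batch item
        prev, decoded = blank, []
        for label in best[:, n].tolist():
            if label != blank and label != prev:  # collapse repeats, drop blanks
                decoded.append(label)
            prev = label
        sequences.append(decoded)
    return sequences
```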

If your dataset is small, the mean and std of the dataset matter a lot, and the --arch parameter also depends on your image dataset (choose densenet_cifar or densenet121). You can also add an RNN part; refer to issue #6.
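The RNN part mentioned above (the combination discussed in issue #6) could be sketched as follows; the `SequenceHead` name and the layer sizes are illustrative assumptions, not the repo's code.

```python
import torch
import torch.nn as nn

# Sketch: a bidirectional LSTM over the per-column CNN features,
# projected to per-time-step class scores for CTC.
class SequenceHead(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=97):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):            # x: (T, N, feat_dim)
        out, _ = self.rnn(x)         # (T, N, 2 * hidden)
        return self.fc(out)          # (T, N, num_classes)

head = SequenceHead()
scores = head(torch.randn(20, 16, 512))   # shape: (20, 16, 97)
```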

@ronghui19

ronghui19 commented May 16, 2019

Is there any particular reason to use softmax? I didn't see the original paper mention it.

Never mind, I got it. It serves as the input for CTCLoss.

@zhiqwang
Owner

zhiqwang commented May 16, 2019

@ronghui19 Yes. I don't know what the result would be if the CTC loss were removed from the CRNN network, I mean using only the log softmax in the back-propagation procedure. I'm testing this.

@ronghui19

@zhiqwang In the file crnn.py there is an init_network function. As far as I can tell, you may have forgotten to freeze the pretrained network. There should be something like this:
for param in model.parameters(): param.requires_grad = False
I could be wrong.
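If a pretrained backbone were in use, the freeze suggested above could be combined with an optimizer that only receives trainable parameters. This is a sketch with a made-up two-layer model, not the repo's actual init_network.

```python
import torch
import torch.nn as nn

# Sketch: freeze a (hypothetical) pretrained backbone and optimize only
# the remaining head.
backbone = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU())
head = nn.Linear(8, 97)

for param in backbone.parameters():
    param.requires_grad = False          # freeze the pretrained part

# Pass only the still-trainable parameters to the optimizer.
trainable = [p for p in list(backbone.parameters()) + list(head.parameters())
             if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```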

@zhiqwang
Owner

@ronghui19 Currently, there is no pre-trained model used in the CNN backbone, so I did not set a separate learning rate for that part.

@zhiqwang zhiqwang added the question Further information is requested label Jun 13, 2019