ctc_loss gets inf values and Unknown chars #25
See #4 (in short: the gradients are fine, but the loss can reach inf).
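The "loss can reach inf" behaviour has a simple cause: CTC assigns probability zero when no valid alignment exists, for instance when the target transcription is longer than the number of time steps the recognition branch produces for a crop. A toy sketch (not the repo's code, just an illustration of the mechanism):

```python
import math

def toy_ctc_loss(num_time_steps, target_len, p_best_alignment=0.9):
    # A valid CTC alignment needs at least target_len time steps
    # (more if the target contains repeated characters).
    if num_time_steps < target_len:
        return float("inf")  # -log(0): no alignment is possible
    return -math.log(p_best_alignment)

print(toy_ctc_loss(10, 4))  # finite loss: an alignment exists
print(toy_ctc_loss(3, 5))   # inf: the crop is too narrow for its label
```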
As for testing: yes, you should add the missing characters to the codec. |
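To find which characters need adding, one can diff the label texts against the codec. A minimal sketch (the codec is passed here as a plain string of characters; reading it from the repo's codec file is left out):

```python
def missing_chars(codec_chars, label_texts):
    """Return characters that occur in the labels but not in the codec."""
    codec = set(codec_chars)
    return sorted({ch for text in label_texts for ch in text} - codec)

# e.g. a codec that lacks the two CJK characters seen in the logs below:
print(missing_chars("abc123", ["abc", "1铣", "綦"]))  # ['綦', '铣']
```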
On 29/03/2019 07:30, RuijieJ wrote:
Thanks a lot! But another problem occurs when I train the model, which
leads to poor model performance. Here is my guess at the reason:
In "data_gen.py", the original image is fed into the function
"cut_image()", and the cropped image has a size of
[args.input_size, args.input_size], e.g., 512×512. Meanwhile, the text
polys are modified to match the new coordinates of the cropped image.
After this modification, many text polys' coordinates will either be
less than 0 or larger than 512.
Next, in the function "generate_rbox()", each text poly is checked. If
a text poly's coordinates are less than 0 or larger than 512, the
score map and the training mask of this region are set to 0; that is
to say, we assume there is no text inside this region.
This is an elegant design. However, in many situations, only part of a
text region lies inside the cropped image. An example is shown below.
Clearly, valid text polys should exist, and the score map should be 1
instead of 0 in these regions. However, when I generate the training
data using "data_gen.py", it does not seem to produce the correct
ground truth and labels.
An example:
<https://user-images.githubusercontent.com/31977268/55213630-db232e80-522e-11e9-8957-391f98ebec50.jpg>
score_map: all zeros
training_mask: all zeros
gt_out: empty
labels_out: empty
I'm not sure whether I have understood the code correctly. I would be
very grateful for any reply.
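The cropping behaviour described above can be sketched as follows (a simplification of the check in "generate_rbox()", not the repo's exact code):

```python
import numpy as np

def keeps_text(poly, crop_size=512):
    # A poly is kept only if every coordinate lies inside the crop;
    # otherwise its score-map / training-mask region is zeroed.
    return bool((poly >= 0).all() and (poly <= crop_size).all())

fully_inside   = np.array([[10, 10], [100, 10], [100, 50], [10, 50]])
partly_outside = np.array([[-20, 10], [100, 10], [100, 50], [-20, 50]])
print(keeps_text(fully_inside))    # True
print(keeps_text(partly_outside))  # False: partially visible text is dropped
```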
The described situation can slow down the training process (note that the
training_mask is zero, so there is nothing to learn; there is also a small
piece of code controlling the negative (non-text) ratio:
https://github.com/MichalBusta/E2E-MLT/blob/8dd13e4342a47dd5242b77065f428199cfbb81f6/data_gen.py#L613)
Lazy workaround: buy a better graphics card (all segmentation models
will benefit from being fed larger images)
Proper fix: scale the image according to the text size?
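The "proper" fix could look roughly like this hypothetical helper (not from the repo): pick a resize factor so the smallest text instance keeps some minimum pixel height; min_text_px is an assumed parameter.

```python
import numpy as np

def scale_for_text(image_hw, polys, min_text_px=12.0):
    # Height of each quad, taken as the length of its left edge
    heights = [float(np.linalg.norm(p[3] - p[0])) for p in polys]
    scale = max(1.0, min_text_px / min(heights))  # only ever upscale
    h, w = image_hw
    return int(round(h * scale)), int(round(w * scale))

polys = [np.array([[0, 0], [40, 0], [40, 6], [0, 6]], dtype=float)]
print(scale_for_text((100, 200), polys))  # (200, 400): 6 px text -> 12 px
```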
|
Thanks. I finally found my problem: the original configuration is designed for vertical text polys (height > width), except for the ICDAR-2015 dataset. After I modified the line "if is_icdar: pts = np.roll(pts, 2)" in "data_gen.py" to roll the coordinates for all polys, I succeeded in training the model. Many thanks to the author for the timely reply. One last question, though: the loss value can easily become negative; is this normal? Would a negative loss update the network's parameters in a harmful way? |
On 29/03/2019 13:15, RuijieJ wrote:
Thanks. I finally found my problem: the original configuration is
designed for vertical text polys (height > width), except for the
ICDAR-2015 dataset. After I modified this line in "data_gen.py":
if is_icdar:
    pts = np.roll(pts, 2)
and rolled the coordinates for all polys, I succeeded in training the model.
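For reference, np.roll on a (4, 2) point array shifts the flattened values, so rolling by 2 moves the last (x, y) vertex to the front. This changes which vertex the quad starts from, and hence which edge the code treats as the first one (the vertical vs. horizontal orientation assumption):

```python
import numpy as np

pts = np.array([[0, 0], [40, 0], [40, 10], [0, 10]])  # quad vertices
rolled = np.roll(pts, 2)  # shift by 2 flat values = one (x, y) vertex
print(rolled.tolist())  # [[0, 10], [0, 0], [40, 0], [40, 10]]
```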
Many thanks to the author for the timely reply. One last question,
though:
The loss value can easily become negative; is this normal? Would a
negative loss update the network's parameters in a harmful way?
Dice loss is negative but bounded (-1 per scale), so it is fine.
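Why the bound holds can be seen from a common negative-dice formulation (a sketch; the repo's exact loss may differ): the ratio is at most 1, so the loss lies in [-1, 0].

```python
import numpy as np

def neg_dice(pred, target, eps=1e-6):
    # Negative dice coefficient used as a loss: bounded in [-1, 0]
    inter = (pred * target).sum()
    return -(2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

mask = np.ones((8, 8))
print(neg_dice(mask, mask))              # -1.0 at perfect overlap
print(neg_dice(np.zeros((8, 8)), mask))  # ~0.0 at no overlap
```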
|
Hi @RuijieJ, how did you modify it? Can you share your code? |
@kei6 Of course. In data_gen.py, line 122:
Because the texts in the sample image "sample_train_data/MLT/done/img_5407.jpg" are vertical, the original code doesn't roll their points. However, if most of your samples contain only horizontal texts, you just need to remove the if condition:
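The change being described is presumably something like the following (the exact code around line 122 may differ):

```python
import numpy as np

pts = np.array([[0, 0], [40, 0], [40, 10], [0, 10]])

# Before (rolls the points only for ICDAR-2015):
#     if is_icdar:
#         pts = np.roll(pts, 2)
# After (roll unconditionally when the data is mostly horizontal text):
pts = np.roll(pts, 2)
```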
|
@RuijieJ Allow me to ask a question: what is the function of "-ocr_feed_list" in train.py? And where can I get the cropped images? Thanks |
Hi @RuijieJ, I commented out the condition ("#if is_icdar: pts ="). |
When I use the pre-trained model you provide to continue training on the MLT-2019 dataset, the ctc_loss gets inf values at most steps. Is there something wrong?
Also, some characters seem not to be included in the dictionary (codec). Will this affect the training performance?
The screen output looks like:
epoch 12[205000], loss: 8.969, bbox_loss: 5.751, seg_loss: -0.604, ang_loss: 3.349, ctc_loss: 9.002, rec: 0.00000 in -0.000
Unknown char: 铣
Unknown char: 綦
epoch 12[205100], loss: 5.823, bbox_loss: 5.086, seg_loss: -0.910, ang_loss: 2.095, ctc_loss: inf, rec: 0.02500 in 0.000
epoch 12[205200], loss: 2.607, bbox_loss: 3.810, seg_loss: -1.047, ang_loss: 0.875, ctc_loss: inf, rec: 0.02069 in -0.000
Unknown char: 铣
epoch 12[205300], loss: 2.268, bbox_loss: 3.619, seg_loss: -0.995, ang_loss: 0.727, ctc_loss: inf, rec: 0.03540 in -0.003
Unknown char: 桷
Unknown char: 灏
Unknown char: 綦
epoch 12[205400], loss: 1.631, bbox_loss: 3.175, seg_loss: -1.080, ang_loss: 0.562, ctc_loss: inf, rec: 0.00775 in -0.000
epoch 12[205500], loss: 1.843, bbox_loss: 3.320, seg_loss: -1.135, ang_loss: 0.659, ctc_loss: inf, rec: 0.02113 in 0.000
Unknown char: 捌
epoch 12[205600], loss: 1.517, bbox_loss: 2.975, seg_loss: -1.124, ang_loss: 0.577, ctc_loss: 7.738, rec: 0.05634 in 0.000
epoch 12[205700], loss: 1.242, bbox_loss: 2.867, seg_loss: -1.183, ang_loss: 0.496, ctc_loss: inf, rec: 0.01439 in 0.000
Unknown char: 綦
epoch 12[205800], loss: 1.263, bbox_loss: 2.826, seg_loss: -1.203, ang_loss: 0.527, ctc_loss: inf, rec: 0.02899 in 0.001
Unknown char: 灏
epoch 12[205900], loss: 1.187, bbox_loss: 2.795, seg_loss: -1.212, ang_loss: 0.501, ctc_loss: inf, rec: 0.05825 in 0.000
epoch 12[206000], loss: 1.216, bbox_loss: 2.815, seg_loss: -1.169, ang_loss: 0.489, ctc_loss: inf, rec: 0.03731 in -0.001
Unknown char: 螃
Unknown char: 捌
Unknown char: 閩
epoch 12[206100], loss: 1.194, bbox_loss: 2.801, seg_loss: -1.191, ang_loss: 0.492, ctc_loss: inf, rec: 0.04032 in -0.006
epoch 12[206200], loss: 0.760, bbox_loss: 2.559, seg_loss: -1.231, ang_loss: 0.356, ctc_loss: inf, rec: 0.04487 in 0.024