
implement ctc loss function #1049

Closed (wants to merge 20 commits)

Conversation

wcshds (Contributor) commented Dec 4, 2023

I need a CTC loss function for a CRNN model. I tried to implement it based on the PyTorch implementation, but the results obtained after calling forward() are somewhat different from PyTorch's.

I don't know what went wrong; I'd appreciate it if someone could tell me.

Reference

Checklist

  • Confirmed that run-checks all script has been executed.

louisfd (Member) commented Dec 4, 2023

Hi,
I can take a look at it later today

wcshds (Contributor, Author) commented Dec 5, 2023

I believe the result now matches PyTorch's, but the performance of this implementation seems less than ideal.

wcshds (Contributor, Author) commented Dec 8, 2023

The implementation doesn't work on the NdArray backend because of #1053. It also doesn't work on the LibTorch backend because of #1055.

I believe the current performance bottleneck lies in creating the one-hot. This is because the repeat() method is very slow on the Wgpu backend.

fn one_hot<B: Backend>(tensor: Tensor<B, 2, Int>, num_classes: usize) -> Tensor<B, 3> {
    let device = tensor.device();
    let shape = tensor.dims();
    // [batch, seq] -> [batch, seq, num_classes]: repeat each label across the class axis.
    let labels: Tensor<B, 3, Int> = tensor.unsqueeze_dim(2).repeat(2, num_classes);
    // Class indices 0..num_classes, expanded to the same [batch, seq, num_classes] shape.
    let indices = Tensor::<B, 1, Int>::arange_device(0..num_classes, &device)
        .reshape([1, 1, num_classes])
        .repeat(1, shape[1])
        .repeat(0, shape[0]);
    // 1.0 where the label matches the class index, 0.0 elsewhere.
    labels.equal(indices).float()
}
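
For reference, here is a framework-free sketch of what these tensor ops compute (the one_hot_reference helper is hypothetical, not code from this PR); it makes the cost explicit: batch * seq * num_classes values are materialized no matter how the intermediate tensors are built.

// Framework-free reference: the same [batch, seq, num_classes] one-hot
// encoding built with plain loops.
fn one_hot_reference(labels: &[Vec<usize>], num_classes: usize) -> Vec<Vec<Vec<f32>>> {
    labels
        .iter()
        .map(|seq| {
            seq.iter()
                .map(|&label| {
                    // 1.0 at the label's class index, 0.0 everywhere else.
                    let mut row = vec![0.0_f32; num_classes];
                    if label < num_classes {
                        row[label] = 1.0;
                    }
                    row
                })
                .collect()
        })
        .collect()
}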

louisfd (Member) commented Dec 12, 2023

Hi @wcshds
I didn't have the time I thought I would last week, and then I was abroad for several days. I'm sorry I said I was going to look at it last week, but I certainly haven't forgotten you! Glad to see you've continued working on it since then. I will definitely take a look very soon.

wcshds (Contributor, Author) commented Dec 12, 2023

@louisfd Thank you! The current implementation still consumes a lot of GPU memory. I believe that calculating the alpha values for blanks and letters separately could significantly reduce the memory usage, but I don't know how to implement it.
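
To make the structure concrete, below is a minimal, framework-free sketch of the standard CTC forward (alpha) recursion in log space for a single sequence (the ctc_alpha function and its signature are hypothetical, not code from this PR, and it assumes log_probs holds log-softmax outputs with at least one time step). In the extended target blank, l1, blank, l2, ..., lN, blank, every other slot is a blank, which is the regularity a separate blank/letter alpha computation would exploit.

// Hypothetical, framework-free sketch of the CTC forward (alpha) recursion in
// log space for one sequence. log_probs[t][c] holds log-softmax outputs,
// targets the label indices, blank the blank class index.
fn ctc_alpha(log_probs: &[Vec<f32>], targets: &[usize], blank: usize) -> f32 {
    const NEG_INF: f32 = f32::NEG_INFINITY;

    // Extended target: blank, l1, blank, l2, ..., lN, blank (blanks at even indices).
    let ext: Vec<usize> = std::iter::once(blank)
        .chain(targets.iter().flat_map(|&l| [l, blank]))
        .collect();
    let s = ext.len();

    // Numerically stable log(exp(a) + exp(b)).
    let logsumexp = |a: f32, b: f32| -> f32 {
        if a == NEG_INF {
            return b;
        }
        if b == NEG_INF {
            return a;
        }
        let m = a.max(b);
        m + ((a - m).exp() + (b - m).exp()).ln()
    };

    // t = 0: paths may start with the leading blank or the first label.
    let mut alpha = vec![NEG_INF; s];
    alpha[0] = log_probs[0][ext[0]];
    if s > 1 {
        alpha[1] = log_probs[0][ext[1]];
    }

    for t in 1..log_probs.len() {
        let prev = alpha.clone();
        for i in 0..s {
            let mut a = prev[i];
            if i >= 1 {
                a = logsumexp(a, prev[i - 1]);
            }
            // Skipping over a blank is only allowed between two distinct labels.
            if i >= 2 && ext[i] != blank && ext[i] != ext[i - 2] {
                a = logsumexp(a, prev[i - 2]);
            }
            alpha[i] = a + log_probs[t][ext[i]];
        }
    }

    // Valid paths end on the final blank or the final label; the loss is the
    // negative log of their total probability.
    let total = if s > 1 {
        logsumexp(alpha[s - 1], alpha[s - 2])
    } else {
        alpha[s - 1]
    };
    -total
}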

louisfd mentioned this pull request Dec 14, 2023
louisfd (Member) commented Dec 14, 2023

@wcshds
I took your word that repeat was the bottleneck in wgpu. That made a lot of sense, because we relied on the default implementation, which launches as many slice_assign kernels as there are repetitions; for a large times argument this is awful.
I wrote a repeat kernel so that only one kernel is launched instead of times: #1068

Please tell me if this is better now
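
Roughly, the speedup comes from filling the whole output in one pass, with each output element reading its source through a modulo on the repeated dimension, instead of issuing one slice_assign per copy. A framework-free illustration of that index mapping (the repeat_dim1 helper is hypothetical and is not the actual #1068 kernel):

// Tiling a [rows, cols] matrix `times` times along dim 1 in one pass:
// output column j simply reads input column j % cols.
fn repeat_dim1(input: &[Vec<f32>], times: usize) -> Vec<Vec<f32>> {
    let cols = input.first().map_or(0, |row| row.len());
    input
        .iter()
        .map(|row| (0..cols * times).map(|j| row[j % cols]).collect())
        .collect()
}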

wcshds (Contributor, Author) commented Dec 14, 2023

@louisfd Thank you! Now repeat() is much faster.

wcshds (Contributor, Author) commented Dec 14, 2023

I tried to use this implementation of CTC loss in the CRNN model, but after the first iteration the loss became NaN. I don't know what went wrong. wcshds/crnn-cjk

antimora (Collaborator) commented

Just noticed the 1e-15 magic number. Please refactor it into a constant and explain how this number is derived. It would also be preferable if it were independent of the floating-point precision (we use both half and full precision).

wcshds (Contributor, Author) commented Dec 27, 2023

@antimora I just need a small value to prevent log(0), so I now think it may not be necessary to use 1e-15; 1e-5 should be small enough.
However, I think CTC loss may not be suitable for half precision: I previously attempted to use mixed-precision training in PyTorch, but PyTorch's CTC loss does not support fp16 ([CTC Loss] CTC Loss not support float16?). Perhaps I need to explore half-precision training in future work to see whether CTC loss can work with it.

@@ -7,6 +7,8 @@ use burn_tensor::{backend::Backend, Element, ElementConversion, Int, Numeric, Te
use super::Reduction;

const NEG_INF: f32 = -1e5;
// a small value used to prevent the occurrence of log(0)
const DELTA: f32 = -1e-5;
Collaborator commented:

Did you mean to have this number as negative? The value needed here is positive: in the unlikely event that the (l1 - m.clone()).exp() + (l2 - m.clone()).exp() expression equals abs(DELTA), a negative DELTA would still lead to a log(0) situation.

Additionally, I suggest we use the [f32 EPSILON](https://doc.rust-lang.org/std/primitive.f32.html#associatedconstant.EPSILON) or [f16 EPSILON](https://docs.rs/tract-core/latest/tract_core/prelude/struct.f16.html#associatedconstant.EPSILON) constants, depending on the Backend's precision settings. @nathanielsimard or @louisfd can suggest how we can extract this. -1e-5 seems a rather big number for f16 or f32. (It probably won't work for f16 because its epsilon is 4.88e-04; we need to double-check this.)

Contributor (Author) replied:

I'm sorry, it's a typo. DELTA should be positive.

1e-5 can ensure that the loss results are accurate to three decimal places, but 4.88e-4 is a bit too large. Perhaps CTC loss is indeed not suitable for half-precision training.
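
As a scalar illustration of this trade-off (plain f32, not the PR's tensor code): the guard keeps the logarithm finite when its argument underflows to zero, but any value below DELTA is flattened to roughly ln(DELTA), so DELTA both bounds the attainable accuracy and has to stay representable at the working precision.

const DELTA: f32 = 1e-5;

// Guarded logarithm: finite for x == 0.0 instead of -inf.
fn safe_ln(x: f32) -> f32 {
    (x + DELTA).ln()
}

fn main() {
    println!("{}", safe_ln(0.0));  // ~ -11.51 rather than -inf
    println!("{}", safe_ln(0.25)); // ~ ln(0.25); the error is on the order of DELTA
    // f32's epsilon is ~1.19e-7, so a 1e-5 guard is comfortably representable;
    // at half precision the epsilon quoted above (~4.88e-4) is larger than the
    // guard itself, which is why fp16 is problematic here.
    println!("{:e}", f32::EPSILON);
}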

wcshds marked this pull request as draft January 12, 2024
antimora added the feature label Jan 31, 2024
antimora added the stale label Feb 24, 2024
antimora (Collaborator) commented

Closing this ticket and linking it to an issue so someone else can pick it up: #1536

antimora closed this Mar 26, 2024