fix the CTC zipformer2 training #1713

KarelVesely84 · 2024-08-12T08:55:33Z

- Some cuts have too many supervision tokens for their number of frames.
- Change the filtering rule to `if (T - 2) < len(tokens): return False`.
- This prevents inf. from appearing in the CTC loss value (empirically tested).
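A minimal sketch of the proposed rule, for illustration only. The function names `subsampled_length` and `keep_cut` are hypothetical, and the subsampling formula `((num_frames - 7) // 2 + 1) // 2` is an assumption that is not quoted in this thread; it does reproduce the `num_embeddings = 34` reported further down for the cut with 144 feature frames.

```python
def subsampled_length(num_frames: int) -> int:
    """Number of encoder frames T after subsampling (assumed formula)."""
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int, margin: int = 2) -> bool:
    """Return False when the cut has too many tokens for its length.

    margin=2 corresponds to the rule proposed in this PR,
    margin=0 to the previous rule `if T < len(tokens): return False`.
    """
    T = subsampled_length(num_frames)
    if (T - margin) < num_tokens:
        return False
    return True


# The cut discussed later in this thread: 144 frames, 33 BPE tokens.
print(subsampled_length(144))         # 34
print(keep_cut(144, 33, margin=0))    # True: kept by the previous rule
print(keep_cut(144, 33, margin=2))    # False: dropped by the proposed rule
```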


KarelVesely84 · 2024-08-12T12:14:03Z

workflow with error: https://github.com/k2-fsa/icefall/actions/runs/10348851808/job/28642009312?pr=1713

fatal: unable to access 'https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15/': Recv failure: Connection reset by peer

But the file location https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15/ exists...

Maybe too many tests were running at the same time? (HuggingFace overloaded?)

KarelVesely84 · 2024-08-14T12:00:25Z

Hi @csukuangfj,
how about this one? Is @yaozengwei testing it currently?

It solves issue #1352.

My theory is that CTC uses 2 extra symbols at the beginning and end of the label sequence.
So the label-length limit needs to be lowered by 2 symbols to accommodate that.

Best regards
Karel

csukuangfj · 2024-08-19T02:13:29Z

Sorry for the late reply.

Could you analyze the wave that causes inf loss?
Is it too short?

Does it contain only a single word or does it contain nothing at all?

KarelVesely84 · 2024-08-26T10:01:42Z

Hi,
the problematic utterance contained many words:
(num_embeddings, supervision_length, difference a-b) = (34, 33, 1)

text:
['▁O', 'f', '▁all', '▁.', '▁P', 'ar', 'li', 'a', 'ment', '▁,', '▁Co', 'un', 'c', 'il', '▁and', '▁Co', 'm', 'm', 'i', 's', 's', 'ion', '▁are', '▁work', 'ing', '▁to', 'ge', 'ther', '▁to', '▁de', 'li', 'ver', '▁.']

It seems that a better set of BPEs could reduce the number of supervision tokens.
Nevertheless, this would only hide the "inf" problem for CTC.

I believe the two extra tokens for the CTC loss are the <bos/eos>
that get prepended and appended to the supervision sequence,
hence the (T - 2).

Best regards
Karel
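A short check of the numbers in this comment, offered as a sketch rather than a confirmed diagnosis. CTC needs a blank frame between two identical adjacent labels, so the minimum number of frames is the token count plus the number of adjacent repeats. The helper name `min_ctc_frames` is hypothetical; the token list is copied from the comment above.

```python
TOKENS = ['▁O', 'f', '▁all', '▁.', '▁P', 'ar', 'li', 'a', 'ment', '▁,',
          '▁Co', 'un', 'c', 'il', '▁and', '▁Co', 'm', 'm', 'i', 's', 's',
          'ion', '▁are', '▁work', 'ing', '▁to', 'ge', 'ther', '▁to', '▁de',
          'li', 'ver', '▁.']


def min_ctc_frames(tokens):
    """Fewest frames for which a valid CTC alignment of `tokens` exists."""
    # Each pair of identical neighbours costs one extra (blank) frame.
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return len(tokens) + repeats


print(len(TOKENS))             # 33
print(min_ctc_frames(TOKENS))  # 35
```

With T = 34 frames and a minimum of 35, no valid alignment exists, which would give an infinite loss. The pairs 'm', 'm' and 's', 's' account for the two extra frames in this particular utterance; another utterance could need more or fewer.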

csukuangfj · 2024-08-26T10:29:38Z

> the problematic utterance contained many words:

Thanks for sharing! Could you also post the duration of the corresponding wave file?

KarelVesely84 · 2024-08-30T11:16:04Z

This is the corresponding Cut:

MonoCut(id='20180612-0900-PLENARY-3-59', start=557.34, duration=1.44, channel=0, supervisions=[SupervisionSegment(id='20180612-0900-PLENARY-3-59', recording_id='20180612-0900-PLENARY-3', start=0.0, duration=1.44, channel=0, text='Of all . Parliament , Council and Commission are working together to deliver .', language='en', speaker='None', gender='male', custom={'orig_text': 'of all. Parliament, Council and Commission are working together to deliver.'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=144, num_features=80, frame_shift=0.01, sampling_rate=16000, start=557.34, duration=1.44, storage_type='lilcom_chunky', storage_path='data/fbank/voxpopuli-asr-en-train_feats/feats-59.lca', storage_key='395124474,12987', recording_id='None', channels=0), recording=Recording(id='20180612-0900-PLENARY-3', sources=[AudioSource(type='file', channels=[0], source='/mnt/matylda6/szoke/EU-ASR/DATA/voxpopuli/raw_audios/en/2018/20180612-0900-PLENARY-3_en.ogg')], sampling_rate=16000, num_samples=139896326, duration=8743.520375, channel_ids=[0], transforms=None), custom={'dataloading_info': {'rank': 3, 'world_size': 4, 'worker_id': None}})

It is a 1.44 sec long cut inside a very long recording (about 2.43 hrs).
And 1.44 sec is far too short to pronounce all the words in the reference text:
"Of all . Parliament , Council and Commission are working together to deliver ."

Definitely a data issue.
And if the Cut is filtered out, and consequently the CTC loss stops breaking, it should be seen as a good thing...

K.

csukuangfj · 2024-08-30T12:18:29Z

Yes, I think it is good to filter out this kind of data.

KarelVesely84 · 2024-09-17T08:31:11Z

Hello, is anything needed from my side for this to be merged?
K.

csukuangfj · 2024-09-17T13:07:43Z

The root cause is bad data. Would it be more appropriate to fix it when preparing the data?

The -2 offset is not a constraint for computing the CTC or the transducer loss.

KarelVesely84 · 2024-09-24T15:11:44Z

Well, without that (T - 2) change I was getting an inf. value from the CTC loss.
There should be no inf. even if the data are badly prepared.

I also did not find any trace of extra CTC symbols or anything similar in the scripts.
`torch.nn.functional.ctc_loss()` gets the same set of symbols as the transducer loss.

Could you try to reproduce the issue by adding a training example with a very lengthy transcript?
(Or I can create a branch to demonstrate it, say by repeating the librispeech transcript 100x, just to make the error appear.)

Best regards,
Karel
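The inf. value can also be reproduced without the training recipe. The sketch below is a plain-Python CTC forward pass written for illustration; it is not the code path used by `torch.nn.functional.ctc_loss`, and the names `logsumexp` and `ctc_loss` are local helpers. It uses uniform frame posteriors over 4 symbols, with id 0 assumed to be the blank.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, targets, blank=0):
    """Negative log-likelihood of `targets` under CTC.

    log_probs: list of T frames, each a list of C log-probabilities.
    targets:   list of label ids (none equal to `blank`).
    """
    # Interleave blanks: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    S = len(ext)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * S
        for s in range(S):
            incoming = [alpha[s]]
            if s >= 1:
                incoming.append(alpha[s - 1])
            # The blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                incoming.append(alpha[s - 2])
            new_alpha[s] = logsumexp(incoming) + frame[ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or in the trailing blank.
    return -logsumexp(alpha[-2:])


uniform = [[math.log(0.25)] * 4 for _ in range(3)]  # T = 3 frames
print(ctc_loss(uniform, [1, 2, 3]))  # finite: 3 frames suffice
print(ctc_loss(uniform, [1, 1, 2]))  # inf: the repeated label needs a 4th frame
```

Both target sequences have 3 tokens and 3 frames, so a rule of the form `T < len(tokens)` keeps both, yet only the one without a repeated label has a finite loss.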

fix the CTC zipformer2 training

d400bc5

- too many supervision tokens - change filtering rule to `if (T - 2) < len(tokens): return False` - this prevents inf. from appearing in the CTC loss value

csukuangfj requested a review from yaozengwei August 12, 2024 08:56

KarelVesely84 mentioned this pull request Aug 12, 2024

Zipformer2 with CTC is hard to train #1352

Open
