Ponzi is a single word.
Here is the same part of the transcription in srt format:
31
00:00:08,630 --> 00:00:09,070
 greatest

32
00:00:09,070 --> 00:00:09,230
 Pon

33
00:00:09,230 --> 00:00:09,340
zi

34
00:00:09,340 --> 00:00:09,670
 scheme

35
00:00:09,670 --> 00:00:09,780
 in

36
00:00:09,780 --> 00:00:10,050
 human

37
00:00:10,050 --> 00:00:10,480
 history
Every word/chunk except "zi" has a leading space, so the chunks can be glued back into correct sentences. Unfortunately, the csv format doesn't allow this.
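To make the gluing concrete, here is a minimal sketch in Python. It is not whisper.cpp code; `glue_chunks` is a hypothetical helper, and the chunk list is copied from the srt sample with leading spaces kept.

```python
def glue_chunks(chunks):
    """Join token-level chunks; a leading space marks the start of a new word."""
    return "".join(chunks).strip()


# Chunks as they appear in the srt output (leading spaces preserved).
srt_chunks = [" greatest", " Pon", "zi", " scheme", " in", " human", " history"]

# The same chunks once leading spaces are trimmed, as in the csv output.
csv_chunks = [chunk.strip() for chunk in srt_chunks]

print(glue_chunks(srt_chunks))  # greatest Ponzi scheme in human history
print(" ".join(csv_chunks))     # greatest Pon zi scheme in human history
print("".join(csv_chunks))      # greatestPonzischemeinhumanhistory
```

With the spaces trimmed, neither joining strategy recovers the original text, because nothing distinguishes "zi" (a continuation) from a chunk that starts a new word.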
@alex-bacart
--max-len 1 means that at most 1 token is output per text segment.
The word " Ponzi" consists of 2 tokens, "Pon" and "zi", which is why it is split.
CSV format export trims leading spaces, and that is a problem. The vtt and srt formats don't do this.
Command I use to transcribe the audio file:
./main --model ./models/ggml-large.bin --file audio.wav --output-csv --max-len 1
The comment on this line https://github.com/ggerganov/whisper.cpp/pull/340/files#diff-2d3599a9fad195f2c3c60bd06691bc1815325b3560b5feda41a91fa71194e805R344 says that every time we get a space we should remove it. That is not true in cases where a word is divided into chunks. An example of such a division:
Ponzi is a single word.
Here is the same part of the transcription in srt format.
Every word/chunk except "zi" has a leading space, so the chunks can be glued back into correct sentences. Unfortunately, the csv format doesn't allow this.
This issue follows #340.
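As an illustration that CSV itself can carry the leading space when the text field is quoted, here is a small round-trip sketch using Python's standard `csv` module. The column layout (start and end in milliseconds, then text) is an assumption made for the example, not necessarily what whisper.cpp writes.

```python
import csv
import io

# Hypothetical rows: (start_ms, end_ms, text), taken from the srt sample.
rows = [(9070, 9230, " Pon"), (9230, 9340, "zi")]

# Write with every non-numeric field quoted, so the leading space is kept.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(rows)

# Read the rows back; the default reader does not skip initial spaces.
buf.seek(0)
texts = [row[2] for row in csv.reader(buf)]
glued = "".join(texts)

print(repr(texts))  # [' Pon', 'zi']
print(repr(glued))  # ' Ponzi'
```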
cc @NielsMayer