Transcription starts spamming descriptive tags and is no longer responsive to audio input #8

@ghost

Description

Hey, I've been encountering an issue where the model starts spamming descriptive tags. This only happens after a few seconds of silence and after some speech has already been transcribed. But once it starts, it doesn't stop and is no longer responsive to audio input. For the whole last part of the log below (starting at about 62360), I was talking again, but that was not picked up anymore.

This happens on the most recent version of the main branch (780e2be, with the supposed fix for incorrectly clearing the audio segment buffer).

I tried messing around with different models and languages, but wasn't able to find a configuration that gets rid of this. I've also tried the same model with the faster-whisper engine in WhisperLiveKit and the issue didn't occur there.

python simulstreaming_whisper_server.py --language en --model_path ./small.pt --task transcribe --host localhost --port 43001 --log-level WARNING --vad
WARNING Whisper is not warmed up. The first chunk processing may take longer.
1260 1720  And that's it for the
2260 2720  test.
6660 5580  [typing]
8000 8000  [
9340 9340 ty
10340 10340 ping]
11340 11680  So let
12340 12980 's see if it's working at all
13740 13980 .
14740 14980  Oh.
16540 16840  [typing]
22240 21220  [typing]
23660 24080  [typing]
24660 25080  [typing]
28495 31675  [typing]
32075 32695  [typing]
38765 41925  Okay, now it seems to be working.
43535 43995  [typing]
44625 45405  [typing]
47365 48065  At least let's
48365 49065  not running away
49445 50065  with
50445 51495  audio tags
52005 52495 .
53005 53495  [typing
54005 54495 ]
56745 55765  [typing]
58035 57395  [typing]
59265 58395  [typing]
60360 59620  [typing]
61360 61740  [typing]
62360 62740  [typing]
64610 64910  [typing] [typing] [typing] [typing] [typing] [typing]
67520 66540  [ty
68520 70585 ping]
72885 74125  [ty
77180 82560 ping] [
86410 88650 typing] [typing] [typing] [typing] [typing] [typing] [typing] [ty
88650 92900 ping] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [ty
93910 93900 ping] [typing] [typing] [typing] [typing] [
95640 95640 typing] [typing] [typing] [typing] [typing] [ty
96640 97350 ping] [typing] [typing] [typing] [typing] [
100160 100180 typing] [typing] [typing] [typing] [typing]
101160 101180  [typing] [typing] [typing] [typing] [typing
102160 102180 ] [typing]
103830 103180  [typing] [typing] [typing] [typing] [typing
104830 104325 ] [typing] [typing] [typing] [typing] [ty
106975 105325 ping] [typing] [typing] [typing] [typing] [ty
107975 106325 ping] [typing] [typing] [typing] [typing] [ty
108975 107325 ping] [typing] [typing] [typing] [typing] [ty
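
In case it helps with reproducing or narrowing this down, here is a rough sketch for scanning output like the above. It assumes each output line is two integer columns followed by the text, and it only flags two things visible in the log: lines where the second number is smaller than the first (e.g. `6660 5580`), and how many complete bracketed tags a line contains. The names `scan`, `LINE_RE` and `TAG_RE` are made up for this example and are not part of the server.

```python
import re

# Assumed line shape: "<int> <int> <text>", as in the log above.
LINE_RE = re.compile(r"^(\d+)\s+(\d+)\s?(.*)$")
# A complete bracketed tag such as "[typing]"; tags split across lines are not counted.
TAG_RE = re.compile(r"\[[^\[\]]+\]")


def scan(lines):
    """Return (first, second, tag_count, second_before_first) per parseable line."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip non-output lines such as the warm-up warning
        first, second = int(match.group(1)), int(match.group(2))
        text = match.group(3)
        rows.append((first, second, len(TAG_RE.findall(text)), second < first))
    return rows


if __name__ == "__main__":
    sample = [
        "WARNING Whisper is not warmed up. The first chunk processing may take longer.",
        "1260 1720  And that's it for the",
        "6660 5580  [typing]",
        "64610 64910  [typing] [typing] [typing] [typing] [typing] [typing]",
    ]
    for first, second, tags, reversed_order in scan(sample):
        print(first, second, "tags:", tags, "second<first:", reversed_order)
```

Piping the server output through something like this makes it easier to see the first line where the tag count jumps above one.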

Btw, love the project :) The performance increase is amazing and I would love to get this working.
