Description
Hey, I've been encountering an issue where the model starts spamming descriptive tags. This only happens after a few seconds of silence and after some things have already been said. But once it starts, it doesn't stop and no longer responds to audio input. For the whole last part of the log below (starting at about 62360), I was talking again, but none of that was picked up.
This happens on the most recent version of the main branch (780e2be, with the supposed fix for incorrectly clearing the audio segment buffer).
I tried messing around with different models and languages, but wasn't able to find a configuration that gets rid of this. I've also tried the same model with the faster-whisper engine in WhisperLiveKit and the issue didn't occur there.
python simulstreaming_whisper_server.py --language en --model_path ./small.pt --task transcribe --host localhost --port 43001 --log-level WARNING --vad
WARNING Whisper is not warmed up. The first chunk processing may take longer.
1260 1720 And that's it for the
2260 2720 test.
6660 5580 [typing]
8000 8000 [
9340 9340 ty
10340 10340 ping]
11340 11680 So let
12340 12980 's see if it's working at all
13740 13980 .
14740 14980 Oh.
16540 16840 [typing]
22240 21220 [typing]
23660 24080 [typing]
24660 25080 [typing]
28495 31675 [typing]
32075 32695 [typing]
38765 41925 Okay, now it seems to be working.
43535 43995 [typing]
44625 45405 [typing]
47365 48065 At least let's
48365 49065 not running away
49445 50065 with
50445 51495 audio tags
52005 52495 .
53005 53495 [typing
54005 54495 ]
56745 55765 [typing]
58035 57395 [typing]
59265 58395 [typing]
60360 59620 [typing]
61360 61740 [typing]
62360 62740 [typing]
64610 64910 [typing] [typing] [typing] [typing] [typing] [typing]
67520 66540 [ty
68520 70585 ping]
72885 74125 [ty
77180 82560 ping] [
86410 88650 typing] [typing] [typing] [typing] [typing] [typing] [typing] [ty
88650 92900 ping] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [typing] [ty
93910 93900 ping] [typing] [typing] [typing] [typing] [
95640 95640 typing] [typing] [typing] [typing] [typing] [ty
96640 97350 ping] [typing] [typing] [typing] [typing] [
100160 100180 typing] [typing] [typing] [typing] [typing]
101160 101180 [typing] [typing] [typing] [typing] [typing
102160 102180 ] [typing]
103830 103180 [typing] [typing] [typing] [typing] [typing
104830 104325 ] [typing] [typing] [typing] [typing] [ty
106975 105325 ping] [typing] [typing] [typing] [typing] [ty
107975 106325 ping] [typing] [typing] [typing] [typing] [ty
108975 107325 ping] [typing] [typing] [typing] [typing] [ty
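In case it helps with reproducing or triaging, here is a minimal sketch for scanning output like the log above. It assumes each line has the form `<start> <end> <text>` with two leading integer timestamps, as in the pasted output. The names `parse_line`, `repeated_tag_count`, `flag_lines` and the `max_repeats` threshold are made up for illustration and are not part of the project. It flags lines where one bracketed tag repeats many times, and lines whose second timestamp is smaller than the first.

```python
import re

# Matches one complete bracketed tag such as "[typing]" (hypothetical helper).
TAG = re.compile(r"\[[^\[\]]+\]")

def parse_line(line):
    """Split '<start> <end> <text>' into (int, int, str)."""
    start, end, text = line.split(" ", 2)
    return int(start), int(end), text

def repeated_tag_count(text):
    """Return the highest count of any single complete tag in the text."""
    tags = TAG.findall(text)
    return max((tags.count(t) for t in set(tags)), default=0)

def flag_lines(lines, max_repeats=3):
    """Yield (start, reason) for lines that look suspicious.

    max_repeats is an arbitrary illustrative threshold, not a project setting.
    """
    for line in lines:
        start, end, text = parse_line(line)
        if end < start:
            yield start, "end before start"
        if repeated_tag_count(text) > max_repeats:
            yield start, "repeated tag"

# Three lines copied from the log above.
sample = [
    "38765 41925 Okay, now it seems to be working.",
    "56745 55765 [typing]",
    "64610 64910 [typing] [typing] [typing] [typing] [typing] [typing]",
]
flags = list(flag_lines(sample))
for start, reason in flags:
    print(start, reason)
```

Tags that are split across output lines (for example `[ty` followed by `ping]`) are not counted by this sketch, so it will undercount repeats at line boundaries.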
Btw, love the project :) The performance increase is amazing and I would love to get this working.