MM-56540 - Live-captions support #18

Merged on Mar 27, 2024.
58 commits
3d76133 initial live-captions support; wip (cpoile, Jan 25, 2024)
9ea5bbf upgrade silero-vad-go (cpoile, Jan 26, 2024)
c668947 linting (cpoile, Jan 26, 2024)
82a5f37 tests (cpoile, Jan 26, 2024)
72a2fa1 update rtcd (cpoile, Jan 30, 2024)
dd28a95 add config options for LiveCaptions, esp. LiveCaptionsOn; tests (cpoile, Feb 2, 2024)
1d59d9f fix test (cpoile, Feb 2, 2024)
102936d fix tests (cpoile, Feb 2, 2024)
c3d911a close captionDoneCh with other doneCh closers (cpoile, Feb 2, 2024)
8d8ba6a make transcription loop more async (cpoile, Feb 5, 2024)
985c9f8 remove unused field (cpoile, Feb 5, 2024)
783d5e8 send NewAudioLenMs (a measure of load) in the caption ws event (cpoile, Feb 6, 2024)
17ed4f5 be explicit that NewAudioLenMs is converted to float64 by marshaling (cpoile, Feb 6, 2024)
b866ba9 improvements: don't cut off old voice before it's transcribed; better… (cpoile, Feb 7, 2024)
08da042 param tuning; better defaults: 2 transcribers x 2 threads (cpoile, Feb 7, 2024)
3f96d0a fix tests (cpoile, Feb 7, 2024)
5db394f return blank transcription if transcription error (cpoile, Feb 8, 2024)
7dd28f2 tweak debug statement (cpoile, Feb 8, 2024)
df3cd10 add pressure valve to prevent death spiral on overloaded machines (cpoile, Feb 9, 2024)
00ea909 tweak pressure valve (cpoile, Feb 9, 2024)
9f697d1 move structs to calls public; send release valve metrics over (cpoile, Feb 9, 2024)
de1da40 implement backpressure for the transcriber pool (cpoile, Feb 9, 2024)
8fad4ac add backoff for transcriberQueueCh (cpoile, Feb 9, 2024)
60c5f76 remove unnecessary else (cpoile, Feb 9, 2024)
991fdf5 fix buffer size calculation (cpoile, Feb 9, 2024)
2853200 report tickrate metric (cpoile, Feb 9, 2024)
dbe4c81 report initial tickrate on transcriber start (cpoile, Feb 9, 2024)
4ea0773 report initial tickrate on transcriber start (cpoile, Feb 9, 2024)
c043604 revert tickrate metric (not useful); update calls dependency (cpoile, Feb 11, 2024)
b7aa1f3 Merge remote-tracking branch 'origin/MM-56540-live-captions' into MM-… (cpoile, Feb 11, 2024)
1c72ac4 Revert tickrate metric; wasn't useful (cpoile, Feb 11, 2024)
72fbf9e update calls dependency; rename metric ws events (cpoile, Feb 11, 2024)
f2aba5f lower LiveCaptionsNumTranscribersDefault to 1 (cpoile, Feb 14, 2024)
4d86c89 fix useless min(LiveCaptionsNumTranscribersDefault, runtime.NumCPU()/2) (cpoile, Feb 27, 2024)
022df36 add NumThreadsDefault (cpoile, Feb 27, 2024)
564f3be single segment, language = en (cpoile, Feb 28, 2024)
a4f9284 Revert "single segment, language = en" (cpoile, Feb 28, 2024)
0d96e6b language = en (cpoile, Feb 28, 2024)
c9d24fb comment out debug statements for now (cpoile, Feb 28, 2024)
5581852 single segment (cpoile, Feb 28, 2024)
375be70 remove backoff (cpoile, Feb 28, 2024)
9cef00f simplify mutexes and multiple windowing (cpoile, Mar 8, 2024)
520dd1f more cleanup (cpoile, Mar 13, 2024)
5e934a6 Merge branch 'master' into MM-56540-live-captions (cpoile, Mar 13, 2024)
481bc99 recState -> jobState (cpoile, Mar 13, 2024)
ae229f9 update rtcd for moving type to JobStateClient (cpoile, Mar 14, 2024)
8d199b1 complicated algorithm but with clearer code; using normal vad (cpoile, Mar 19, 2024)
7f63e74 improve quality of transcription with background noise (cpoile, Mar 19, 2024)
d855144 lint (cpoile, Mar 19, 2024)
bd6e7dd tweaking vad and minSpeechLength settings; PR comments (cpoile, Mar 22, 2024)
e57bae6 close off last segment (cpoile, Mar 22, 2024)
1c37d35 fix useless for select (cpoile, Mar 22, 2024)
e206786 add LiveCaptionsLanguage (cpoile, Mar 22, 2024)
04ad5b5 add LiveCaptionsLanguage debug statement (cpoile, Mar 25, 2024)
302d2e4 PR comments (cpoile, Mar 27, 2024)
efce4ad no need for loop label (cpoile, Mar 27, 2024)
1d0ce6a upgrade tagged dependencies (cpoile, Mar 27, 2024)
b19051a SendWs -> SendWS; go mod tidy (cpoile, Mar 27, 2024)
20 changes: 18 additions & 2 deletions cmd/transcriber/apis/whisper.cpp/context.go
@@ -1,6 +1,7 @@
 package whisper
 
-// #cgo LDFLAGS: -l:libwhisper.a -lm -lstdc++
+// #cgo linux LDFLAGS: -l:libwhisper.a -lm -lstdc++
+// #cgo darwin LDFLAGS: -lwhisper -lstdc++ -framework Accelerate
 // #include <whisper.h>
 // #include <stdlib.h>
 import "C"
@@ -22,6 +23,15 @@ type Config struct {
 	NumThreads int
 	// Whether or not past transcription should be used as prompt.
 	NoContext bool
+	// 512 = a bit more than 10s. Use multiples of 64. Results in a speedup of 3x at 512, b/c whisper was tuned for 30s chunks. See: https://github.com/ggerganov/whisper.cpp/pull/141
+	// TODO: tests, validation
+	AudioContext int
+	// Whether or not to print progress to stdout (default false).
+	PrintProgress bool
+	// Language to use (defaults to autodetection).
+	Language string
+	// Whether or not to generate a single segment (default false).
+	SingleSegment bool
 }
 
 func (c Config) IsValid() error {
@@ -72,8 +82,14 @@ func NewContext(cfg Config) (*Context, error) {
 
 	c.params = C.whisper_full_default_params(C.WHISPER_SAMPLING_GREEDY)
 	c.params.no_context = C.bool(c.cfg.NoContext)
+	c.params.audio_ctx = C.int(c.cfg.AudioContext)
 	c.params.n_threads = C.int(c.cfg.NumThreads)
-	c.params.language = C.CString("auto")
+	if c.cfg.Language == "" {
+		c.cfg.Language = "auto"
+	}
+	c.params.language = C.CString(c.cfg.Language)
+	c.params.single_segment = C.bool(c.cfg.SingleSegment)
+	c.params.print_progress = C.bool(c.cfg.PrintProgress)
 
 	return &c, nil
 }
2 changes: 1 addition & 1 deletion cmd/transcriber/call/job.go
@@ -22,7 +22,7 @@ func (t *Transcriber) postJobStatus(status public.JobStatus) error {
 	defer cancelCtx()
 	resp, err := t.apiClient.DoAPIRequestBytes(ctx, http.MethodPost, apiURL, payload, "")
 	if err != nil {
-		return fmt.Errorf("request failed%w", err)
+		return fmt.Errorf("request failed: %w", err)
 	}
 	defer resp.Body.Close()
 	cancelCtx()