Commit
* a lil cleanup
* checkin some realtime stuff that works
* realtime transcription is mostly working
* straighten up a bit
* realtime, but inefficient transcription (recalculates entire audio)
* clean up concurrency considerations around transcription
* publish

---------

Co-authored-by: mobius <mobius@mobius.mobius>
1 parent f4ba82a, commit ef6e5e5
Showing 13 changed files with 157 additions and 213 deletions.
@@ -12,6 +12,7 @@ uniteai.egg-info/
test.md
test.txt
*.log
debug_transcription.wav

# VSCode
.vscode/
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,12 @@
# 021: Efficient Realtime Transcription

As of recent commits, the entire audio for a transcription window is held in memory, and the whole thing is re-transcribed repeatedly. This is inefficient.
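
To get a rough feel for the cost, here is a back-of-the-envelope sketch. It assumes the buffer is re-transcribed at a fixed interval (`tick_s`, a hypothetical parameter; the actual update cadence isn't stated here) and simply adds up how many seconds of audio get fed to the model over a recording.

```python
def total_audio_processed(duration_s: float, tick_s: float) -> float:
    """Seconds of audio passed to the model when the whole buffer
    is re-transcribed once every `tick_s` seconds (assumed cadence)."""
    ticks = int(duration_s // tick_s)
    # On tick k the buffer holds k * tick_s seconds, and all of it is redone.
    return sum(tick_s * k for k in range(1, ticks + 1))


# A 60 s recording, re-transcribed once per second:
print(total_audio_processed(60.0, 1.0))  # 1830.0 seconds of audio for 60 s of speech
```

Under that assumption the work grows quadratically with recording length, which is what the options in this note try to avoid.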

## Options:

* Freeze the transcription of earlier portions, and only re-recognize the latest portion. Perhaps a sliding window would work, but then each window must overlap the previous one so that, e.g., words aren't cut in half, and some effort will be needed to properly align the transcribed text with the audio. This seems like a huge ergonomic improvement, but perhaps technically tough.

* Check the RMS energy level of audio chunks to find the start/stop of phrases, and cut out silence.

* Cut out noise? Or perhaps `whisper` was trained on enough noisy data that it already deals well with it, in which case denoising would just be wasted work.
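
The RMS option could be sketched roughly as below. This is a minimal illustration, not the project's implementation: `rms`, `voiced_spans`, `chunk_size`, and `threshold` are all hypothetical names, samples are assumed to be floats in [-1, 1], and a real threshold would need tuning against actual microphone input.

```python
import math
from typing import List, Sequence, Tuple


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of one chunk of samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def voiced_spans(
    samples: Sequence[float], chunk_size: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, stop) sample indices for runs of consecutive
    chunks whose RMS energy is at or above `threshold`."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), chunk_size):
        loud = rms(samples[i:i + chunk_size]) >= threshold
        if loud and start is None:
            start = i                 # a phrase begins
        elif not loud and start is not None:
            spans.append((start, i))  # the phrase ended at this quiet chunk
            start = None
    if start is not None:             # audio ended mid-phrase
        spans.append((start, len(samples)))
    return spans


# Toy signal: silence, a short "phrase", then silence again.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
print(voiced_spans(signal, chunk_size=4, threshold=0.1))  # [(8, 16)]
```

Only the samples inside the returned spans would then need to be sent for transcription. In practice one would likely pad each span a little so that quiet word onsets aren't clipped.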
This file was deleted.
This file was deleted.
File renamed without changes.