Realtime transcription (#11)

* a lil cleanup
* checkin some realtime stuff that works
* realtime transcription is mostly working
* straighten up a bit
* realtime, but inefficient transcription (recalculates entire audio)
* clean up concurrency considerations around transcription
* publish

---------

Co-authored-by: mobius <mobius@mobius.mobius>
freckletonj · Jul 14, 2023 · ef6e5e5
1 parent f4ba82a
commit ef6e5e5

Showing 13 changed files with 157 additions and 213 deletions.
diff --git a/.gitignore b/.gitignore
@@ -12,6 +12,7 @@ uniteai.egg-info/
 test.md
 test.txt
 *.log
+debug_transcription.wav
 
 # VSCode
 .vscode/

diff --git a/Makefile b/Makefile
@@ -9,7 +9,7 @@ watch-tests:
 			pytest --capture=no; \
 		done
 
-upload:
+publish_pypi:
 	rm -r dist
 	python -m build
 	python -m twine upload dist/*
diff --git a/clients/vscode/package.json b/clients/vscode/package.json
@@ -3,7 +3,7 @@
 	"description": "Use AI in your Editor.",
 	"author": "uniteai",
 	"license": "Apache-2.0",
-	"version": "0.1.11",
+	"version": "0.1.12",
     "icon": "icon.jpeg",
 	"repository": {
 		"type": "git",

diff --git a/clients/vscode/uniteai-0.1.11.vsix b/clients/vscode/uniteai-0.1.11.vsix
diff --git a/clients/vscode/uniteai-0.1.12.vsix b/clients/vscode/uniteai-0.1.12.vsix
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "uniteai"
-version = "0.1.9"
+version = "0.1.10"
 description = "AI, Inside your Editor."
 readme = "README.md"
 license = "Apache-2.0"

diff --git a/todo/021_efficient_realtime_transcription.md b/todo/021_efficient_realtime_transcription.md
@@ -0,0 +1,12 @@
+# 021: Efficient Realtime Transcription
+
+As of recent commits, during a transcription window, the entire audio is saved in memory, and the whole thing is repeatedly transcribed. Inefficient.
+
+
+## Options:
+
+* Freeze transcription of earlier portions, and only re-recognize the latest portions. Perhaps a sliding window would work, but then the window must overlay with previous windows so that, eg, words aren't cut in half, and there will be some effort needed to properly align the transcribed text with the audio. This seems like a huge ergonomic improvement, but perhaps technically tough.
+
+* Check the rms energy level of audio chunks to find the start/stop of phrases, and cut out silence
+
+* Cut out noise? Or perhaps `whisper` was trained on enough noisy data that it already deals well with it, and this would be a significant inefficiency.
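
Note on the RMS option in `todo/021`: the idea of using chunk energy to find phrase boundaries and drop silence could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from this commit: the audio is assumed to be a flat sequence of float samples in [-1, 1], and the names `rms`, `find_speech_spans`, `drop_silence`, `chunk_size`, and `threshold` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of one chunk of samples (0.0 for an empty chunk)."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def find_speech_spans(
    samples: Sequence[float], chunk_size: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, stop) sample indices of runs of chunks whose RMS >= threshold."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the current loud run began, if any
    for i in range(0, len(samples), chunk_size):
        loud = rms(samples[i:i + chunk_size]) >= threshold
        if loud and start is None:
            start = i                      # a phrase begins
        elif not loud and start is not None:
            spans.append((start, i))       # a phrase ends at this silent chunk
            start = None
    if start is not None:
        spans.append((start, len(samples)))  # audio ended mid-phrase
    return spans


def drop_silence(
    samples: Sequence[float], chunk_size: int, threshold: float
) -> List[float]:
    """Concatenate only the loud spans, discarding silent chunks."""
    kept: List[float] = []
    for start, stop in find_speech_spans(samples, chunk_size, threshold):
        kept.extend(samples[start:stop])
    return kept
```

A real threshold would likely need tuning per microphone, and cutting exactly at chunk boundaries can clip word onsets, so some padding around each span may be worth adding before passing audio to `recognize`.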
diff --git a/todo/CANCELLED 009_add_emacs_marker_for_transcription.md b/todo/CANCELLED 009_add_emacs_marker_for_transcription.md
diff --git a/todo/CANCELLED 010_add_emacs_marker_for_llm.md b/todo/CANCELLED 010_add_emacs_marker_for_llm.md
diff --git a/todo/019_realtime_transcription.md b/todo/DONE_019_realtime_transcription.md
@@ -3,3 +3,10 @@
 * Is there a library?
 
 * If not, what if we fired off multiple threads to listen at different time-scales, and combine the results? For instance a short timeout could catch every 1 second of audio, and optimistically transcribe that, but then when the longterm timescale listening thread returns, a transcription will likely yield a better result, so we can override previous misses. These audio chunks can be thrown in the same queue, tagged, and we can drain short-timescale chunks off the queue if there's a more recent long-timescale chunk.
+
+
+RESULT:
+
+I've opted for recording the entire audio stream, and not doing processing before `recognize`.
+
+There are definite efficiency gains to still be had, so I'll make a new ticket, but this works well enough for short transcription runs for now.
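
Note on the RESULT in `todo/019`: the approach described there (keep the whole recording and re-run recognition on all of it each time new audio arrives) can be sketched as follows. This is a simplified illustration, not the code in this commit: `transcribe_growing_buffer` is a hypothetical name, and `recognize` here is a stand-in callable supplied by the caller rather than the real recognizer.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def transcribe_growing_buffer(
    chunks: Iterable[Sequence[float]],
    recognize: Callable[[List[float]], T],
) -> List[T]:
    """Append each incoming chunk to one buffer and re-recognize the whole buffer."""
    buffer: List[float] = []
    results: List[T] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # The entire buffer is processed again on every update, so total work
        # grows roughly quadratically with the length of the recording.
        results.append(recognize(list(buffer)))
    return results
```

With chunks of equal size, the n-th call processes n chunks, which is why this is fine for short runs and why a follow-up ticket for efficiency makes sense.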
diff --git a/todo/020_fix_vscode_client.md b/todo/DONE_020_fix_vscode_client.md
diff --git a/uniteai/common.py b/uniteai/common.py
@@ -43,27 +43,6 @@ def mk_logger(name, level):
     return logger
 
 
-
-##################################################
-
-class ThreadSafeCounter:
-    '''
-    A threadsafe incrementable integer.
-    '''
-
-    def __init__(self):
-        self.value = 0
-        self._lock = Lock()
-
-    def increment(self):
-        with self._lock:
-            self.value += 1
-            return self.value
-
-    def get(self):
-        return self.value
-
-
 ##################################################
 # Dict helpers