Realtime transcription (#11)
* a lil cleanup

* checkin some realtime stuff that works

* realtime transcription is mostly working

* straighten up a bit

* realtime, but inefficient transcription (recalculates entire audio)

* clean up concurrency considerations around transcription

* publish

---------

Co-authored-by: mobius <mobius@mobius.mobius>
freckletonj and mobius authored Jul 14, 2023
1 parent f4ba82a commit ef6e5e5
Showing 13 changed files with 157 additions and 213 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ uniteai.egg-info/
test.md
test.txt
*.log
+debug_transcription.wav

# VSCode
.vscode/
2 changes: 1 addition & 1 deletion Makefile
@@ -9,7 +9,7 @@ watch-tests:
pytest --capture=no; \
done

-upload:
+publish_pypi:
rm -r dist
python -m build
python -m twine upload dist/*
2 changes: 1 addition & 1 deletion clients/vscode/package.json
@@ -3,7 +3,7 @@
"description": "Use AI in your Editor.",
"author": "uniteai",
"license": "Apache-2.0",
-"version": "0.1.11",
+"version": "0.1.12",
"icon": "icon.jpeg",
"repository": {
"type": "git",
Binary file removed clients/vscode/uniteai-0.1.11.vsix
Binary file not shown.
Binary file added clients/vscode/uniteai-0.1.12.vsix
Binary file not shown.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "uniteai"
-version = "0.1.9"
+version = "0.1.10"
description = "AI, Inside your Editor."
readme = "README.md"
license = "Apache-2.0"
12 changes: 12 additions & 0 deletions todo/021_efficient_realtime_transcription.md
@@ -0,0 +1,12 @@
# 021: Efficient Realtime Transcription

As of recent commits, the entire audio of a transcription window is kept in memory, and the whole buffer is re-transcribed from the start on every update. This is inefficient: the work per update grows with the length of the recording.
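
To put a rough number on that cost, here is a minimal sketch. It assumes a re-transcription fires every `step` seconds over a recording of `total` seconds. Both parameters and the function name are hypothetical; the commit does not state the real update cadence.

```python
def audio_seconds_processed(total: float, step: float) -> float:
    """Seconds of audio fed to the recognizer in total, when the whole
    buffer is re-transcribed every `step` seconds (assumed cadence)."""
    n_passes = int(total // step)
    # Pass k re-transcribes everything recorded so far: k * step seconds.
    return sum(k * step for k in range(1, n_passes + 1))


if __name__ == "__main__":
    for total in (10, 60, 300):
        # Grows roughly quadratically with the recording length.
        print(total, audio_seconds_processed(total, step=1.0))
```

Under these assumptions a 60 second recording feeds 1830 seconds of audio through the recognizer, which is why this only holds up for short transcription runs.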


## Options:

* Freeze the transcription of earlier portions, and only re-recognize the latest portion. A sliding window might work, but each window must overlap the previous one so that, e.g., words aren't cut in half, and some effort will be needed to align the transcribed text with the audio. This seems like a huge ergonomic improvement, but it may be technically tough.

* Check the RMS energy level of audio chunks to find the start/stop of phrases, and cut out silence.

* Cut out noise? Or perhaps `whisper` was trained on enough noisy data that it already copes well with noise, in which case denoising would be wasted work.
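
The RMS option above could be sketched roughly as follows. This is only an illustration, not code from this repo: it assumes samples are floats in `[-1, 1]`, and `frame_len`, `threshold`, `frame_rms`, and `voiced_spans` are all hypothetical names and values that would need tuning against a real microphone.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (tail is dropped)."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def voiced_spans(samples: Sequence[float],
                 frame_len: int = 1600,      # assumed: 0.1 s at 16 kHz
                 threshold: float = 0.01     # assumed silence cutoff
                 ) -> List[Tuple[int, int]]:
    """(start, stop) sample indices of runs of frames louder than threshold."""
    energies = frame_rms(samples, frame_len)
    spans = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                       # a phrase begins
        elif energy <= threshold and start is not None:
            spans.append((start * frame_len, i * frame_len))
            start = None                    # the phrase ended in silence
    if start is not None:                   # audio ended mid-phrase
        spans.append((start * frame_len, len(energies) * frame_len))
    return spans
```

Only the returned spans would then be handed to the recognizer, so silent stretches are never transcribed. A fixed threshold is the simplest choice; an adaptive one based on ambient level may work better in practice.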
71 changes: 0 additions & 71 deletions todo/CANCELLED 009_add_emacs_marker_for_transcription.md

This file was deleted.

7 changes: 0 additions & 7 deletions todo/CANCELLED 010_add_emacs_marker_for_llm.md

This file was deleted.

@@ -3,3 +3,10 @@
* Is there a library?

* If not, what if we fired off multiple threads to listen at different time-scales, and combined the results? For instance, a short-timeout thread could catch every 1 second of audio and optimistically transcribe it; when the long-timescale listening thread returns, its transcription will likely be better, so it can override the earlier misses. These audio chunks can all be thrown in the same queue, tagged by time-scale, and short-timescale chunks can be drained off the queue whenever there's a more recent long-timescale chunk.
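
The draining rule in the last bullet could look something like this. It is a sketch only: the `Chunk` type, its `kind` tags, and `drop_superseded` are hypothetical and not part of this repo, and the list stands in for whatever has just been pulled off the shared queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    kind: str        # "short" or "long" (hypothetical time-scale tags)
    end_time: float  # seconds since recording began at which the chunk ends
    audio: bytes


def drop_superseded(chunks: List[Chunk]) -> List[Chunk]:
    """Drop short chunks already covered by the most recent long chunk."""
    long_ends = [c.end_time for c in chunks if c.kind == "long"]
    if not long_ends:
        return list(chunks)          # nothing to override yet
    latest_long = max(long_ends)
    # Keep every long chunk, plus short chunks newer than the latest long one.
    return [c for c in chunks
            if c.kind == "long" or c.end_time > latest_long]
```

For example, with short chunks ending at 1 s, 2 s and 3 s and a long chunk ending at 2 s, only the long chunk and the 3 s short chunk survive; the earlier short guesses are overridden by the long-timescale result.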


RESULT:

I've opted for recording the entire audio stream, and not doing processing before `recognize`.

There are definite efficiency gains to still be had, so I'll make a new ticket, but this works well enough for short transcription runs for now.
File renamed without changes.
21 changes: 0 additions & 21 deletions uniteai/common.py
@@ -43,27 +43,6 @@ def mk_logger(name, level):
return logger



-##################################################
-
-class ThreadSafeCounter:
-    '''
-    A threadsafe incrementable integer.
-    '''
-
-    def __init__(self):
-        self.value = 0
-        self._lock = Lock()
-
-    def increment(self):
-        with self._lock:
-            self.value += 1
-            return self.value
-
-    def get(self):
-        return self.value


##################################################
# Dict helpers
