Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to vim plugin and LSP server #1144

Merged
merged 36 commits into from
Aug 27, 2023
Merged

Conversation

AustinMroz
Copy link
Contributor

whisper.vim-demo.mp4

While I think it'll need about another week of polish to fix a number of outstanding bugs, work is far enough along to make a fancy demonstration video.

The implementation is divided between a vim plugin and a server that communicate over stdout/stdin by way of Language Server Protocol. Other methods like libcall proved nonviable and this option increases the likelihood that future programs or plugins are able to reuse the work.

The server is able to perform both unguided transcription and guided transcription restricted to a client provided list of commands. Timestamps can be provided in client requests so that the user isn't forced to wait for the previous command to finish processing before providing the next, and I plan to work on an improved VAD to reduce the need for pronounced gaps/minimize reprocessing of voice detection on a sliding window.

Most vim actions built of single keys are functional and action names are able to change based on context (i becomes inside when used as part of a motion), but I've still observed some inconsistencies with less frequent commands such as the replace multiple of 'R'

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream
A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.
Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.
One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.
The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.
Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore
Fix commandset_list being passed by value
Properly register the first token of a multitoken command
I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index
Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)
Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.
Also remove unnecessary ! to use builtin vim command
A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.
Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input
The same library that is used for the llama.cpp server
add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)
Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback
While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.
As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context
@ggerganov
Copy link
Owner

Very cool work! Looking forward to see this improved further

@Mihaiii
Copy link

Mihaiii commented Aug 6, 2023

Haha, super cool! Love it!

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)
Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands
Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.
Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.
When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.
Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.
 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.
Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.
This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.
@AustinMroz
Copy link
Contributor Author

whisper.vim-demo2.mp4

Progress is reaching the point where it should be stable enough for other people to play with and the recently implemented user commands setting makes for another flashy demo.

Short of others having difficulty running it, I just want to take another pass at formatting, naming (the c++ code isn't an LSP, it's a language server that implements the protocol), figure out cmake, and add some documentation.

@thiswillbeyourgithub
Copy link

Just chiming in to shout out a related project I heard about : CursorLess It's FOSS and allows to code using your voice.

Here's a video (skip to 1min09) : https://www.youtube.com/watch?v=0ZZb12Qp6-0
Here's a github link : https://github.com/cursorless-dev/cursorless

@ggerganov
Copy link
Owner

@AustinMroz

Let us know when it is ready for review

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets
Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.
From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.
There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
@AustinMroz
Copy link
Contributor Author

@ggerganov I believe it's ready for review.

While the scope here is far greater than that of the llama plugin, I was able to install an older version of vim (that's one day off of yours) with nix and all my test cases are still passing on it.

@AustinMroz AustinMroz marked this pull request as ready for review August 26, 2023 02:37
Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't played with this yet, but I think the implementation is very nice and it is a really cool example to add to whisper.cpp. Nice work!

@ggerganov ggerganov merged commit 175ffa6 into ggerganov:master Aug 27, 2023
35 checks passed
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
* Initial proof of concept Vim plugin

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream

* Support $WHISPER_CPP_HOME environment variable

A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.

* Initial progress on LSP implementation

Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.

* Rewrite audio windowing of guided transcription

One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.

* Add unguided_transcription. Cleanup.

The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.

* Fix compilation.

Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore

* Functional unguided_transcription

* Functional guided_transcription

Fix commandset_list being passed by value
Properly register the first token of a multitoken command

* Minor changes before time fix

I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index

* Swap timekeeping to use std::chrono

* Add work in progress lsp backed whisper.vim plugin

Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)

* Reworked vim plugin command loop

* Fix change inside

Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.

* Forcibly set commandset_index to 0 after subinsert

Also remove unnecessary ! to use builtin vim command

* Fix upper

A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.

* Fix formatting

Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input

* Remove obsolete vim plugin

* Add json.hpp library

The same library that is used for the llama.cpp server

* Minor cleanups

add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)

* Fix indentation. Fallback for subTranscription

Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback

* Move audio polling logic to a subfunction

While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.

* Test for voice over subchunks if backlog > 1s

As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context

* Limit the maximum length of audio input.

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)

* Unguided timestamp tracking, cleanup

Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands

* By default, maintain mode.

Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.

* Add undo breaks before subtranscriptions

Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.

* Append instead of insert for new undo sequence

When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.

* Move undo sequence breaks to command execution

Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.

* Fix repeat. Add space, carrot, dollar commands

 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.

* Return error on duplicate in commandset

Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.

* Add support for user-defined commands

This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.

* Add readme, update cmake

* Add area commandset. Refactor spoken_dict

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets

* Add mark, jump. Fix change under visual.

Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.

* Accommodate ignorecase. Fix change.

From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.

* Support registers. Fix README typo

There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023
* Initial proof of concept Vim plugin

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream

* Support $WHISPER_CPP_HOME environment variable

A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.

* Initial progress on LSP implementation

Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.

* Rewrite audio windowing of guided transcription

One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.

* Add unguided_transcription. Cleanup.

The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.

* Fix compilation.

Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore

* Functional unguided_transcription

* Functional guided_transcription

Fix commandset_list being passed by value
Properly register the first token of a multitoken command

* Minor changes before time fix

I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index

* Swap timekeeping to use std::chrono

* Add work in progress lsp backed whisper.vim plugin

Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)

* Reworked vim plugin command loop

* Fix change inside

Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.

* Forcibly set commandset_index to 0 after subinsert

Also remove unnecessary ! to use builtin vim command

* Fix upper

A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.

* Fix formatting

Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input

* Remove obsolete vim plugin

* Add json.hpp library

The same library that is used for the llama.cpp server

* Minor cleanups

add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)

* Fix indentation. Fallback for subTranscription

Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback

* Move audio polling logic to a subfunction

While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.

* Test for voice over subchunks if backlog > 1s

As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context

* Limit the maximum length of audio input.

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)

* Unguided timestamp tracking, cleanup

Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands

* By default, maintain mode.

Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.

* Add undo breaks before subtranscriptions

Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.

* Append instead of insert for new undo sequence

When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.

* Move undo sequence breaks to command execution

Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.

* Fix repeat. Add space, carrot, dollar commands

 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.

* Return error on duplicate in commandset

Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.

* Add support for user-defined commands

This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.

* Add readme, update cmake

* Add area commandset. Refactor spoken_dict

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets

* Add mark, jump. Fix change under visual.

Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.

* Accommodate ignorecase. Fix change.

From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.

* Support registers. Fix README typo

There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
vonstring pushed a commit to vonstring/whisper.cpp that referenced this pull request Nov 7, 2023
* Initial proof of concept Vim plugin

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream

* Support $WHISPER_CPP_HOME environment variable

A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.

* Initial progress on LSP implementation

Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.

* Rewrite audio windowing of guided transcription

One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.

* Add unguided_transcription. Cleanup.

The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.

* Fix compilation.

Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore

* Functional unguided_transcription

* Functional guided_transcription

Fix commandset_list being passed by value
Properly register the first token of a multitoken command

* Minor changes before time fix

I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index

* Swap timekeeping to use std::chrono

* Add work in progress lsp backed whisper.vim plugin

Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)

* Reworked vim plugin command loop

* Fix change inside

Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.

* Forcibly set commandset_index to 0 after subinsert

Also remove unnecessary ! to use builtin vim command

* Fix upper

A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.

* Fix formatting

Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input

* Remove obsolete vim plugin

* Add json.hpp library

The same library that is used for the llama.cpp server

* Minor cleanups

add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)

* Fix indentation. Fallback for subTranscription

Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback

* Move audio polling logic to a subfunction

While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.

* Test for voice over subchunks if backlog > 1s

As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context

* Limit the maximum length of audio input.

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)

* Unguided timestamp tracking, cleanup

Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands

* By default, maintain mode.

Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.

* Add undo breaks before subtranscriptions

Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.

* Append instead of insert for new undo sequence

When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.

* Move undo sequence breaks to command execution

Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.

* Fix repeat. Add space, carrot, dollar commands

 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.

* Return error on duplicate in commandset

Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.

* Add support for user-defined commands

This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.

* Add readme, update cmake

* Add area commandset. Refactor spoken_dict

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets

* Add mark, jump. Fix change under visual.

Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.

* Accommodate ignorecase. Fix change.

From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.

* Support registers. Fix README typo

There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request Dec 16, 2023
* Initial proof of concept Vim plugin

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream

* Support $WHISPER_CPP_HOME environment variable

A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.

* Initial progress on LSP implementation

Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.

* Rewrite audio windowing of guided transcription

One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.

* Add unguided_transcription. Cleanup.

The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.

* Fix compilation.

Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore

* Functional unguided_transcription

* Functional guided_transcription

Fix commandset_list being passed by value
Properly register the first token of a multitoken command

* Minor changes before time fix

I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index

* Swap timekeeping to use std::chrono

* Add work in progress lsp backed whisper.vim plugin

Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)

* Reworked vim plugin command loop

* Fix change inside

Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.

* Forcibly set commandset_index to 0 after subinsert

Also remove unnecessary ! to use builtin vim command

* Fix upper

A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.

* Fix formatting

Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input

* Remove obsolete vim plugin

* Add json.hpp library

The same library that is used for the llama.cpp server

* Minor cleanups

add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)

* Fix indentation. Fallback for subTranscription

Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback

* Move audio polling logic to a subfunction

While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.

* Test for voice over subchunks if backlog > 1s

As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context

* Limit the maximum length of audio input.

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)

* Unguided timestamp tracking, cleanup

Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands

* By default, maintain mode.

Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.

* Add undo breaks before subtranscriptions

Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.

* Append instead of insert for new undo sequence

When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.

* Move undo sequence breaks to command execution

Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.

* Fix repeat. Add space, carrot, dollar commands

 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.

* Return error on duplicate in commandset

Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.

* Add support for user-defined commands

This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.

* Add readme, update cmake

* Add area commandset. Refactor spoken_dict

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets

* Add mark, jump. Fix change under visual.

Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.

* Accommodate ignorecase. Fix change.

From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.

* Support registers. Fix README typo

There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
* Initial proof of concept Vim plugin

At present, this is likely only slightly better than feature parity with
the existing whisper.nvim

Known issues:
 Trailing whitespace
 Up to an existing length(5 seconds) of speech may be processed when
  listening is enabled
 CPU cycles are spent processing speech even when not listening.

Fixing these issues is likely dependent upon future efforts to create a
dedicated library instead of wrapping examples/stream

* Support $WHISPER_CPP_HOME environment variable

A minor misunderstanding of the whisper.nvim implementation resulted in
a plugin that was functional, but not a drop in replacement as it should
be now.

* Initial progress on LSP implementation

Libcall is nonviable because the library is immediately freed after a
call is made. Further investigation has shown Language Server Protocol as
a promising alternative that both simplifies the required logic on the
vimscript side and increases the ease with which plugins for other
editors could be made in the future. This is a very large undertaking
and my progress has slowed substantially.

Work is far from being in a usable state, but I wish to keep track of
major refactors for organizational purposes.

* Rewrite audio windowing of guided transcription

One of the defining goals of this venture is allowing consecutive
commands to be rattled off without the existing deadzones of the current
implementation.

* Add unguided_transcription. Cleanup.

The unguided transcription implantation heavily borrows from existing
example implementations and the guided_transcription logic.

A high level pass was done to check that method arguments are accurate
to what inputs are actually required.

A first attempt at cancellation support was added for record keeping,
but will be deleted in a future commit.

* Fix compilation.

Resolves a large number of compilation errors.
No testing has been done yet for execution errors.

Update Makefile and .gitignore

* Functional unguided_transcription

* Functional guided_transcription

Fix commandset_list being passed by value
Properly register the first token of a multitoken command

* Minor changes before time fix

I've apparently made an awfully major mistake in thinking that unix time
was in milliseconds and will be changing all timekeeping code to use
standardized methods.

In preparation for this is a number of minor bugfixes.
Output is manually flushed.
An echo method has been added.
registerCommandset now wraps the returned index

* Swap timekeeping to use std::chrono

* Add work in progress lsp backed whisper.vim plugin

Current progress blockers are
 Adding modality awareness to the command processing
  (specifically, motion prompting)
 Improving the VAD to be a little more responsive
  (testing start of activity)

* Reworked vim plugin command loop

* Fix change inside

Multiple bug fixes that, crucially, bring the plugin to the point where a
demonstration video is possible

Add better echo messaging so whisper_log isn't required
 Add loading complete message as indicator when listening has started
Insert/append are actually included in command sets
Some more heavy handed corrections to prevent a double exit when leaving
insert mode
As a somewhat hacky fix, the very first space is removed when inserting.
 This cleans up most use cases, but leaves me unsatisfied with the few
 cases it would be desired.

* Forcibly set commandset_index to 0 after subinsert

Also remove unnecessary ! to use builtin vim command

* Fix upper

A minor scope mistake was causing upper'd inputs to be eaten.
This was fixed and echoing was slightly improved for clarity.

* Fix formatting

Corrects indentation to 4 spaces as project standard
Slightly better error support for malformed json input

* Remove obsolete vim plugin

* Add json.hpp library

The same library that is used for the llama.cpp server

* Minor cleanups

add lsp to the make clean directive.
remove a redundant params definition.
reorder whisper.vim logging for subtranscriptions
Corrections to unlets (variables of argument scope appear immutable)

* Fix indentation. Fallback for subTranscription

Indentation has been changed to 4 spaces.

Unit testing has been set up, I'm opting not to include it in the
repository for now.
It however, has revealed a bug in the state logic where a
subtranscription can be initiated without having a saved command
When this occurs, append is added as a fallback

* Move audio polling logic to a subfunction

While work on the improved vad will continue, It's grown to be a little
out of scope. Instead, a future commit will perform multiple detection
passes at substretches of audio when a backlog of audio exists.

To facilitate this, and prevent code duplication, the vad code has been
moved into a subfunction shared by both the unguided and guided
transcription functions.

* Test for voice over subchunks if backlog > 1s

As the existing VAD implementation only checks for a falling edge at the
end of an audio chunk. It fails to detect voice in cases where the
recorded voice is only at the beginning of the audio.

To ameliorate this, when the timestamp would cause analysis of audio
over a second in length, it is split into 1 second length subchunks
which are individually tested.

Results are promising, but there seems to be a remaining bug with
unguided transcription likely related to saving context

* Limit the maximum length of audio input.

This existing VAD implementation only detects falling edges, which
means any gap in the users speaking is processed for transcription.
This simply establishes a constant maximum length depending on the type
of transcription. Uguided gets a generous 10 seconds and guided, 2.

While quick testing showed that commands are generally around a half a
second to a second, limiting commands to an even second resulted in
extreme degradation of quality. (Seemingly always the same output for a
given commandset)

* Unguided timestamp tracking, cleanup

Unguided transcriptions where not setup to allow for passing of
timestamp data forward, but have been corrected.

No_context is now always set to false. While conceptually desirable for
the quality of guided transcription, It was seemingly responsible for
prior command inputs ghosting in unguided transcription.

Save and Run are now tracked by command number instead of command text.
While command_text was provided for convenience, I wish to keep command
index authoritative. This gives greater consistency and potentially
allows for end users to rename or even translate the spoken versions of
these commands

* By default, maintain mode.

Previously, mode was reset to 0 unless otherwise set.
In addition to causing some edge cases, this was didn't mesh well with
the existing approach to visual mode.

With this change, initial tests indicate visual mode is functional.

* Add undo breaks before subtranscriptions

Subtranscriptions use undo as a hack to allow for partial responses to
be displayed. However, scripts don't cause an undo break mid execution
unless specifically instructed to. This meant that multiple
unguided transcriptions from a single session would cause a latter to
undo a former.

This is now fixed and undo should be reasonably usable as a command.

* Append instead of insert for new undo sequence

When entering and leavening insert mode with `i`, the cursor shifts one
column to the left. This is remedied by using append instead of insert
for setting these breaks in the undo sequence

`-` was also added to the pronunciation dictionary to be pronounced as
minus as it was causing a particularly high failure rate.

* Move undo sequence breaks to command execution

Previously, undo sequence breaks were triggered when there was a command
that caused a move to insert mode. This caused commands that changed
state (like delete or paste) to be bundled together with into the last
command that caused text to be entered.

* Fix repeat. Add space, carrot, dollar commands

 Repeat (.) wasn't being tracked properly just like undo and is being
 manually tracked now.

 While efforts have been made to properly handle spaces, it was
 particularly finicky to add a single space when one is needed. A
 special 'space' command has been added to insert a single space and move
 the cursor after it.

 Carrot and Dollar commands have been added for start of line and end of
 line respectively. These are both simple to implement, and just a
 matter of defining a pronunciation.

* Return error on duplicate in commandset

Not every command in the commandset tokenizes to a single token.
Because of this, it's possible for that two commands could resolve to
the same single token after subsequent tokens are discarded.

This commit adds a simple check for duplicates when a commandset is
registered and returns an error if so.

Additional code will be required later on the vim side to actually
process this error.

* Add support for user-defined commands

This adds a user definable dictionary from spoken keys to strings or
funcrefs. All keys are added to the commandlist and when spoken, trigger
the corresponding function.

Like "save" and "run", these user commands are only available when the
command buffer is empty.

* Add readme, update cmake

* Add area commandset. Refactor spoken_dict

Area commands (inside word, around sentence...) have been given a
commandset as considered earlier.

Verbose definitions for spoken_dict entries now use dicts instead of
lists. This shortens the definition for most keys that require it and
scales better with the addition of further commandsets

* Add mark, jump. Fix change under visual.

Mark (m) and jump (') have been added.

When a visual selection was executed upon a command that initiated a
subtranscription (change) the area of the visual selection is not
properly tracked which causes the attempt to stream in partial response
to fail. This is solved by disabling partial transcriptions from being
streamed when a subtranscription is started while in visual mode.

* Accommodate ignorecase. Fix change.

From testing on older different versions of vim, the test for
distinguishing an 'R' replace all from an 'r' replace could fail if
ignorecase was set. The comparison has been changed to explicitly
require case matching

Change detection has been moved to the execution section as it was missing the
change+motion case.

* Support registers. Fix README typo

There's no logic to prevent doubled register entry, but the functional
result is equivalent to if the same key order was typed into vim.

A minor typo in the readme. I've mismemorized the mnemonic for 't' as 'to'
instead of till., but 'to' can't be used as it's a homophone with '2'.
While there was no mistake in the actual logic, it was misleading to use
'to' in the readme.
@yyy33
Copy link

yyy33 commented Nov 12, 2024

Pretty, but it looks like there's some delay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants