Support/FR: Put transcription in kill ring #14

Open
pataquets opened this issue Nov 9, 2023 · 2 comments

Comments

@pataquets

Is it possible to transcribe speech and get the text into the kill ring, as if it had been killed, so that it can then be yanked?

@natrys
Owner

natrys commented Nov 14, 2023

Not by default; the range of things people want is too varied for me to accommodate.

But it's possible if you add custom logic to the post-processing hook. Please load the latest commit and try this:

(setq whisper-insert-text-at-point nil) ;; optional but sensible to disable live marker

(add-hook 'whisper-post-process-hook
          (lambda ()
            (kill-region (point-min) (point-max))))

@liar666

liar666 commented Nov 27, 2023

For those interested, I've written the following (Bash) script to do exactly what OP wants, from anywhere in the OS (Linux) :)
How it works:

  • On the first call, it starts recording (using sox) into a file in /tmp/.
  • On the second call, it stops recording and transcribes the file with main from whisper.cpp (I tried stream, but got worse results and found its ever-changing output harder to parse).
  • It puts the resulting text into Xorg's cut/paste buffer (the "middle mouse click" buffer, i.e. the PRIMARY selection).
  • It uses /tmp/READER to get the language (fr/en for me). It does so in a peculiar way because I use the same file to switch languages for another script of mine, which does TTS with pico2wave.
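
The language lookup in the last bullet works because the script sources /tmp/READER as shell code and then falls back to a default. A minimal sketch of that step in isolation, assuming the file contains nothing more than a `RECORD_LANG=...` assignment (the thread does not show the real file's contents; the helper name `load_record_lang` is made up for illustration):

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the language set in a settings file,
# or "fr" when the file is missing or does not set RECORD_LANG.
load_record_lang() {
    local settings_file="$1"
    local RECORD_LANG=""
    if [[ -e "${settings_file}" ]]; then
        # Sourcing runs the file in the current shell, so an assignment
        # such as RECORD_LANG=en lands in the local variable above.
        . "${settings_file}"
    fi
    echo "${RECORD_LANG:-fr}"
}
```

With this, switching language from another script or a key binding could be as simple as `echo 'RECORD_LANG=en' > /tmp/READER`. Since the file is executed as shell code, it should only ever be written by the user.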

Very handy to add "VoiceTyping" to Firefox, LibreOffice, Emacs, Vim, etc.

It is very rough and might fail in some cases; if it does, the best fix is to remove all its temp files from /tmp.

Unfortunately, I don't have time to improve it. But feel free to use it, modify or rewrite it (in a better language/format), make a beautiful GUI (GNOME/Plasma applet?), or package and distribute it in a better form (GitHub, package, etc.). Just mention this issue if you do, so that people know where it comes from :) The actual work is done by Whisper itself, so let's say my work is worth a CC0 license ;), but if it can help or inspire others, you're welcome!

#!/usr/bin/env bash

# Define functions "show_message" & "debug" to display feedback to the user in logs & OSD
show_message(){
    echo "$1"
    notify-send -t 800 -c INFO "$1"
}

debug(){
    echo "$1"
    # notify-send -t 1000 -c INFO "$1"
}


export DISPLAY=$(w | grep i3 | awk '{ print $3 }')
export XAUTHORITY=$HOME/.Xauthority

# tiny[-q5_0] / base[-q5_0] / small[-q5_0] /
MODEL=base-q5_0

TMP_WAV_FILE=/tmp/whisper_wav_name
MARKER_FILE=/tmp/whisper_pid
CONTENT_FILE=/tmp/whisper_content

# Set language
if [[ -e /tmp/READER ]]; then
    . /tmp/READER
fi
# Assign default values if they were not defined by /tmp/READER
: ${RECORD_LANG:="fr"} # "en"

# Examples of commands to run Whisper in continuous recognition mode
# DIR=~/.doom.d/packages/whisper.cpp/
# $DIR/stream -m $DIR//models/ggml-base-q5_0.bin -t 8 --step 500 --length 5000
# $DIR/stream -m $DIR//models/ggml-small-q5_0.bin -t 6 --step 0 --length 30000 -vth 0.6
WHISPER_DIR=~/.doom.d/packages/whisper.cpp
RECORD_COMMAND='rec --rate 16k'
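# The .wav name is only known once a recording has been started, so ignore a missing file here
WHISPER_COMMAND="${WHISPER_DIR}/main -m ${WHISPER_DIR}/models/ggml-${MODEL}.bin -l ${RECORD_LANG} -f $(cat "${TMP_WAV_FILE}" 2>/dev/null)"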


if [[ -e ${MARKER_FILE} ]] ; then
    show_message "STOP recording"
    # Kill running 'sox::rec' process
    debug "kill $(cat "${MARKER_FILE}")"
    kill "$(cat "${MARKER_FILE}")" || pkill rec
    sleep 1   # wait to be sure the .wav file is closed before starting whisper, otherwise it endlessly waits for the file/stream to be closed
    # Run whisper.cpp
    debug "${WHISPER_COMMAND}"
    eval "${WHISPER_COMMAND}" > "${CONTENT_FILE}"
    # Extract text recognized by Whisper
    content=$(grep '^[[]' "${CONTENT_FILE}" | sed 's/^[^]]*][[:space:]]*//g' | tr '\n' ' ')
    # Put it in the Xorg buffer
    echo "${content}" | tee /tmp/whisper_debug
    echo "${content}" | xsel -i
    debug "Extracted content: ${content}"
    show_message "FINISHED parsing"
    # Remove temp files
    \rm -f "${MARKER_FILE}" "${CONTENT_FILE}" "${TMP_WAV_FILE}"
else
    show_message "START recording - using model [${MODEL}] in language [${RECORD_LANG}]"
    # Start recording
    (
        # Create filename for recording
        echo "$(mktemp --dry-run).wav" > "${TMP_WAV_FILE}"
        # Expand command
        RECORD_CMD="${RECORD_COMMAND} $(cat "${TMP_WAV_FILE}")"
        debug "${RECORD_CMD}"
        # Run the recorder directly (no eval) so that $! is the PID of rec itself
        ${RECORD_CMD} &
        RECORD_PID=$!
        # Store it for later
        echo "${RECORD_PID}" > "${MARKER_FILE}" # mark the start of recording & store PID
    )
fi
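
The extraction step in the script depends on whisper.cpp's main printing each transcribed segment on a line that starts with a bracketed timestamp, with everything else being log output. A minimal sketch of that filter on its own, assuming that output format; the function name `extract_transcript` is made up for illustration, and the final `sed` that trims the trailing space is an addition that is not in the script above:

```shell
#!/usr/bin/env bash

# Hypothetical helper: read whisper.cpp output on stdin and print only
# the recognized text as a single line.
extract_transcript() {
    grep '^[[]' |                        # keep only "[timestamp] text" lines
        sed 's/^[^]]*][[:space:]]*//' |  # drop the "[...]" prefix and padding
        tr '\n' ' ' |                    # join the segments with spaces
        sed 's/[[:space:]]*$//'          # trim the trailing space left by tr
}
```

It can be tried without a microphone by piping in a made-up sample, for example `printf '[00:00:00.000 --> 00:00:02.000]   Hello world.\n' | extract_transcript`.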
