black-da-bull/web-whisperer

WEB WHISPER

🎶 Convert any audio to text 📝


A user interface for OpenAI's Whisper, right in your browser!

This is a small personal project I am using to learn Go and Svelte. It is a lightweight web frontend for OpenAI's Whisper.

✨ Features:

  • Record and transcribe audio right from your browser.
  • Upload any media file (video, audio) in any format and transcribe it.
    • Option to cut audio to X seconds before transcription.
    • Option to disable file uploads.
  • Select the input audio language.
  • Translate the transcription of the input audio to English.
  • Download .srt subtitle file generated from audio.
  • Choose the Whisper model you want to use (tiny, base, small...)
  • Lightweight and beautiful UI.
  • Self-hosted. No 3rd parties.
  • Docker Compose setup for easy self-hosting.
  • Privacy respecting:
    • All happens locally. No third parties involved.
    • Audio files are deleted immediately after processing.
  • Backend written in Go.
  • Frontend written with Svelte and Tailwind CSS.
  • Uses whisper.cpp, a C++ implementation of Whisper.
    • No GPU needed; it runs on the CPU.
    • No need for complex installations.
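
The upload and "cut audio to X seconds" features above imply a conversion step before transcription, since the whisper.cpp command-line example works on 16 kHz, 16-bit mono WAV input. The following is a minimal sketch of how such a step could be prepared, assuming the backend shells out to ffmpeg; that assumption, the function name `ffmpegArgs`, and the file names are all hypothetical and not taken from this project's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// ffmpegArgs builds the arguments for an ffmpeg call that converts any media
// file to 16 kHz, mono, 16-bit PCM WAV. If maxSeconds > 0, the output is cut
// to that many seconds with ffmpeg's -t option.
// Hypothetical helper: the real backend may do this differently.
func ffmpegArgs(inPath, outPath string, maxSeconds int) []string {
	args := []string{"-y", "-i", inPath} // -y: overwrite the output file
	if maxSeconds > 0 {
		args = append(args, "-t", strconv.Itoa(maxSeconds))
	}
	// -vn drops any video stream; -ar and -ac set sample rate and channels.
	args = append(args, "-vn", "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", outPath)
	return args
}

func main() {
	// Example: keep only the first 10 seconds of an uploaded recording.
	fmt.Println(ffmpegArgs("upload.webm", "audio.wav", 10))
}
```

The returned slice could be passed to `exec.Command("ffmpeg", args...)`. Keeping the argument construction in a pure function makes the time limit easy to test without running ffmpeg.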

🧭 Roadmap:

  • Allow uploading any file (video, audio) in any format and transcribing it.
    • Limit max file size for server hosting.
    • Allow cutting audio to X seconds before transcription.
    • Option to disable file uploads.
  • Transcription history / save snippets
    • Publish to some pastebin-like service.

🧪 Test it!

You can easily self-host your own instance with Docker (locally or on a server).

I have also made a testing instance available at: https://whisper.r3d.red

Note that this instance is limited:

  • Audio recordings are limited to 10 seconds.
  • File uploads are disabled.
  • Uses the base model.

Screenshots

*Logo generated with Stable Diffusion*

  • Main page
  • Recording
  • Transcription Options
  • Processing
  • Result

Other information

How fast is this?

whisper.cpp usually produces results faster than the Python implementation, although speed depends heavily on your machine's resources, the length of the media source, and the file size. Here is a small benchmark:

CPU: i7
RAM: 16 GB
Input format: webm audio
File size: 7MB
Audio length: 30m

Total elapsed time: 7m 38s

Similar projects

  • Whisper WASM - Runs Whisper directly in your browser, with no server required. Note that the performance of this version is not very good.