
speechcatcher-asr

Speechcatcher

Speechcatcher is an open-source toolbox for transcribing and translating speech from media files (audio/video). Speechcatcher models are trained with Whisper as the teacher, which yields compact ASR models that also run fast on CPUs:

[Figure: Speechcatcher teacher/student training]
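Teacher/student training with Whisper is often done by pseudo-labeling. The large teacher transcribes untranscribed audio, and its transcripts become the training targets for the smaller student. The following is a minimal sketch of that general idea only. It assumes pseudo-labeling with a confidence filter, which this page does not confirm as Speechcatcher's exact procedure; the actual data scripts are in speechcatcher-data. `teacher`, `PseudoLabel`, `build_pseudo_labels` and `min_confidence` are hypothetical names, and the teacher is a stand-in callable rather than a real Whisper API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class PseudoLabel:
    """One training pair for the student: audio file plus teacher transcript."""
    audio_path: str
    text: str


def build_pseudo_labels(
    audio_paths: Iterable[str],
    teacher: Callable[[str], Tuple[str, float]],  # hypothetical: path -> (transcript, confidence)
    min_confidence: float = 0.5,                  # hypothetical filtering threshold
) -> List[PseudoLabel]:
    """Label unlabeled audio with a teacher model and keep confident outputs."""
    labels: List[PseudoLabel] = []
    for path in audio_paths:
        text, confidence = teacher(path)
        text = text.strip()
        # Drop empty or low-confidence transcripts so the student
        # is not trained on likely teacher errors.
        if text and confidence >= min_confidence:
            labels.append(PseudoLabel(path, text))
    return labels


# Usage with a fake teacher (a dict lookup standing in for a real model).
fake_outputs = {
    "a.wav": ("Guten Morgen", 0.9),
    "b.wav": ("", 0.8),
    "c.wav": ("unklar", 0.2),
}
labels = build_pseudo_labels(fake_outputs, lambda p: fake_outputs[p])
print(labels)  # [PseudoLabel(audio_path='a.wav', text='Guten Morgen')]
```

The resulting pairs would then serve as an ordinary labeled corpus for training the compact student model.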

Speechcatcher CLI

The command-line interface is in the speechcatcher repository. It can transcribe any media file and can also transcribe live from your microphone. That repository also gives an overview of all available Speechcatcher models.

Data

Scripts to replicate the data gathering can be found in speechcatcher-data. That repository also contains instructions on how to replicate the training procedure with ESPnet.

Web GUI

Speechcatcher also comes with an easy-to-use web GUI. It supports multiple ASR engines: Speechcatcher (CPU), subtitle2go (CPU) and Whisper (GPU).

Benchmarks

Because each Speechcatcher model targets a single language, the models aim to be much faster than multilingual single-model transcription systems such as Whisper.

See our results here.

The current focus is on transcribing German speech; more languages may be added later. If you would like to help expand Speechcatcher, please get in touch!

Citation

If you use Speechcatcher models in your research, please cite this repository for now:

@misc{milde2023speechcatcher,
  author = {Milde, Benjamin},
  title = {Speechcatcher},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
}

Sponsors

Speechcatcher is graciously funded by the Media Tech Lab of Media Lab Bayern (@media-tech-lab).


Repositories

  • speechcatcher-data (Python, MIT, updated Dec 23, 2024)
  • speechcatcher (Python, MIT, updated Dec 13, 2024)
  • espnet_streaming_decoder: An ESPnet streaming decoder with a smaller footprint than the full ESPnet project (Python, Apache-2.0, updated Sep 17, 2024)
  • espnet_model_zoo, forked from espnet/espnet_model_zoo: ESPnet Model Zoo (Python, Apache-2.0, updated Dec 13, 2023)
  • espnet, forked from espnet/espnet: End-to-End Speech Processing Toolkit (Python, Apache-2.0, updated Apr 29, 2023)
  • speechcatcher-webgui (Python, MIT, updated Apr 14, 2023)
  • .github (MIT, updated Apr 6, 2023)
  • test_kaldi_io_soundfile_flac: Tests kaldiio and writing out FLAC files (Python, MIT, updated Feb 1, 2023)
