
Hello 🎙️

Hello is a project aimed at democratizing audio processing and transcription services. It provides an automated solution for monitoring folders, processing audio files, and generating transcriptions using either Faster Whisper (self-hosted) or Groq Whisper.
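
For context, the core transcription step with the self-hosted provider looks roughly like the sketch below. It calls the faster-whisper library directly rather than the project's own pipeline, and the file path and language are placeholders.

    # Minimal sketch: transcribe one file with faster-whisper.
    # Assumes the package and the CUDA/cuDNN libraries from the
    # Prerequisites section are installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, info = model.transcribe("recordings/example.wav", language="pt")

    print(f"Detected language: {info.language} ({info.language_probability:.2f})")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")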

🚀 Features

  • 📁 Automatic folder monitoring for new audio files
  • 🔄 Real-time audio processing and transcription
  • 🗄️ SQLite database for storing processed files and transcriptions
  • 📊 Performance tracking and statistics
  • 🌐 FastAPI server for status updates and file searching
  • 🔌 Support for multiple transcription providers (Faster Whisper and Groq Whisper)
  • 📄 CSV export of transcription data

🛠️ Installation


Prerequisites

NVIDIA Library Installation

Option 1: Use Docker

The libraries are pre-installed in official NVIDIA CUDA Docker images:

  • nvidia/cuda:12.0.0-runtime-ubuntu20.04
  • nvidia/cuda:12.0.0-runtime-ubuntu22.04
Option 2: Install with pip (Linux only)
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

Note: Ensure you're using cuDNN 8, as version 9+ may cause issues.

Option 3: Download from Purfview's repository (Windows & Linux)

Download the required NVIDIA libraries from Purfview's whisper-standalone-win repository. Extract the archive and add the library directory to your system's PATH (Windows) or LD_LIBRARY_PATH (Linux).

For detailed installation instructions, refer to the official NVIDIA documentation.

Steps

  1. Clone the repository:

    git clone https://github.com/namastexlabs/hello.git
    cd hello
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt
  4. Set up environment variables: Create a .env file in the project root and add the following:

    GROQ_API_KEYS=your_groq_api_key1,your_groq_api_key2
    RECORDINGS_PATH=./recordings
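
How the application consumes these values is not shown here; a rough sketch, assuming it loads them with python-dotenv, would be:

    # Hypothetical sketch: read the .env values described in step 4.
    import os
    from dotenv import load_dotenv  # assumes python-dotenv is installed

    load_dotenv()
    groq_api_keys = os.getenv("GROQ_API_KEYS", "").split(",")  # comma-separated keys
    recordings_path = os.getenv("RECORDINGS_PATH", "./recordings")
    print(groq_api_keys, recordings_path)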
    

🚀 Usage

Run the main script with desired options:

python main.py --language pt --provider faster_whisper --model_size large-v3
Available command-line arguments:
  • --language: Language code for transcription (default: pt)
  • --provider: Transcription provider (choices: groq, faster_whisper; default: faster_whisper)
  • --model_size: Model size for Faster Whisper (default: large-v3)
  • --device: Device for Faster Whisper (default: cuda)
  • --compute_type: Compute type for Faster Whisper (default: float16)
  • --log-level: Set the logging level (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL; default: INFO)
  • --clean-stats: Clean the transcription stats database
  • --stats-db: Path to the stats database (default: transcription_stats.db)
  • --database: Path to the main database (default: processed_files.db)

For a full list of Faster Whisper-specific options, run:

python main.py --help
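
With --provider groq, transcription goes through Groq's hosted Whisper API instead of a local model. A minimal standalone sketch using the groq Python SDK (the project's actual integration may differ) would look like:

    # Hypothetical sketch: one-off transcription via Groq's Whisper API.
    import os
    from groq import Groq  # assumes the groq package is installed

    client = Groq(api_key=os.environ["GROQ_API_KEYS"].split(",")[0])
    with open("recordings/example.wav", "rb") as f:
        transcription = client.audio.transcriptions.create(
            file=("example.wav", f.read()),
            model="whisper-large-v3",
            language="pt",
        )
    print(transcription.text)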

🌐 API Endpoints

  • /healthz: Health check endpoint
  • /status: Get current processing status
  • /search-files: Search processed files with optional filters
  • /api-key-status: Check the status of API keys
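
Once the server is running, these endpoints can be queried with any HTTP client. The host, port, and query parameters below are assumptions for illustration, not documented values:

    import requests

    base = "http://localhost:8000"  # assumed address; adjust to your deployment
    print(requests.get(f"{base}/healthz").status_code)
    print(requests.get(f"{base}/status").json())
    # The filters accepted by /search-files are not documented here;
    # "filename" is only an illustrative guess.
    print(requests.get(f"{base}/search-files", params={"filename": "meeting"}).json())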

📊 Example Project


The example project (TODO) will showcase an end-to-end solution that:

  1. Captures office activity
  2. Transcribes recordings every few minutes
  3. Saves timestamped database records
  4. Provides API access to transcriptions

This setup is meant to give agent systems easy access to the transcriptions.
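
Since the processed files and stats live in plain SQLite databases (processed_files.db and transcription_stats.db by default, per the Usage section), an agent or script can already inspect them directly. The sketch below makes no assumptions about table names; it discovers them at runtime and dumps the first one to CSV:

    # Sketch: inspect the processed-files database and export a table to CSV.
    import csv
    import sqlite3

    conn = sqlite3.connect("processed_files.db")
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    print("tables:", tables)

    if tables:
        cursor = conn.execute(f"SELECT * FROM {tables[0]}")
        with open("export.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
    conn.close()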

🗺️ Roadmap

  • Implement MONITOR_FOLDER environment variable for dynamic folder monitoring
  • Develop a user interface for easier management and visualization
  • Implement real-time audio streaming and transcription
  • Optimize performance for large-scale deployments
  • Develop plugins for popular audio recording software

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

❓ FAQ

Known Errors

Error: Could not load library libcudnn_ops_infer.so.8

This error means the cuDNN 8 library could not be found at runtime, usually because cuDNN is not installed or its directory is not on the library search path. Install it using one of the options under NVIDIA Library Installation above, and make sure LD_LIBRARY_PATH (Linux) or PATH (Windows) includes the directory containing the cuDNN libraries.

You can also download the CUDA toolkit and cuDNN directly from the NVIDIA website.
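
A quick way to check whether the library is discoverable on Linux (a diagnostic sketch, not part of the project):

    # Try to load the cuDNN 8 library that faster-whisper needs at runtime.
    import ctypes

    try:
        ctypes.CDLL("libcudnn_ops_infer.so.8")
        print("libcudnn_ops_infer.so.8 found")
    except OSError as exc:
        print("not found - check LD_LIBRARY_PATH:", exc)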

🌟 Community

Join our Discord community to discuss the project, get help, and contribute: https://discord.gg/MXa5GsVcCB

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • Faster Whisper for the efficient transcription engine
  • Groq for their Whisper API
  • All contributors and supporters of the project
