Transform PDF documents into audiobooks using text extraction and text-to-speech (TTS) technology. The tool is built to run efficiently on low-VRAM systems and is well suited to immersive reading.
- 🔍 Intelligent PDF Text Extraction: Removes headers, footers, and page numbers.
- 📖 Chapter or Full-Book Conversion: Extract based on Table of Contents (TOC) or the entire document.
- 🎙️ Customizable Text-to-Speech Settings: Supports multiple TTS models and speaker configurations.
- 💻 Low-Resource Processing: Adjusts chunk sizes dynamically for lower-VRAM systems.
- 🎨 User-Friendly GUI with Theme Customization: Switch themes easily and save preferences.
- Python 3.8+
- FFmpeg: Required for audio processing.
- Piper TTS: For high-quality text-to-speech conversion.
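
The prerequisites above can be checked from Python before installing anything else. This is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper name, and it only checks the Python version and whether an `ffmpeg` executable is on the PATH. Piper is not checked, since how it gets installed depends on `requirements.txt`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("ffmpeg",)):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    status = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```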
- Clone the repository
      git clone https://github.com/mateogon/pdf-narrator.git
      cd pdf-narrator
- Create a virtual environment
      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
      pip install -r requirements.txt
- Install FFmpeg
  - Ubuntu/Debian:
      sudo apt-get install ffmpeg
  - macOS:
      brew install ffmpeg
  - Windows: Download and install from the FFmpeg official site.
- Download Recommended Voice Model
  - Visit the Hugging Face Piper Voices page.
  - Download the en_US-libritts-high.onnx model and its corresponding en_US-libritts-high.onnx.json file.
  - Place the downloaded files in the models/en/ directory:
      mkdir -p models/en
      mv /path/to/en_US-libritts-high.onnx models/en/
      mv /path/to/en_US-libritts-high.onnx.json models/en/
- Speaker ID and Voice Samples
  - Use Speaker ID 8 for the recommended voice.
  - Listen to available voice samples for customization options.
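
Once the voice files are in place, a quick check can confirm that the model and its config sit side by side and that the chosen speaker ID is in range. This is a minimal sketch: `check_voice` is a hypothetical helper, and it assumes the `.onnx.json` config stores the speaker count under a `num_speakers` key, which may not hold for every voice.

```python
import json
from pathlib import Path


def check_voice(model_path, speaker_id=None):
    """Verify a voice model has its .onnx.json config and, optionally,
    that speaker_id is in range. Returns the parsed config dict."""
    model = Path(model_path)
    config = Path(str(model) + ".json")  # voice.onnx -> voice.onnx.json
    if not model.is_file():
        raise FileNotFoundError(f"model not found: {model}")
    if not config.is_file():
        raise FileNotFoundError(f"config not found: {config}")
    data = json.loads(config.read_text(encoding="utf-8"))
    # Assumption: the speaker count lives under "num_speakers"; default to 1.
    num_speakers = int(data.get("num_speakers", 1))
    if speaker_id is not None and not 0 <= speaker_id < num_speakers:
        raise ValueError(
            f"speaker {speaker_id} out of range 0..{num_speakers - 1}"
        )
    return data
```

For example, `check_voice("models/en/en_US-libritts-high.onnx", speaker_id=8)` raises an error if either file is missing or if the speaker ID is not valid under the assumption above.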
- Launch the App
      python main.py
- Select PDF
  - Browse and select a PDF file.
  - Choose extraction mode: chapters or full book.
- Configure Audio Settings
  - Select a Piper TTS model.
  - Choose speaker IDs (optional).
  - Adjust chunk size for compatibility with your system.
  - Select audio output format (.wav or .mp3).
- Generate Audiobook
  - Click Start Process.
  - Monitor progress in real-time with progress bars and logs.
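
The app handles the output format for you, but the .wav to .mp3 step can also be done by hand with FFmpeg. The sketch below is illustrative only and is not taken from the project's code: `mp3_command` and `wav_to_mp3` are hypothetical names, and it assumes your FFmpeg build includes the `libmp3lame` encoder.

```python
import subprocess
from pathlib import Path


def mp3_command(wav_path, bitrate="128k"):
    """Build the FFmpeg argument list that encodes wav_path to an .mp3 beside it."""
    wav = Path(wav_path)
    mp3 = wav.with_suffix(".mp3")
    # -y overwrites an existing output file; -b:a sets the audio bitrate.
    return ["ffmpeg", "-y", "-i", str(wav),
            "-codec:a", "libmp3lame", "-b:a", bitrate, str(mp3)]


def wav_to_mp3(wav_path, bitrate="128k"):
    """Run FFmpeg and return the path of the new .mp3 file."""
    cmd = mp3_command(wav_path, bitrate)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Passing the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.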
- Automatic Cleaning: Removes headers, footers, page numbers, and excessive whitespace.
- TOC Segmentation: If the document has a TOC, chapters are split accordingly.
- Full-Book Mode: Extracts the entire document if no TOC is available or desired.
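
The cleaning and segmentation ideas above can be sketched in plain Python. This is not the project's implementation: `clean_pages` and `chapters_from_toc` are hypothetical helpers using simple heuristics. The TOC function assumes entries shaped like `[level, title, start_page]` with 1-based pages, which is the shape PyMuPDF's `Document.get_toc()` returns by default.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)


def clean_pages(pages, min_share=0.5):
    """Drop bare page numbers and lines that recur at the top or bottom of pages."""
    # Keep non-empty lines, skipping lines such as "12" or "Page 12".
    split = [[ln.strip() for ln in page.splitlines()
              if ln.strip() and not PAGE_NUMBER.match(ln)] for page in pages]
    edge_counts = Counter()
    for lines in split:
        # Count each page's first and last remaining line once.
        edge_counts.update(set(lines[:1] + lines[-1:]))
    threshold = max(2, min_share * len(pages))
    repeated = {ln for ln, n in edge_counts.items() if n >= threshold}
    cleaned = []
    for lines in split:
        if lines and lines[0] in repeated:
            lines = lines[1:]   # running header
        if lines and lines[-1] in repeated:
            lines = lines[:-1]  # running footer
        cleaned.append(" ".join(" ".join(lines).split()))  # collapse whitespace
    return cleaned


def chapters_from_toc(toc, page_count, level=1):
    """Return (title, first_page, last_page) per chapter, or one full-book range."""
    starts = [(title, page) for lvl, title, page in toc if lvl == level]
    if not starts:
        return [("Full book", 1, page_count)]
    chapters = []
    for i, (title, first) in enumerate(starts):
        # A chapter ends on the page before the next chapter starts.
        last = starts[i + 1][1] - 1 if i + 1 < len(starts) else page_count
        chapters.append((title, first, max(first, last)))
    return chapters
```

Note that the page-number heuristic also drops any body line that consists only of digits, so real extraction code would likely need position information as well.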
- Chunk-Based Processing: Splits text into manageable chunks for processing.
- Multi-Speaker Support: Customize speaker IDs for varied narration.
- VRAM-Friendly: Dynamically adjusts chunk size based on system performance.
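
Chunk-based processing can be illustrated with a small sentence-aware splitter. This is a sketch under stated assumptions, not the project's actual chunker: `split_into_chunks` is a hypothetical name, sentences are detected with a simple punctuation regex, and size is measured in characters.

```python
import re


def split_into_chunks(text, max_chars=500):
    """Group sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is hard-split."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Break up sentences that cannot fit in one chunk on their own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps each chunk a natural unit for the TTS model, which tends to avoid audible breaks in the middle of a sentence.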
- Use GPU (CUDA): Select GPU as the processing device for faster TTS generation.
- Adjust Chunk Size: Reduce the chunk size if your system has limited VRAM.
- Experiment with Models: Piper's performance depends on the selected model and speaker IDs.
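
Reducing the chunk size on limited VRAM can also be automated with a retry loop. The sketch below is hypothetical: `synthesize` stands for whatever function turns a piece of text into audio, and `MemoryError` is a stand-in for the out-of-memory error your TTS backend actually raises. For brevity it slices by character count rather than at sentence boundaries.

```python
def synthesize_with_backoff(text, synthesize, chunk_size=1000, min_size=50):
    """Call synthesize(piece) over text, halving chunk_size whenever a piece
    raises MemoryError. Returns (outputs, final_chunk_size)."""
    outputs, pos = [], 0
    while pos < len(text):
        piece = text[pos:pos + chunk_size]
        try:
            outputs.append(synthesize(piece))
        except MemoryError:
            if chunk_size // 2 < min_size:
                raise  # cannot shrink any further
            chunk_size //= 2
            continue  # retry the same position with a smaller piece
        pos += len(piece)
    return outputs, chunk_size
```

The returned chunk size is the largest value that worked during the run, so it can be saved and reused as the starting point next time.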
- PDF Quality: Extraction accuracy depends on the structure of the source PDF.
- TTS Quality: The quality of the generated audiobook may vary depending on the Piper model.
- Processing Time: Large documents may require significant time to process.
- Academia: Convert research papers and textbooks into audiobooks.
- Books: Create audiobooks from eBooks or PDFs.
- Documentation: Narrate user manuals or technical documents.
- Education: Enhance learning with immersive reading.
- Select a PDF file.
- Choose TOC-based or full-book extraction.
- Configure Piper TTS settings (model, speaker, chunk size).
- Generate the audiobook.
- Use the audiobook for immersive reading or standalone listening.
Contributions are welcome! If you have ideas for improvements or new features, please open an issue or create a pull request.
- Piper TTS: High-quality text-to-speech engine.
- PyMuPDF: PDF parsing and text extraction library.
- ttkbootstrap: Modern GUI framework for Python applications.