Welcome! This project is an educational tool designed to show how modern AI and classic programming techniques can be combined to create a useful application. It started as a simple desktop app and evolved into a powerful, web-based AI tool hosted on Hugging Face.
The application has two main functions:
- A Syllable Splitter: It takes an English word, splits it into its phonetic syllables, and lets you hear them spoken.
- A Direct Text-to-Speech (TTS) Tool: It can take any English text and convert it into high-quality speech using an AI model.
This repository is perfect for learners interested in Python, GUI development, and how to use AI models in a practical project.
- Live Demo
- Features
- How It Works: The Technology
- How to Run This Project
- Project Evolution: From Desktop App to AI Web App
- AI Transparency: A Note on Collaboration
You can try out the final, web-based version of this application live on Hugging Face Spaces:
AI Syllable and Text-to-Speech Tool Live
- Two Tools in One: A dedicated tool for syllable analysis and a general-purpose Text-to-Speech engine.
- AI-Powered Speech: Uses the high-quality Coqui TTS model for natural and clear audio generation.
- Interactive Interface: Built with Gradio, the interface is user-friendly and allows for editing the syllabified text before generating audio.
- Web-Based and Accessible: As a Hugging Face Space, the tool requires no installation and can be used by anyone with a web browser.
This project combines several key libraries to achieve its functionality:
- Gradio: Used to build and host the interactive web interface. It's a fantastic Python library for creating demos for machine learning models.
- Pyphen: A library that splits words using Hunspell hyphenation dictionaries. Its break points follow hyphenation rules, which usually approximate syllable boundaries but are not a true phonetic analysis.
- Coqui TTS (🐸 TTS): A powerful, open-source library for Text-to-Speech. We use one of its pre-trained English models to convert text into spoken audio.
- PyTorch: The underlying machine learning framework that runs the Coqui TTS model.
The application is structured with a clear, two-step workflow for the syllable tool, making it easy to see the intermediate result before hearing the final audio.
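As a rough illustration of that two-step flow, here is a minimal sketch, not the project's actual code. It assumes Pyphen is installed (`pip install pyphen`) and uses its `Pyphen(lang=...)` class and `inserted()` method. The function names `split_word` and `prepare_for_speech` are hypothetical, and the speech step itself is left out because the README does not name the exact TTS model.

```python
# Sketch of the two-step syllable workflow: split first, let the user
# edit the result, then hand the chunks to a TTS engine.
# Function names are hypothetical; only the Pyphen calls are real API.
try:
    import pyphen
except ImportError:  # keep the sketch importable without the dependency
    pyphen = None


def split_word(word: str, lang: str = "en_US") -> str:
    """Step 1: return the word with hyphens at Pyphen's break points."""
    if pyphen is None:
        return word  # fallback: no break points available
    return pyphen.Pyphen(lang=lang).inserted(word)


def prepare_for_speech(edited: str) -> list[str]:
    """Step 2 input: turn the (possibly user-edited) hyphenated text into chunks."""
    return [part.strip() for part in edited.split("-") if part.strip()]


if __name__ == "__main__":
    shown = split_word("computer")    # intermediate result shown to the user
    print(shown)
    print(prepare_for_speech(shown))  # chunks that would be sent to the TTS step
```

Keeping the split and the speech input as separate functions mirrors the interface described above: the hyphenated string is what the user sees and can correct, and only the corrected text is turned into chunks for audio generation.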
Since the final version is a web app, the easiest way to use it is via the Live Demo link.
However, if you wish to run the project on your own computer to experiment with the code, Hugging Face makes this very simple:
- Go to the project's Hugging Face Space: AI Syllable and Text-to-Speech Tool Live.
- Click on the three dots ( • • • ) menu icon at the top-right of the page.
- Select "Run locally".
- Follow the instructions shown there to run it on your machine. They cover the dependencies and setup for you.
This project didn't start as a web app. Its journey is a great lesson in software development:
- Initial Goal: Create a simple Python script to split syllables.
- Desktop App: We first built a desktop application using Tkinter and a basic, offline TTS engine (pyttsx3).
- The Limitation: We discovered that the basic Windows TTS voices struggled to pronounce isolated syllables correctly (e.g., reading "cor" as "C-O-R").
- Check the desktop version here: Syllable-Splitter-and-Speaker
- The Pivot to AI: To solve this, we decided to use a modern AI-powered TTS model. We chose to host it on Hugging Face to avoid requiring users to have powerful hardware.
- Final Version: The project evolved into a full-fledged Gradio web application, which is more powerful, accessible, and provides much higher-quality results than the original desktop app.
This evolution shows how encountering limitations can lead to better, more modern solutions.
This project was developed collaboratively between a human developer and Manus, an AI agent from the Manus team.