This project allows users to upload text or PDF files, process the raw text using Groq's API for structuring and cleaning, and then convert the processed text into speech using the Web Speech API. The goal is to create a fully functional audiobook-like experience by transforming raw text into narrated audio.
- Text Upload: Upload .txt or .pdf files.
- Groq Integration: Use Groq's API to structure and clean the text for audiobook narration.
- Speech Synthesis: Convert the cleaned text into speech with customizable voice, volume, rate, and pitch.
- File Processing: PDFs are processed using pdfjs-dist, with optional OCR extraction if needed.
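As a rough sketch of the upload step, the file type can be decided from the file name before choosing how to read the file. The names `detectUploadKind` and `readUpload` are hypothetical and not taken from the repository; `File.text()` and `File.arrayBuffer()` are standard browser APIs.

```javascript
// Hypothetical helper: classify an uploaded file by its extension.
function detectUploadKind(fileName) {
  const lower = String(fileName).toLowerCase();
  if (lower.endsWith(".txt")) return "text";
  if (lower.endsWith(".pdf")) return "pdf";
  return "unsupported";
}

// Hypothetical helper: read a browser File object according to its kind.
// Text files are read as a string; PDFs as raw bytes for a PDF parser.
async function readUpload(file) {
  const kind = detectUploadKind(file.name);
  if (kind === "text") {
    return { kind, text: await file.text() };
  }
  if (kind === "pdf") {
    return { kind, data: new Uint8Array(await file.arrayBuffer()) };
  }
  throw new Error(`Unsupported file type: ${file.name}`);
}
```

`readUpload` would typically be called from the `change` handler of the file input, with `event.target.files[0]` as its argument.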
- Node.js installed (v14.x or higher).
- Git installed (for cloning the repository).
- Clone the Repository:
  git clone https://github.com/rachan2005/pdf_uploader.git
  cd pdf_uploader
- Install Dependencies:
  Install the required npm packages by running the following command:
  npm install
- Set Up the .env File:
  Create a .env file in the root directory and add the following:
  VITE_GROQ_API_KEY=your-groq-api-key
  Replace your-groq-api-key with your actual Groq API key.
- Start the Development Server:
  To run the application locally, use the following command:
  npm run dev
  The app will be available at http://localhost:3000.
- Upload a File:
  - Upload a .txt or .pdf file using the file input field.
  - The system will process the file, extract the text, and use the Groq API to clean and structure it.
- Text to Speech:
  - Use the text box to either type or edit the text.
  - Select a voice from the dropdown list and adjust the volume, rate, and pitch using the respective sliders.
  - Click the play button to start the narration.
- Control Playback:
  - Pause, resume, or reset the narration as needed using the respective buttons.
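The narration and playback steps above can be sketched with the browser's Web Speech API. This is a minimal illustration, not the repository's code: the function names and the shape of the `sliders` object are assumptions. The clamping ranges follow the limits the API defines for `volume` (0 to 1), `rate` (0.1 to 10), and `pitch` (0 to 2).

```javascript
// Keep a slider value inside the range the Web Speech API accepts.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Normalize UI slider values (hypothetical shape) into utterance settings.
function buildUtteranceSettings({ volume = 1, rate = 1, pitch = 1 } = {}) {
  return {
    volume: clamp(volume, 0, 1), // 0 is silent, 1 is loudest
    rate: clamp(rate, 0.1, 10), // 1 is normal speed
    pitch: clamp(pitch, 0, 2), // 1 is the default pitch
  };
}

// Start narration. Browser only: relies on window.speechSynthesis.
function startNarration(text, voiceName, sliders) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance(text);
  const settings = buildUtteranceSettings(sliders);
  utterance.volume = settings.volume;
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  // getVoices() can return an empty list until "voiceschanged" has fired.
  const voice = synth.getVoices().find((v) => v.name === voiceName);
  if (voice) utterance.voice = voice;
  synth.cancel(); // drop anything already queued
  synth.speak(utterance);
}

// Playback controls map directly onto SpeechSynthesis methods.
const pauseNarration = () => window.speechSynthesis.pause();
const resumeNarration = () => window.speechSynthesis.resume();
const resetNarration = () => window.speechSynthesis.cancel();
```

Clamping in one place keeps out-of-range slider values from reaching the utterance, whatever range the UI sliders happen to use.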
The project interacts with the Groq API using the key stored in the .env file.
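One way such a call could look is sketched below, assuming Groq's OpenAI-compatible chat completions endpoint. The model name, the prompt wording, the `buildGroqRequest` helper, and the extra `apiKey` parameter are all assumptions made for illustration; the repository's `processTextWithGroq(rawText)` may be implemented differently.

```javascript
// Assumed endpoint: Groq's OpenAI-compatible chat completions route.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Build the fetch arguments. The model name and prompt are placeholders.
function buildGroqRequest(rawText, apiKey, model = "llama-3.1-8b-instant") {
  return {
    url: GROQ_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [
          {
            role: "system",
            content:
              "Clean and structure the following text for audiobook narration. Return only the cleaned text.",
          },
          { role: "user", content: rawText },
        ],
      }),
    },
  };
}

// Send the request and return the cleaned text from the first choice.
async function processTextWithGroq(rawText, apiKey) {
  const { url, options } = buildGroqRequest(rawText, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Groq request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Separating the request builder from the network call makes the payload easy to inspect without contacting the API.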
- processTextWithGroq(rawText): Sends the raw text to the Groq API to clean and structure it for audiobook narration.
- renderPdfPageToImage(pdfData, pageNum): Renders each page of the PDF as an image for OCR (if necessary).
- extractTextWithOCRFromPdf(pdfData): Uses pdfjs-dist and Tesseract.js for OCR to extract text from PDF files.
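A possible shape for the PDF path is sketched below: read each page's embedded text with pdfjs-dist and fall back to Tesseract.js only when a page has no text layer. The helper names, the injected `pdfjsLib` and `Tesseract` parameters, and the per-page fallback rule are assumptions, so the signatures differ from the functions listed above. In a browser build, pdfjs-dist also needs its worker configured through `GlobalWorkerOptions.workerSrc`.

```javascript
// Join the text items pdf.js returns for one page into a single string.
function joinTextItems(items) {
  return items
    .map((item) => item.str ?? "") // marked-content items carry no `str`
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Render a loaded pdf.js page to a canvas for OCR (browser only).
async function renderPageToCanvas(page, scale = 2) {
  const viewport = page.getViewport({ scale });
  const canvas = document.createElement("canvas");
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
  return canvas;
}

// `pdfjsLib` is the imported pdfjs-dist module; `Tesseract` is the imported
// tesseract.js module and may be omitted to skip OCR entirely.
async function extractPdfText(pdfData, pdfjsLib, Tesseract) {
  const pdf = await pdfjsLib.getDocument({ data: pdfData }).promise;
  const pages = [];
  for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
    const page = await pdf.getPage(pageNum);
    const content = await page.getTextContent();
    let text = joinTextItems(content.items);
    if (!text && Tesseract) {
      // No embedded text on this page: treat it as a scanned image.
      const canvas = await renderPageToCanvas(page);
      const { data } = await Tesseract.recognize(canvas, "eng");
      text = data.text.trim();
    }
    pages.push(text);
  }
  return pages.join("\n\n");
}
```

Loading the document once and deciding per page avoids running OCR on pages that already contain selectable text.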
Ensure that your .env file contains your Groq API key:
VITE_GROQ_API_KEY=your-groq-api-key
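Vite exposes variables prefixed with `VITE_` to client code through `import.meta.env`. The sketch below shows one way to read and validate the key at startup; `getGroqApiKey` is a hypothetical helper, not a function from the repository.

```javascript
// Read and validate the Groq key from an env object such as import.meta.env.
function getGroqApiKey(env) {
  const raw = env && env.VITE_GROQ_API_KEY ? String(env.VITE_GROQ_API_KEY) : "";
  const key = raw.trim();
  if (!key || key === "your-groq-api-key") {
    throw new Error(
      "VITE_GROQ_API_KEY is missing or still set to the placeholder in .env"
    );
  }
  return key;
}

// In a Vite app: const apiKey = getGroqApiKey(import.meta.env);
```

Failing early with a clear message is easier to diagnose than an authentication error returned by the API later on.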
- React - Frontend framework for building the user interface.
- Vite - Fast build tool and development server.
- Groq API - Text structuring and cleaning API.
- Web Speech API - Used for converting text to speech.
- pdfjs-dist - Library for parsing PDF files.
- Tesseract.js (Optional) - For OCR if the PDF is image-based.
- If you face issues with the API key, make sure the .env file is correctly configured and located in the root directory.
- If the text-to-speech feature doesn't work, check the browser's support for the Web Speech API.
- Ensure all dependencies are installed correctly by running npm install again.
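Browser support for speech synthesis can be checked with a small feature test before offering the play button. The helper name `isSpeechSynthesisSupported` is an assumption for illustration.

```javascript
// Returns true when the given global scope offers the speech synthesis API.
function isSpeechSynthesisSupported(scope = globalThis) {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope;
}

// Example use in the browser:
// if (!isSpeechSynthesisSupported()) { /* hide or disable the play button */ }
```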