| 1 | +--- |
| 2 | +slug: announcing-speech-to-text |
| 3 | +title: "Announcing speech-to-text: CLI Audio Transcription Using Gemini" |
| 4 | +authors: [gingerhendrix] |
| 5 | +tags: [] |
| 6 | +--- |
| 7 | + |
| 8 | +I've just released `@autocode2/speech-to-text`, a Node.js library and CLI tool that makes it easy to transcribe speech using Google's Gemini API. |
| 9 | + |
| 10 | +<!-- truncate --> |
| 11 | + |
| 12 | +Gemini is extremely cheap (or free) for audio processing: 1 minute of audio is 1,500 tokens, or 90k tokens per hour. Google's free tier (if you don't mind sharing your audio with Google) offers:
| 13 | +- **1.5 Flash**: 1 million free tokens per minute |
| 14 | +- **1.5 Pro**: 32k free tokens per minute (though only 50 requests per day) |
| 15 | + |
| 16 | +Either option allows for a lot of transcription for free, and the quality of Flash is quite reasonable (my accent is hard to parse for many speech-to-text systems, but Flash does a good job). |
| 17 | + |
| 18 | +If you'd rather pay for your tokens (and keep your audio private): |
| 19 | +- **1.5 Flash**: $0.075 per million tokens, which works out to 0.01125 cents per minute, or about 0.675 cents (less than 1 cent) per hour
| 20 | +- **1.5 Pro**: $1.25 per million tokens, which works out to 0.1875 cents per minute, or 11.25 cents per hour
| 21 | + |
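The per-minute and per-hour figures above follow directly from the token rate and the per-million-token prices. A minimal sketch of that arithmetic, assuming the 1,500 tokens per minute and the prices quoted in this post (the `costInCents` helper is hypothetical and is not part of `@autocode2/speech-to-text`):

```typescript
// Token rate quoted in this post: 1 minute of audio is 1,500 tokens.
const TOKENS_PER_MINUTE = 1_500;

// Paid prices quoted in this post, in US dollars per million tokens.
const PRICE_PER_MILLION_USD = { flash: 0.075, pro: 1.25 } as const;

type Model = keyof typeof PRICE_PER_MILLION_USD;

// Hypothetical helper: estimated cost in US cents for `minutes` of audio.
function costInCents(model: Model, minutes: number): number {
  const tokens = minutes * TOKENS_PER_MINUTE;
  const dollars = (tokens / 1_000_000) * PRICE_PER_MILLION_USD[model];
  return dollars * 100;
}

// Print the per-minute and per-hour estimates for both models.
for (const model of ["flash", "pro"] as const) {
  const perMinute = costInCents(model, 1).toFixed(5);
  const perHour = costInCents(model, 60).toFixed(3);
  console.log(`${model}: ${perMinute} cents/min, ${perHour} cents/hour`);
}
```

If the prices or the token rate change, only the two constants at the top need updating.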
| 22 | +## Quick Start |
| 23 | + |
| 24 | +The quickest way to try it out is with npx: |
| 25 | + |
| 26 | +```bash |
| 27 | +npx @autocode2/speech-to-text --api-key YOUR_API_KEY |
| 28 | +``` |
| 29 | + |
| 30 | +This records from your microphone until you press Enter, then transcribes the audio using Gemini's Flash model.
| 31 | + |
| 32 | +## Features |
| 33 | + |
| 34 | +- **Simple CLI Interface**: Record and transcribe with a single command |
| 35 | +- **File Support**: Transcribe existing audio files or save recordings |
| 36 | +- **Flexible Output**: Text or JSON output, perfect for scripting |
| 37 | +- **Library Integration**: Easy to integrate into Node.js projects |
| 38 | +- **Custom Prompts**: Fine-tune transcription behavior |
| 39 | + |
| 40 | +## Use Cases |
| 41 | + |
| 42 | +You can use it for: |
| 43 | +- Quick voice notes |
| 44 | +- Transcribing meetings or lectures |
| 45 | +- Processing audio files in batch |
| 46 | +- Building transcription into your own tools |
| 47 | +- Experimenting with Gemini's audio capabilities |
| 48 | + |
| 49 | +## Technical Details |
| 50 | + |
| 51 | +The tool uses: |
| 52 | +- `sox` for high-quality audio recording |
| 53 | +- Gemini's Flash model for fast, cost-effective transcription
| 54 | +- Node.js streams for efficient processing |
| 55 | + |
| 56 | +## Getting Started |
| 57 | + |
| 58 | +Check out the [GitHub repository](https://github.com/autocode2/speech-to-text/) for: |
| 59 | +- Full installation instructions |
| 60 | +- Detailed API documentation |
| 61 | +- Usage examples |
| 62 | +- Configuration options |
| 63 | + |
| 64 | +## Future Plans |
| 65 | + |
| 66 | +This is just the beginning: next I want to add real-time transcription.
| 67 | + |
| 68 | +Try it out and let me know what you think! Feedback and contributions are welcome.