---
slug: announcing-speech-to-text
title: "Announcing speech-to-text: CLI Audio Transcription Using Gemini"
authors: [gingerhendrix]
tags: []
---

I've just released `@autocode2/speech-to-text`, a Node.js library and CLI tool that makes it easy to transcribe speech using Google's Gemini API.

<!-- truncate -->

Gemini is extremely cheap (or free) for audio processing: 1 minute of audio counts as 1,500 tokens, or 90k tokens per hour. Google's free tier (if you don't mind sharing your audio with Google) offers:

- **1.5 Flash**: 1 million free tokens per minute
- **1.5 Pro**: 32k free tokens per minute (though only 50 requests per day)

Either option allows for a lot of free transcription, and the quality of Flash is quite reasonable (many speech-to-text systems struggle with my accent, but Flash does a good job).
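
To put "a lot" into numbers, here is a quick back-of-the-envelope calculation. It assumes the 1,500-tokens-per-minute figure and the free-tier limits quoted above, which Google may change; the constant and function names are illustrative only.

```typescript
// Assumption: 1,500 tokens per minute of audio, the figure quoted in this post.
const AUDIO_TOKENS_PER_MINUTE = 1_500;

// Free-tier tokens-per-minute limits quoted in this post; Google may change them.
const FREE_TIER_TOKENS_PER_MINUTE: Record<string, number> = {
  "1.5 Flash": 1_000_000,
  "1.5 Pro": 32_000,
};

// Minutes of audio whose tokens fit into one minute of token quota.
function audioMinutesPerQuotaMinute(tokensPerMinute: number): number {
  return tokensPerMinute / AUDIO_TOKENS_PER_MINUTE;
}

for (const [model, limit] of Object.entries(FREE_TIER_TOKENS_PER_MINUTE)) {
  const minutes = audioMinutesPerQuotaMinute(limit);
  console.log(`${model}: ~${minutes.toFixed(1)} minutes of audio per minute of quota`);
}
```

So Flash's token limit covers roughly 11 hours of audio every minute, while Pro's covers about 21 minutes, with Pro's 50-requests-per-day cap applying on top.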

If you'd rather pay for your tokens (and keep your audio private):

- **1.5 Flash**: $0.075 per million tokens, or 0.01125 cents per minute, or 0.675 cents per hour
- **1.5 Pro**: $1.25 per million tokens, or 0.1875 cents per minute, or 11.25 cents per hour
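
The per-minute and per-hour figures follow directly from the token count. A small sketch of that arithmetic, again assuming 1,500 tokens per minute of audio and the list prices above (the names are illustrative, not part of the library):

```typescript
// Assumption: 1,500 tokens per minute of audio, the figure quoted in this post.
const AUDIO_TOKENS_PER_MINUTE = 1_500;

// Paid input prices quoted in this post, in US dollars per million tokens.
const USD_PER_MILLION_TOKENS: Record<string, number> = {
  "1.5 Flash": 0.075,
  "1.5 Pro": 1.25,
};

// Cost in US cents of transcribing `minutes` of audio at the given price.
function costInCents(minutes: number, usdPerMillionTokens: number): number {
  const tokens = minutes * AUDIO_TOKENS_PER_MINUTE;
  const dollars = (tokens / 1_000_000) * usdPerMillionTokens;
  return dollars * 100;
}

for (const [model, price] of Object.entries(USD_PER_MILLION_TOKENS)) {
  console.log(
    `${model}: ${costInCents(1, price).toPrecision(4)} cents/minute, ` +
      `${costInCents(60, price).toPrecision(4)} cents/hour`,
  );
}
```

An hour of audio is 90,000 tokens, which is where the 0.675-cent (Flash) and 11.25-cent (Pro) hourly figures come from.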

## Quick Start

The quickest way to try it out is with npx:

```bash
# Records from the microphone until you press Enter, then transcribes with Gemini
npx @autocode2/speech-to-text --api-key YOUR_API_KEY
```

This records from your microphone until you press Enter, then transcribes the audio using Gemini's Flash model.
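
If you're curious what a transcription call looks like without the CLI, here is a minimal sketch against Gemini's public REST `generateContent` endpoint, using the built-in `fetch` in Node 18+. It is not necessarily how `@autocode2/speech-to-text` works internally; the helper names, the default model (`gemini-1.5-flash`) and the default prompt are assumptions for illustration.

```typescript
// Sketch only: assumes Node 18+ (global fetch) and Gemini's public REST API.
// Helper names, the default model and the default prompt are hypothetical.
const GEMINI_MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models";

interface TranscriptionOptions {
  mimeType?: string; // MIME type of the audio, e.g. "audio/wav"
  model?: string; // Gemini model name
  prompt?: string; // instructions sent alongside the audio
}

// The subset of a generateContent response that this sketch reads.
interface GenerateContentResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Build the URL and fetch options for a generateContent call with inline audio.
function buildTranscriptionRequest(
  apiKey: string,
  base64Audio: string,
  {
    mimeType = "audio/wav",
    model = "gemini-1.5-flash",
    prompt = "Transcribe this audio verbatim.",
  }: TranscriptionOptions = {},
) {
  const body = {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: mimeType, data: base64Audio } },
          { text: prompt },
        ],
      },
    ],
  };
  return {
    url: `${GEMINI_MODELS_URL}/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Concatenate the text parts of the first candidate (empty string if none).
function extractText(response: GenerateContentResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Send base64-encoded audio to Gemini and return the transcript text.
async function transcribe(
  apiKey: string,
  base64Audio: string,
  options?: TranscriptionOptions,
): Promise<string> {
  const { url, init } = buildTranscriptionRequest(apiKey, base64Audio, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Gemini API error ${res.status}: ${await res.text()}`);
  }
  return extractText((await res.json()) as GenerateContentResponse);
}
```

To transcribe a local file, pass its base64 encoding (for example `readFileSync("note.wav").toString("base64")`) as `base64Audio`. Inline audio suits short clips; longer recordings would normally be uploaded through Gemini's Files API instead.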

## Features

- **Simple CLI Interface**: Record and transcribe with a single command
- **File Support**: Transcribe existing audio files or save recordings
- **Flexible Output**: Text or JSON output, perfect for scripting
- **Library Integration**: Easy to integrate into Node.js projects
- **Custom Prompts**: Fine-tune transcription behavior

## Use Cases

You can use it for:

- Quick voice notes
- Transcribing meetings or lectures
- Processing audio files in batch
- Building transcription into your own tools
- Experimenting with Gemini's audio capabilities

## Technical Details

The tool uses:

- `sox` for high-quality audio recording
- Gemini's Flash model for fast, cost-effective transcription
- Node.js streams for efficient processing
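
For a rough idea of how `sox` can be driven from Node.js, here is a sketch that records from the default input device to a WAV file until Enter is pressed. It is an illustration rather than the tool's actual code: the helper names and the 16 kHz, mono, 16-bit format are assumptions, and it assumes `sox` closes the file cleanly when it receives `SIGINT` (the signal Ctrl-C sends).

```typescript
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

// Hypothetical helper: sox arguments that record from the default input device (-d)
// to a 16-bit mono WAV file. Format options placed before the output file apply to it.
function soxRecordArgs(outFile: string, sampleRate = 16_000): string[] {
  return ["-d", "-r", String(sampleRate), "-c", "1", "-b", "16", outFile];
}

// Record until the user presses Enter. Assumes sox finalizes the WAV file
// when it receives SIGINT.
function recordUntilEnter(outFile: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const sox = spawn("sox", soxRecordArgs(outFile), { stdio: ["ignore", "ignore", "inherit"] });
    const rl = createInterface({ input: process.stdin });
    let settled = false;

    // Settle the promise once, whether sox exits normally or fails to start.
    const finish = (err?: Error) => {
      if (settled) return;
      settled = true;
      rl.close();
      if (err) reject(err);
      else resolve();
    };

    rl.once("line", () => sox.kill("SIGINT")); // Enter stops the recording
    sox.once("error", finish); // e.g. sox is not installed
    sox.once("close", () => finish());
  });
}
```

Once `recordUntilEnter("note.wav")` resolves, the file can be read and sent to Gemini for transcription.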

## Getting Started

Check out the [GitHub repository](https://github.com/autocode2/speech-to-text/) for:

- Full installation instructions
- Detailed API documentation
- Usage examples
- Configuration options

## Future Plans

This is just the beginning: next, I want to add real-time transcription.

Try it out and let me know what you think! Feedback and contributions are welcome.