Merged
39 changes: 38 additions & 1 deletion docs/adapters/gemini.md
@@ -3,7 +3,7 @@ title: Gemini Adapter
id: gemini-adapter
---

The Google Gemini adapter provides access to Google's Gemini models, including text generation, embeddings, and image generation with Imagen.
The Google Gemini adapter provides access to Google's Gemini models, including text generation, embeddings, image generation with Imagen, and experimental text-to-speech.

## Installation

@@ -75,6 +75,10 @@ const adapter = createGeminiText(process.env.GEMINI_API_KEY!, config);
- `imagen-3.0-generate-002` - Imagen 3.0
- `gemini-2.0-flash-preview-image-generation` - Gemini with image generation

### Text-to-Speech Models (Experimental)

- `gemini-2.5-flash-preview-tts` - Gemini TTS

## Example: Chat Completion

```typescript
@@ -269,6 +273,27 @@ const result = await ai({
});
```

## Text-to-Speech (Experimental)

> **Note:** Gemini TTS is experimental and may require the Live API for full functionality.

Generate speech from text:

```typescript
import { ai } from "@tanstack/ai";
import { geminiTTS } from "@tanstack/ai-gemini";

const adapter = geminiTTS();

const result = await ai({
  adapter,
  model: "gemini-2.5-flash-preview-tts",
  text: "Hello from Gemini TTS!",
});

console.log(result.audio); // Base64 encoded audio
```
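
Since `result.audio` is described above as base64-encoded, a caller will usually want raw bytes before saving or playing it. The following is a minimal sketch, assuming a Node.js runtime; `decodeBase64Audio` and `saveBase64Audio` are hypothetical helpers, not part of `@tanstack/ai-gemini`, and the file extension to use depends on the audio encoding the adapter returns, which is not stated here.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical helper (not a library API): decode a base64 string into raw bytes.
function decodeBase64Audio(base64Audio: string): Uint8Array {
  return new Uint8Array(Buffer.from(base64Audio, "base64"));
}

// Hypothetical helper: decode and write the audio to disk.
// Returns the number of bytes written so callers can sanity-check the result.
function saveBase64Audio(base64Audio: string, filePath: string): number {
  const bytes = decodeBase64Audio(base64Audio);
  writeFileSync(filePath, bytes);
  return bytes.length;
}
```

With the example above, this might be called as `saveBase64Audio(result.audio, "speech.bin")`, choosing an extension that matches the actual audio encoding.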

## Environment Variables

Set your API key in environment variables:
@@ -340,6 +365,18 @@ Creates a Gemini image generation adapter with an explicit API key.

**Returns:** A Gemini image adapter instance.

### `geminiTTS(config?)`

Creates a Gemini TTS adapter using environment variables.

**Returns:** A Gemini TTS adapter instance.

### `createGeminiTTS(apiKey, config?)`

Creates a Gemini TTS adapter with an explicit API key.

**Returns:** A Gemini TTS adapter instance.

## Next Steps

- [Getting Started](../getting-started/quick-start) - Learn the basics
115 changes: 114 additions & 1 deletion docs/adapters/openai.md
@@ -3,7 +3,7 @@ title: OpenAI Adapter
id: openai-adapter
---

The OpenAI adapter provides access to OpenAI's models, including GPT-4o, GPT-5, embeddings, and image generation (DALL-E).
The OpenAI adapter provides access to OpenAI's models, including GPT-4o, GPT-5, embeddings, image generation (DALL-E), text-to-speech (TTS), and audio transcription (Whisper).

## Installation

@@ -77,6 +77,18 @@ const adapter = createOpenaiText(process.env.OPENAI_API_KEY!, config);
- `gpt-image-1` - Latest image generation model
- `dall-e-3` - DALL-E 3

### Text-to-Speech Models

- `tts-1` - Standard TTS (fast)
- `tts-1-hd` - High-definition TTS
- `gpt-4o-audio-preview` - GPT-4o with audio output

### Transcription Models

- `whisper-1` - Whisper large-v2
- `gpt-4o-transcribe` - GPT-4o transcription
- `gpt-4o-mini-transcribe` - GPT-4o Mini transcription

## Example: Chat Completion

```typescript
@@ -267,6 +279,83 @@ const result = await ai({
});
```

## Text-to-Speech

Generate speech from text:

```typescript
import { ai } from "@tanstack/ai";
import { openaiTTS } from "@tanstack/ai-openai";

const adapter = openaiTTS();

const result = await ai({
  adapter,
  model: "tts-1",
  text: "Hello, welcome to TanStack AI!",
  voice: "alloy",
  format: "mp3",
});

// result.audio contains base64-encoded audio
console.log(result.format); // "mp3"
```
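
To play the base64 audio in a browser, one option is to wrap it in a `data:` URL and hand that to an `<audio>` element. The sketch below is illustrative only: `toAudioDataUrl` and the format-to-MIME table are assumptions for this example, not part of `@tanstack/ai-openai`, and only `mp3` appears in the example above.

```typescript
// Assumed mapping from the `format` value to a MIME type.
// Only "mp3" is shown in the docs above; the other entries are illustrative.
const MIME_BY_FORMAT: Record<string, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  flac: "audio/flac",
  aac: "audio/aac",
};

// Hypothetical helper: build a data URL from base64 audio and its format.
function toAudioDataUrl(base64Audio: string, format: string): string {
  const mime = MIME_BY_FORMAT[format];
  if (!mime) {
    throw new Error(`Unsupported audio format: ${format}`);
  }
  return `data:${mime};base64,${base64Audio}`;
}
```

In a browser, the returned string could be passed to `new Audio(url).play()`; for long clips a `Blob` URL is generally lighter on memory than a data URL.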

### TTS Voices

Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `ash`, `ballad`, `coral`, `sage`, `verse`
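
When the voice name comes from user input, it can help to validate it against this list before calling the adapter. A minimal sketch follows; the `OpenAITTSVoice` type and `isOpenAITTSVoice` guard are hypothetical names for this example, and the library may already export its own voice type.

```typescript
// The voices listed above, captured as a readonly tuple.
const OPENAI_TTS_VOICES = [
  "alloy", "echo", "fable", "onyx", "nova", "shimmer",
  "ash", "ballad", "coral", "sage", "verse",
] as const;

// Union type derived from the tuple (hypothetical name, not a library export).
type OpenAITTSVoice = (typeof OPENAI_TTS_VOICES)[number];

// Type guard: narrows an arbitrary string to a known voice.
function isOpenAITTSVoice(value: string): value is OpenAITTSVoice {
  return (OPENAI_TTS_VOICES as readonly string[]).includes(value);
}
```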

### TTS Provider Options

```typescript
const result = await ai({
  adapter: openaiTTS(),
  model: "tts-1-hd",
  text: "High quality speech",
  providerOptions: {
    speed: 1.0, // 0.25 to 4.0
  },
});
```
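
The comment above gives the `speed` range as 0.25 to 4.0. If the value comes from a slider or other user input, it can be clamped before it is passed in `providerOptions`. This is a sketch only; `clampSpeed` is a hypothetical helper, and falling back to `1.0` for non-finite input is a choice made for this example.

```typescript
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical helper: keep a requested speed inside the documented range.
// Non-finite input (NaN, Infinity) falls back to normal speed (1.0).
function clampSpeed(speed: number): number {
  if (!Number.isFinite(speed)) {
    return 1.0;
  }
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
}
```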

## Transcription

Transcribe audio to text:

```typescript
import { ai } from "@tanstack/ai";
import { openaiTranscription } from "@tanstack/ai-openai";

const adapter = openaiTranscription();

const result = await ai({
  adapter,
  model: "whisper-1",
  audio: audioFile, // File object or base64 string
  language: "en",
});

console.log(result.text); // Transcribed text
```
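
The `audio` field above accepts a `File` object or a base64 string. On a server there is usually no `File` from a form upload, so one approach is to read the audio from disk and base64-encode it. The following is a sketch assuming Node.js; `bytesToBase64` and `readAudioAsBase64` are hypothetical helpers, not library APIs.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: encode raw bytes as a base64 string.
function bytesToBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical helper: read an audio file from disk and return it as base64.
function readAudioAsBase64(filePath: string): string {
  return bytesToBase64(readFileSync(filePath));
}
```

The returned string could then be passed as the `audio` value in the call above.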

### Transcription Provider Options

```typescript
const result = await ai({
  adapter: openaiTranscription(),
  model: "whisper-1",
  audio: audioFile,
  providerOptions: {
    response_format: "verbose_json", // Get timestamps
    temperature: 0,
    prompt: "Technical terms: API, SDK",
  },
});

// Access segments with timestamps
console.log(result.segments);
```
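
Timestamped segments are convenient for rendering a readable transcript. The sketch below assumes each segment carries `start` and `end` in seconds plus a `text` field, which mirrors OpenAI's `verbose_json` output; the exact shape of `result.segments` in this library is not shown here, so treat `TranscriptSegment` and both helpers as hypothetical.

```typescript
// Assumed segment shape; the library's actual type may differ.
interface TranscriptSegment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Format seconds as mm:ss.mmm.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(minutes, 2)}:${pad(secs, 2)}.${pad(ms, 3)}`;
}

// Render one line per segment: "[start - end] text".
function formatSegments(segments: TranscriptSegment[]): string {
  return segments
    .map((s) => `[${formatTimestamp(s.start)} - ${formatTimestamp(s.end)}] ${s.text.trim()}`)
    .join("\n");
}
```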

## Environment Variables

Set your API key in environment variables:
@@ -331,6 +420,30 @@ Creates an OpenAI image generation adapter with an explicit API key.

**Returns:** An OpenAI image adapter instance.

### `openaiTTS(config?)`

Creates an OpenAI TTS adapter using environment variables.

**Returns:** An OpenAI TTS adapter instance.

### `createOpenaiTTS(apiKey, config?)`

Creates an OpenAI TTS adapter with an explicit API key.

**Returns:** An OpenAI TTS adapter instance.

### `openaiTranscription(config?)`

Creates an OpenAI transcription adapter using environment variables.

**Returns:** An OpenAI transcription adapter instance.

### `createOpenaiTranscription(apiKey, config?)`

Creates an OpenAI transcription adapter with an explicit API key.

**Returns:** An OpenAI transcription adapter instance.

## Next Steps

- [Getting Started](../getting-started/quick-start) - Learn the basics
8 changes: 8 additions & 0 deletions docs/config.json
@@ -69,6 +69,14 @@
    {
      "label": "Per-Model Type Safety",
      "to": "guides/per-model-type-safety"
    },
    {
      "label": "Text-to-Speech",
      "to": "guides/text-to-speech"
    },
    {
      "label": "Transcription",
      "to": "guides/transcription"
    }
  ]
},