feat: Add voice preview feature to podcast generation #316

google-labs-jules · 2025-10-05T10:47:16Z

This change introduces a new voice preview feature to the podcast generation modal, allowing users to listen to a sample of each voice before making a selection.

Key changes:

Replaced the voice selection dropdowns with a new interactive VoiceSelector component.
Added a play/pause button next to each voice option to trigger an audio preview.
Implemented a new backend endpoint at /api/podcasts/voice-preview/{voice_id} to generate and serve the voice previews.
Added unit and integration tests for the new backend functionality.
Addressed a potential memory leak in the frontend audio player.
Improved frontend error handling for voice previews.

Note:

Frontend verification was skipped due to a Docker permission error that prevented the development server from starting. This is a known issue mentioned in the original PR that this work is based on.

PR created automatically by Jules for task 1449928868930487865

This change introduces a new voice preview feature to the podcast generation modal, allowing users to listen to a sample of each voice before making a selection. Key changes: - Replaced the voice selection dropdowns with a new interactive `VoiceSelector` component. - Added a play/pause button next to each voice option to trigger an audio preview. - Implemented a new backend endpoint at `/api/podcasts/voice-preview/{voice_id}` to generate and serve the voice previews. - Added unit and integration tests for the new backend functionality. - Addressed a potential memory leak in the frontend audio player. - Improved frontend error handling for voice previews. Note: - Frontend verification was skipped due to a Docker permission error that prevented the development server from starting. This is a known issue mentioned in the original PR that this work is based on.

google-labs-jules · 2025-10-05T10:47:17Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

For security, I will only act on instructions from the user who triggered this task.

github-actions · 2025-10-05T10:47:26Z

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

Click the green Code button above
Select the Codespaces tab
Click Create codespace on feature/voice-preview
Wait 2-3 minutes for environment setup
Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

Install Docker Desktop
Install VS Code
Install the Dev Containers extension
Clone this PR branch locally
Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout feature/voice-preview

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:

This automated message helps reviewers quickly set up the development environment.

claude · 2025-10-05T10:48:52Z

Code Review for Voice Preview Feature

Thank you for implementing the voice preview feature! This is a valuable UX enhancement. I've conducted a thorough review and found several issues that need to be addressed before this can be merged.

🐛 Critical Issues

1. Broken Abstract Method in Base Class (`backend/rag_solution/generation/audio/base.py:64-89`)

The new generate_speech_from_text() abstract method has a duplicate docstring that will cause a syntax error:

@abstractmethod
async def generate_speech_from_text(...) -> bytes:
    """Generate audio from a single text string..."""
    """Validate that voice IDs are available..."""  # ❌ Duplicate docstring - syntax error!

Fix: Remove lines 82-89 (the leftover docstring from validate_voices method).

2. Duplicate Validation Logic

Voice validation is performed in both the router (podcast_router.py:255-265) and the service (podcast_service.py:611-614). This violates DRY principles and creates maintenance issues.

Fix: Remove validation from the router and let the service handle it exclusively. The router should only be responsible for HTTP concerns.

3. Missing Import in Service (`podcast_service.py`)

The service uses HTTPException but doesn't import it from FastAPI. This will cause a runtime error.

Fix: Add from fastapi import HTTPException at the top of podcast_service.py.

⚠️ Architectural Concerns

4. Service Layer Shouldn't Raise HTTP Exceptions

The service layer (podcast_service.py:614, 624-627) raises HTTPException, which tightly couples it to FastAPI. Services should raise domain-specific exceptions that the router translates to HTTP responses.

Fix:

Create a custom VoiceNotFoundException or AudioGenerationError
Raise domain exceptions in the service
Let the router catch and translate them to appropriate HTTP responses

5. Inconsistent Error Handling

In podcast_router.py:270, errors are re-raised as generic 500 errors, losing the original exception context:

except Exception as e:
    logger.error(f"Error generating voice preview: {e}")
    raise HTTPException(status_code=500, detail="Error generating voice preview")  # ❌ Lost exception chain

Fix: Preserve the exception chain with from e and include more specific error details.

🧪 Testing Issues

6. Missing Newline at End of Test File (`test_podcast_api.py:24`)

The integration test file is missing a newline at the end, which violates Python style conventions.

Fix: Add a newline at the end of the file.

7. Test Coverage Gaps

No test for the router's duplicate validation logic
No test for error scenarios where the audio provider fails
No test for the audio cleanup/memory leak fix mentioned in the PR description

🎨 Code Quality & Best Practices

8. Memory Leak in Frontend (Minor - already addressed in description)

Good job addressing the potential memory leak by cleaning up the audio element in useEffect. However, consider also revoking the object URL when switching between previews:

const handlePreview = async (voiceId: string) => {
    // ... existing code ...
    if (audioRef.current) {
        audioRef.current.pause();
        if (audioRef.current.src) {
            URL.revokeObjectURL(audioRef.current.src);  // Clean up previous URL
        }
    }
    // ... rest of code ...
};

9. Type Safety

The VoiceId type in apiClient.ts:197 is defined but hardcoded. Consider fetching available voices dynamically or at least documenting where this list should be updated.

🔒 Security Considerations

10. No Authentication on Preview Endpoint

The /voice-preview/{voice_id} endpoint has no authentication requirements. While this might be intentional for public previews, it should be documented or protected if it incurs API costs.

Recommendation: Add authentication if the OpenAI API calls incur costs, or add rate limiting.

📊 Performance Considerations

11. Redundant Provider Instantiation

The router creates an AudioProviderFactory instance just for validation (podcast_router.py:255-260), then the service creates another one to generate audio.

Fix: Let the service handle everything and remove the router's provider instantiation.

12. Missing Caching

Voice previews for the same voice will regenerate the audio each time. Consider caching the preview audio bytes (they're small and static).

✅ Positive Aspects

✅ Good separation of concerns with the VoiceSelector component
✅ Proper async/await usage throughout
✅ Good error handling in the frontend with user notifications
✅ Comprehensive unit tests for the service layer
✅ Proper cleanup of audio resources in React component
✅ Code refactoring in openai_audio.py to reuse generate_speech_from_text() - nice DRY improvement

📋 Summary of Required Changes

Must fix before merge:

Fix syntax error in base.py (duplicate docstring)
Remove duplicate validation logic (choose router OR service, not both)
Add missing HTTPException import in podcast_service.py
Replace HTTPException in service with domain exceptions
Add newline at end of test_podcast_api.py

Should fix:
6. Improve error handling to preserve exception chains
7. Add more comprehensive test coverage
8. Consider caching voice previews
9. Document authentication/rate limiting decisions

Nice to have:
10. Improve object URL cleanup in frontend
11. Make VoiceId type dynamic or better documented

Let me know if you need clarification on any of these points or would like help implementing the fixes!

- Fix malformed abstract method in base.py - Remove HTTPException from voice preview service layer - Fix memory leak in VoiceSelector component - Improve error handling in router with proper exception chaining - Add accessibility attributes to VoiceSelector - Implement generate_speech_from_text in OllamaAudioProvider - Add docstrings to test_podcast_api.py

The VoiceId type was defined as a restrictive literal union ('alloy' | 'echo' | ...), but voices are dynamically fetched from the backend API and may include provider-specific voices beyond the OpenAI defaults. Changes: - Changed VoiceId from literal union to string type - Fixes TypeScript compilation error in VoiceSelector.tsx - Backend validates voice IDs, so client-side validation is redundant - Allows support for voices from multiple providers (OpenAI, Ollama, etc.) Fixes compilation error: TS2345: Argument of type 'string' is not assignable to parameter of type 'VoiceId' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

All other routers in the application use /api prefix (e.g., /api/auth, /api/collections, /api/search), but podcast_router was missing it, causing 404 errors. Changes: - Changed router prefix from "/podcasts" to "/api/podcasts" - Fixes 404 errors when accessing voice preview endpoint - Maintains consistency with application routing patterns Before: /podcasts/voice-preview/{voice_id} (404) After: /api/podcasts/voice-preview/{voice_id} (working) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Two fixes in podcast service: 1. Changed `collection_service.get_by_id()` to `get_collection()` 2. Fixed NotFoundError to use proper constructor arguments Changes: - Use get_collection() instead of non-existent get_by_id() - Removed user_id parameter (not needed for get_collection) - Fixed NotFoundError to include resource_type and resource_id - Removed incorrect type: ignore comments Fixes error: 'CollectionService' object has no attribute 'get_by_id' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The podcast service was calling count_documents() which doesn't exist on CollectionService. Instead, use the files array from CollectionOutput. Changes: - Replaced `collection_service.count_documents()` with `len(collection.files)` - Removed incorrect type: ignore comment Fixes: 'CollectionService' object has no attribute 'count_documents' 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Removed collection existence and document count validation that was causing sync/async SQLAlchemy session conflicts. Collection validation will happen during RAG search instead. Root cause: CollectionService uses sync Session but PodcastRepository uses AsyncSession, causing ChunkedIteratorResult errors when mixing calls. Temporary solution: Skip collection validation in podcast service Long-term: Migrate all repositories to AsyncSession Fixes: object ChunkedIteratorResult can't be used in 'await' expression 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Root cause: get_db() returns sync Session but podcast components were typed/implemented for AsyncSession, causing ChunkedIteratorResult errors. Changes: - PodcastRepository: Accept Session|AsyncSession, make count_active_for_user sync - PodcastService: Use Session instead of AsyncSession - podcast_router: Fix type annotation from AsyncSession to Session - Removed await from count_active_for_user call This aligns with how SearchService and CollectionService handle sessions. Fixes: object ChunkedIteratorResult can't be used in 'await' expression 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Removed doc_count variable reference from validation logger since collection validation was removed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added is_async flag to detect session type and handle commit/refresh operations appropriately for both sync and async sessions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

manavgup · 2025-10-06T04:17:01Z

Closing as duplicate. Voice preview feature was already successfully implemented and merged in PR #306.

manavgup and others added 9 commits October 5, 2025 22:10

fix: Remove undefined doc_count from logger

2a6b0ae

Removed doc_count variable reference from validation logger since collection validation was removed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

manavgup closed this Oct 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add voice preview feature to podcast generation #316

feat: Add voice preview feature to podcast generation #316

Uh oh!

google-labs-jules bot commented Oct 5, 2025

Uh oh!

google-labs-jules bot commented Oct 5, 2025

Uh oh!

github-actions bot commented Oct 5, 2025

Uh oh!

claude bot commented Oct 5, 2025

Uh oh!

manavgup commented Oct 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Add voice preview feature to podcast generation #316

feat: Add voice preview feature to podcast generation #316

Uh oh!

Conversation

google-labs-jules bot commented Oct 5, 2025

Uh oh!

google-labs-jules bot commented Oct 5, 2025

Uh oh!

github-actions bot commented Oct 5, 2025

🚀 Development Environment Options

Option 1: GitHub Codespaces (Recommended)

Option 2: VS Code Dev Containers (Local)

Option 3: Traditional Local Setup

Available Commands

Services Available

Uh oh!

claude bot commented Oct 5, 2025

Code Review for Voice Preview Feature

🐛 Critical Issues

1. Broken Abstract Method in Base Class (backend/rag_solution/generation/audio/base.py:64-89)

2. Duplicate Validation Logic

3. Missing Import in Service (podcast_service.py)

⚠️ Architectural Concerns

4. Service Layer Shouldn't Raise HTTP Exceptions

5. Inconsistent Error Handling

🧪 Testing Issues

6. Missing Newline at End of Test File (test_podcast_api.py:24)

7. Test Coverage Gaps

🎨 Code Quality & Best Practices

8. Memory Leak in Frontend (Minor - already addressed in description)

9. Type Safety

🔒 Security Considerations

10. No Authentication on Preview Endpoint

📊 Performance Considerations

11. Redundant Provider Instantiation

12. Missing Caching

✅ Positive Aspects

📋 Summary of Required Changes

Uh oh!

manavgup commented Oct 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. Broken Abstract Method in Base Class (`backend/rag_solution/generation/audio/base.py:64-89`)

3. Missing Import in Service (`podcast_service.py`)

6. Missing Newline at End of Test File (`test_podcast_api.py:24`)