feat: Implement script word count validation with adaptive retry #363

@manavgup

📋 Description

The LLM can generate podcast scripts that are significantly shorter or longer than requested, but the system accepts whatever comes back without validation or retry. This leads to podcasts with wildly incorrect durations.

Example:

  • User requests 15-minute podcast (target: 2,250 words at 150 WPM)
  • LLM generates only 500 words (ignores instruction)
  • System accepts it → Result: 3-minute podcast instead of 15 minutes

🎯 Goals

  1. Validate script word count after LLM generation
  2. Implement a retry mechanism when the word count falls outside the acceptable range
  3. Use adaptive prompts that learn from previous failures
  4. Track retry attempts for debugging and metrics

📍 Current State

Current Flow:

1. Calculate target word count (e.g., 2,250 for 15 min)
2. Ask LLM to generate script with word count instruction
3. LLM returns script (could be 500 words or 5,000 words!)
4. ❌ Accept whatever comes back - NO VALIDATION
5. Generate audio from potentially wrong-length script

Evidence:

  • See backend/tests/unit/test_podcast_duration_control_unit.py:
    • test_llm_generates_too_short_script_no_validation
    • test_llm_generates_too_long_script_no_validation
    • test_no_retry_mechanism_for_short_script
    • test_no_adaptive_prompt_based_on_previous_attempts

✅ Acceptance Criteria

Phase 1: Validation

  • Count words in generated script
  • Validate against target word count
  • If < 80% of target OR > 120% of target, mark as failed
  • Log word count mismatch details

Phase 2: Retry Mechanism

  • If validation fails, retry up to 3 times
  • Track retry count and reasons
  • Each retry uses adjusted prompt
  • After 3 failures, mark podcast as FAILED with reason

Phase 3: Adaptive Prompts

  • First attempt: Standard prompt with target word count
  • If too short: "Previous attempt was only X words. Generate EXACTLY Y words with more detail."
  • If too long: "Previous attempt was X words. Generate EXACTLY Y words, be more concise."
  • Include previous failure context in retry prompts

Phase 4: Voice Speed Consideration (Future)

  • Adjust word count calculation based on voice speed setting
  • At 1.5x speed: target_words = duration * 150 * 1.5
  • Store speed-adjusted target in metadata (see the sketch after this list)
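
The duration math above maps naturally onto the _calculate_target_word_count helper that the retry logic below already calls. A minimal sketch, assuming the 150 WPM baseline from this issue; the voice_speed parameter is the future Phase 4 extension, not current behavior:

WORDS_PER_MINUTE = 150  # baseline speaking rate assumed throughout this issue

def _calculate_target_word_count(
    self,
    duration_minutes: int,
    voice_speed: float = 1.0,
) -> int:
    """Target word count for a requested duration, scaled by playback speed.

    At 1.5x speed the same wall-clock duration fits 1.5x as many words.
    """
    return int(duration_minutes * WORDS_PER_MINUTE * voice_speed)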

🛠️ Implementation Strategy

Script Validation Logic

def _validate_script_word_count(
    self,
    script: str,
    target_word_count: int,
    min_word_count: int,
    max_word_count: int
) -> tuple[bool, int, str | None]:
    """Validate script word count is within acceptable range.
    
    Returns:
        (is_valid, actual_count, error_message)
    """
    actual_count = len(script.split())  # whitespace-delimited word count
    
    if actual_count < min_word_count:
        error = (
            f"Script too short: {actual_count} words "
            f"(need at least {min_word_count}, target {target_word_count})"
        )
        return False, actual_count, error
    
    if actual_count > max_word_count:
        error = (
            f"Script too long: {actual_count} words "
            f"(max {max_word_count}, target {target_word_count})"
        )
        return False, actual_count, error
    
    return True, actual_count, None

Retry Logic with Adaptive Prompts

async def _generate_script_with_retry(
    self,
    podcast_input: PodcastGenerationInput,
    rag_results: str,
    max_retries: int = 3
) -> str:
    """Generate script with validation and retry."""
    
    target_word_count = self._calculate_target_word_count(podcast_input.duration)
    min_word_count = int(target_word_count * 0.8)  # 80% of target
    max_word_count = int(target_word_count * 1.2)  # 120% of target
    
    previous_attempts: list[dict] = []
    
    for attempt in range(max_retries):
        # Generate prompt (adaptive based on previous failures)
        prompt_context = self._build_adaptive_prompt_context(
            previous_attempts, target_word_count
        )
        
        # Generate script
        script = await self._generate_script(
            podcast_input, rag_results, prompt_context
        )
        
        # Validate word count
        is_valid, actual_count, error = self._validate_script_word_count(
            script, target_word_count, min_word_count, max_word_count
        )
        
        if is_valid:
            logger.info(
                f"Script generated successfully: {actual_count} words "
                f"(target: {target_word_count}) on attempt {attempt + 1}"
            )
            return script
        
        # Track failed attempt
        previous_attempts.append({
            "attempt": attempt + 1,
            "actual_count": actual_count,
            "target_count": target_word_count,
            "error": error
        })
        
        logger.warning(f"Attempt {attempt + 1} failed: {error}")
    
    # All retries failed
    raise PodcastGenerationError(
        f"Failed to generate script with correct length after {max_retries} attempts. "
        f"Last attempt: {previous_attempts[-1]['actual_count']} words "
        f"(target: {target_word_count})"
    )

def _build_adaptive_prompt_context(
    self,
    previous_attempts: list[dict],
    target_word_count: int
) -> str:
    """Build adaptive prompt context based on previous failures."""
    
    if not previous_attempts:
        return ""
    
    last_attempt = previous_attempts[-1]
    last_count = last_attempt["actual_count"]
    
    if last_count < target_word_count:
        # Previous attempt was too short
        return (
            f"\n\nIMPORTANT: Your previous attempt was only {last_count} words, "
            f"which is too short. This time, generate EXACTLY {target_word_count} words "
            f"by adding more detail, examples, and explanations."
        )
    else:
        # Previous attempt was too long
        return (
            f"\n\nIMPORTANT: Your previous attempt was {last_count} words, "
            f"which is too long. This time, generate EXACTLY {target_word_count} words "
            f"by being more concise and focused."
        )

Schema Updates

Add to PodcastGenerationOutput:

class PodcastGenerationOutput(BaseModel):
    # ... existing fields ...
    
    # NEW FIELDS:
    script_word_count: int | None = None  # word count of the accepted script
    script_generation_attempts: int = 1  # total LLM calls made (1 = no retries needed)
    script_validation_warnings: list[str] | None = None  # errors from failed attempts, if any
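
Note that _generate_script_with_retry above returns only the script text, so the attempt metadata needs a path into these fields. A minimal sketch, assuming the retry helper is widened to also return its bookkeeping (the tuple shape is illustrative, not part of this issue):

# Hypothetical widened return from _generate_script_with_retry:
#     return script, attempt + 1, [a["error"] for a in previous_attempts]
script, attempts, warnings = await self._generate_script_with_retry(
    podcast_input, rag_results
)
output = PodcastGenerationOutput(
    # ... existing fields ...
    script_word_count=len(script.split()),
    script_generation_attempts=attempts,
    script_validation_warnings=warnings or None,
)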

🧪 Testing

Unit Tests

  • Test word count calculation
  • Test validation (too short, too long, just right; see the pytest sketch after this list)
  • Test retry mechanism (succeeds on 2nd attempt)
  • Test retry exhaustion (fails after 3 attempts)
  • Test adaptive prompt generation
  • Test voice speed adjustment (future)
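
A minimal pytest sketch for the validation cases, calling _validate_script_word_count directly. The import path follows the related-files list below, and the PodcastService class name is an assumption:

import pytest

from rag_solution.services.podcast_service import PodcastService  # class name assumed

@pytest.mark.parametrize(
    ("word_count", "expected_valid"),
    [
        (1799, False),  # just under 80% of 2,250: too short
        (1800, True),   # exactly 80% of target: accepted
        (2250, True),   # exactly on target
        (2700, True),   # exactly 120% of target: accepted
        (2701, False),  # just over 120%: too long
    ],
)
def test_validate_script_word_count(word_count: int, expected_valid: bool) -> None:
    service = PodcastService.__new__(PodcastService)  # skip __init__; validation needs no state
    script = " ".join(["word"] * word_count)
    is_valid, actual, error = service._validate_script_word_count(
        script, target_word_count=2250, min_word_count=1800, max_word_count=2700
    )
    assert actual == word_count
    assert is_valid is expected_valid
    assert (error is None) == expected_valid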

Integration Tests

  • Generate podcast with LLM that returns short script
  • Verify retry happens
  • Verify adaptive prompt is used
  • Verify success after retry (see the sketch after this list)
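
A hedged sketch of the retry-path test, stubbing _generate_script with an AsyncMock that returns a short script on the first call and an on-target script on the second. The podcast_service and podcast_input fixtures are assumptions, and the test assumes pytest-asyncio:

from unittest.mock import AsyncMock

import pytest

@pytest.mark.asyncio
async def test_retry_succeeds_on_second_attempt(podcast_service, podcast_input):
    short_script = " ".join(["word"] * 500)   # fails validation (< 80% of 2,250)
    good_script = " ".join(["word"] * 2250)   # exactly on target
    podcast_service._calculate_target_word_count = lambda duration: 2250
    podcast_service._generate_script = AsyncMock(side_effect=[short_script, good_script])

    script = await podcast_service._generate_script_with_retry(podcast_input, rag_results="")

    assert len(script.split()) == 2250
    assert podcast_service._generate_script.await_count == 2
    # The second call should carry the adaptive "too short" context.
    _, _, prompt_context = podcast_service._generate_script.await_args.args
    assert "too short" in prompt_context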

📊 Metrics to Track

  • Script generation attempts histogram (1, 2, 3, >3; see the sketch after this list)
  • Word count accuracy distribution
  • Retry success rate
  • Common failure patterns (always too short? always too long?)
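
A minimal sketch of how these could be exported, assuming prometheus_client (not specified in this issue; metric names are illustrative):

from prometheus_client import Counter, Histogram

# Buckets match the 1, 2, 3, >3 breakdown above (+Inf is appended automatically).
SCRIPT_GENERATION_ATTEMPTS = Histogram(
    "podcast_script_generation_attempts",
    "LLM attempts needed to produce a script with a valid word count",
    buckets=[1, 2, 3],
)
# Ratio of actual to target word count, to spot always-short / always-long drift.
WORD_COUNT_ACCURACY = Histogram(
    "podcast_script_word_count_ratio",
    "actual_word_count / target_word_count of the accepted script",
    buckets=[0.5, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5],
)
RETRY_EXHAUSTED = Counter(
    "podcast_script_retry_exhausted_total",
    "Podcasts that failed after exhausting all script retries",
)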

🔧 Configuration

Add to Settings:

# Podcast script validation
podcast_min_word_count_percentage: float = 0.8  # 80% of target
podcast_max_word_count_percentage: float = 1.2  # 120% of target
podcast_max_script_retries: int = 3
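
With these settings in place, the hardcoded 0.8 / 1.2 / 3 values in the retry sketch above would be read from configuration instead (the settings access pattern is assumed):

target_word_count = self._calculate_target_word_count(podcast_input.duration)
min_word_count = int(target_word_count * settings.podcast_min_word_count_percentage)
max_word_count = int(target_word_count * settings.podcast_max_word_count_percentage)
max_retries = settings.podcast_max_script_retries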

🔗 Related Files

  • backend/rag_solution/services/podcast_service.py:440-515 (_generate_script)
  • backend/rag_solution/schemas/podcast_schema.py
  • backend/tests/unit/test_podcast_duration_control_unit.py
  • backend/tests/PODCAST_DURATION_CONTROL_ANALYSIS.md

🏷️ Labels

enhancement, podcast, quality, llm, validation, retry-logic

💡 Future Enhancements

  • Machine learning to predict optimal word count based on content type
  • A/B testing different prompts for word count accuracy
  • User feedback loop: "Was this podcast too short/long?"
  • Automatic word count adjustment based on historical accuracy
