
Agent start refactor #43

Open: wants to merge 19 commits into main
19 commits
- e994bbe (May 4, 2025): Enhance realtime agent app with support engagement selection and inte…
- 0449f06 (May 4, 2025): Integrate InterviewAgent component and enhance interview handling in …
- fa249d0 (May 4, 2025): Refactor agent configurations and improve interview mode handling. Re…
- 8ff872b (May 4, 2025): Add real-time transcript saving functionality to interview agent
- 523921f (May 4, 2025): Add interview completion functionality with status and timestamp
- ae1c314 (May 4, 2025): Add default interview questions template for new interviews
- 9ed06bb (May 4, 2025): Refactor engagement data handling and improve image rendering in sele…
- 65c4301 (May 4, 2025): Enhance application with new dependencies and improve Tailwind CSS co…
- c2e0c67 (May 5, 2025): Enhance interview functionality by adding public access routes for in…
- b422a2a (May 5, 2025): Refactor question matching logic in InterviewExperience component to …
- acc4b07 (May 5, 2025): Implement enhanced link sharing functionality in InterviewList compon…
- 39f3b59 (May 5, 2025): Refactor error handling in InterviewList component to simplify catch …
- f68806e (May 6, 2025): Implement interview completion feature and enhance question handling …
- f788e6c (May 6, 2025): Add magic link authentication option in LoginForm component and updat…
- 6edcc53 (May 6, 2025): Refactor App component to improve client secret handling by ensuring …
- c039f5e (May 6, 2025): Update interview agent configuration to only include the first name o…
- fe5c7b6 (May 7, 2025): Enhance interview completion handling by updating App and InterviewSe…
- efc069e (May 9, 2025): Refactor GET route in interviews API to resolve parameters asynchrono…
- 5ea6d48 (May 10, 2025): Refine agent/user speech visualization and status indicators
5 changes: 4 additions & 1 deletion .gitignore
@@ -37,4 +37,7 @@ yarn-error.log*
 # typescript
 *.tsbuildinfo
 next-env.d.ts
-todo.md
+todo.md
+
+# MCP
+mcp.json
138 changes: 138 additions & 0 deletions Interview_agent_dev_spec.md
@@ -0,0 +1,138 @@
# Interview Agent Development Specification

## Overview

The OpenAI Realtime Agents Interview Application is a Next.js web application that uses OpenAI's Realtime API to run interactive, voice-based agents that conduct structured interviews. It is built for qualitative research and feedback collection on startup support engagements: each interview is conducted by an agent that follows a predefined conversation flow.

## Core Technologies

- **Frontend**: Next.js, React, TypeScript, Tailwind CSS
- **Backend**: Next.js API routes
- **Database**: Supabase (PostgreSQL)
- **Real-time Communication**: WebSockets, RTCDataChannel
- **AI**: OpenAI Realtime API
- **Voice Processing**: WebRTC for audio streaming

## System Architecture

### Client-Server Model
- **Client**: React application that manages WebRTC connections and user interactions
- **Server**: Next.js API routes for database operations and OpenAI API interactions

### Database Structure
- **Tables**:
  - `interviews`: Stores interview metadata and session information
  - `questions`: Stores interview questions with ordinal positions
  - `answers`: Stores participant responses to questions
  - `companies`: Reference table for organization information
  - `people`: Reference table for interviewee information
  - `support_engagements`: Reference table for specific support instances
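A minimal TypeScript sketch of how `questions` rows could be typed and put into interview order by their ordinal position. The column names (`id`, `interview_id`, `ordinal`, `text`) are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical row shape for the `questions` table; column names are assumptions.
interface QuestionRow {
  id: string;
  interview_id: string;
  ordinal: number; // position of the question within its interview
  text: string;
}

// Return one interview's questions sorted by ordinal position.
// filter() builds a new array, so sorting it does not mutate the caller's rows.
function questionsForInterview(rows: QuestionRow[], interviewId: string): QuestionRow[] {
  return rows
    .filter((q) => q.interview_id === interviewId)
    .sort((a, b) => a.ordinal - b.ordinal);
}
```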

## Key Features

### 1. Agent Management
- Pre-configured agent templates with customizable conversation flows
- Dynamic agent configuration based on interview context
- Voice customization (using "shimmer" voice)
- Speech playback optimization (1.25x speed)

### 2. Interview Process
- Structured conversation states with transition rules
- Context-aware questioning based on interviewee responses
- Active listening with follow-up question generation
- Real-time transcription of conversation

### 3. Realtime Voice Interaction
- Push-to-talk functionality
- Real-time voice streaming
- Voice activity detection (semantic_vad with high eagerness)
- Audio playback controls

### 4. Data Persistence
- Interview session recording and storage
- Question and answer tracking
- Contextual metadata storage
- Support engagement linking

### 5. User Interface
- Transcript visualization
- Event logging and monitoring
- Session control and management
- Interview creation and management

## Agent Configuration

The application supports configurable interview agents with:

1. **Personality & Tone**: Professional yet friendly researcher persona
2. **Core Objectives**: Context-specific interview goals
3. **Engagement Context**: Dynamic fields for company and support information
4. **Conversation Flow**: Sequential question progression
5. **Conversation States**: Structured interview phases with transition rules
   - Introduction
   - Context questions
   - Challenge identification
   - Impact assessment
   - Conclusion
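The five conversation states can be read as a simple sequential state machine. This sketch uses hypothetical state ids and a hypothetical `phaseComplete` flag; it illustrates the ordering only and does not reflect how the agent configuration actually encodes its transition rules.

```typescript
// Hypothetical ids for the five interview phases listed above.
type InterviewState =
  | "introduction"
  | "context_questions"
  | "challenge_identification"
  | "impact_assessment"
  | "conclusion";

// Sequential transition rules: each phase advances to exactly one next phase.
const transitions: Record<InterviewState, InterviewState | null> = {
  introduction: "context_questions",
  context_questions: "challenge_identification",
  challenge_identification: "impact_assessment",
  impact_assessment: "conclusion",
  conclusion: null, // terminal phase
};

// Advance only once the current phase is complete; the terminal phase stays put.
function nextState(current: InterviewState, phaseComplete: boolean): InterviewState {
  if (!phaseComplete) return current;
  return transitions[current] ?? current;
}
```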

## Data Flow

1. **Interview Setup**:
   - Agent configuration loaded with contextual information
   - WebRTC connection established with OpenAI Realtime API
   - Session metadata stored in Supabase

2. **Interview Execution**:
   - Voice data streamed bidirectionally
   - Conversation transcribed in real-time
   - Responses processed by agent logic
   - Follow-up questions generated contextually

3. **Data Persistence**:
   - Interview responses stored in database
   - Metadata updated throughout session
   - Full transcript preserved

## API Endpoints

### Interview Management
- `GET /api/interviews`: Retrieve all interviews with associated questions
- `POST /api/interviews/create`: Create a new interview with questions
- `GET /api/interviews/connect`: Connect to specific interview data
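Since the spec calls for server-side validation, a handler for `POST /api/interviews/create` might first narrow the untrusted JSON body. The body shape below (`support_engagement_id`, `questions`) is a hypothetical example, not the endpoint's documented contract.

```typescript
// Hypothetical request body; field names are assumptions.
interface CreateInterviewBody {
  support_engagement_id: string;
  questions: string[];
}

type ValidationResult =
  | { ok: true; value: CreateInterviewBody }
  | { ok: false; error: string };

// Narrow an untrusted JSON body before any database write.
function validateCreateInterview(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const { support_engagement_id, questions } = body as Record<string, unknown>;
  if (typeof support_engagement_id !== "string" || support_engagement_id.trim() === "") {
    return { ok: false, error: "support_engagement_id is required" };
  }
  if (!Array.isArray(questions) || !questions.every((q) => typeof q === "string" && q.trim() !== "")) {
    return { ok: false, error: "questions must be an array of non-empty strings" };
  }
  return { ok: true, value: { support_engagement_id, questions } };
}
```

A failed result would typically map to an HTTP 400 response before any database call is made.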

### Session Management
- `GET /api/session`: Generate ephemeral keys for OpenAI Realtime API
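A minimal sketch of the ephemeral-key exchange behind `GET /api/session`, assuming it calls OpenAI's `POST /v1/realtime/sessions` endpoint with the server-side API key and hands only the short-lived `client_secret.value` to the browser. The model name is an assumption, and the fetch function is passed in so the logic can run without network access.

```typescript
// Minimal fetch-compatible signature so a stub can stand in for global fetch.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Exchange the server-side API key for a short-lived client secret.
// The model name below is an assumption.
async function createEphemeralKey(apiKey: string, fetchFn: FetchLike): Promise<string> {
  const res = await fetchFn("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-4o-realtime-preview-2024-12-17" }),
  });
  if (!res.ok) {
    throw new Error(`Realtime session request failed with status ${res.status}`);
  }
  const data = await res.json();
  // Only this short-lived value goes to the browser, never the API key.
  return data.client_secret.value as string;
}
```

Assuming the key is stored in `OPENAI_API_KEY`, a route handler could call `createEphemeralKey(process.env.OPENAI_API_KEY!, fetch)`, so the long-lived key never leaves the server.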

### Data Access
- Endpoints for companies, people, and support engagements

## Deployment Considerations

- Environment variables for API keys and database connections
- WebRTC compatibility considerations
- Audio processing requirements
- Database migration scripts for schema updates

## Security

- Ephemeral key management for OpenAI API
- Server-side data validation
- Secure database access patterns
- Client-side security measures

## Future Enhancement Areas

1. **Enhanced Analytics**: Interview data visualization and insights
2. **Improved Agent Intelligence**: More contextual awareness and natural conversation
3. **Multi-language Support**: Internationalization for global usage
4. **Integration Capabilities**: API endpoints for external system connections
5. **Advanced Question Generation**: Dynamic question creation based on previous responses

## Development Guidelines

1. Follow existing code conventions in the repository
2. Maintain agent configuration patterns for consistency
3. Use TypeScript interfaces for data validation
4. Implement proper error handling for API endpoints
5. Test WebRTC functionality across different environments
6. Document new agent configurations thoroughly
17 changes: 17 additions & 0 deletions README.md
@@ -17,6 +17,23 @@ You should be able to use this repo to prototype your own multi-agent realtime v
- Start the server with `npm run dev`
- Open your browser to [http://localhost:3000](http://localhost:3000) to see the app. It should automatically connect to the `simpleExample` Agent Set.

## Supabase Integration

This demo app includes integration with Supabase for retrieving real support engagement data. To use this functionality:

1. Create a Supabase project at [https://supabase.com](https://supabase.com)
2. Set up your database with the following tables:
   - `companies` - Information about companies receiving support
   - `support_engagements` - Details about support engagements

3. Update your `.env.development.local` file with your Supabase credentials:
   ```
   NEXT_PUBLIC_SUPABASE_URL=https://your-project-id.supabase.co
   NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
   ```

4. The app will automatically fetch and use real data when you select the "startupInterviewer" agent.
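A small sketch that checks both variables are present before a Supabase client is created. `readSupabaseConfig` is a hypothetical helper, and the env object is passed in so it can run outside Next.js.

```typescript
interface SupabaseConfig {
  url: string;
  anonKey: string;
}

// Hypothetical helper: fail fast when the credentials from step 3 are missing.
function readSupabaseConfig(env: Record<string, string | undefined>): SupabaseConfig {
  const url = env.NEXT_PUBLIC_SUPABASE_URL;
  const anonKey = env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
  if (!url || !anonKey) {
    throw new Error(
      "Set NEXT_PUBLIC_SUPABASE_URL and NEXT_PUBLIC_SUPABASE_ANON_KEY in .env.development.local",
    );
  }
  // Throws a TypeError on a malformed project URL instead of failing at the first query.
  new URL(url);
  return { url, anonKey };
}
```

Next.js only inlines `NEXT_PUBLIC_*` values into browser bundles when they are referenced literally as `process.env.NEXT_PUBLIC_...`, so client code would pass them explicitly, e.g. `readSupabaseConfig({ NEXT_PUBLIC_SUPABASE_URL: process.env.NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_ANON_KEY: process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY })`, and then hand the result to `createClient(url, anonKey)` from `@supabase/supabase-js`.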

## Configuring Agents
Configuration in `src/app/agentConfigs/simpleExample.ts`
```javascript
```
21 changes: 21 additions & 0 deletions components.json
@@ -0,0 +1,21 @@
{
  "$schema": "https://ui.shadcn.com/schema.json",
  "style": "new-york",
  "rsc": true,
  "tsx": true,
  "tailwind": {
    "config": "tailwind.config.ts",
    "css": "src/app/globals.css",
    "baseColor": "neutral",
    "cssVariables": true,
    "prefix": ""
  },
  "aliases": {
    "components": "@/components",
    "utils": "@/lib/utils",
    "ui": "@/components/ui",
    "lib": "@/lib",
    "hooks": "@/hooks"
  },
  "iconLibrary": "lucide"
}
73 changes: 73 additions & 0 deletions interviewee_ux_spec.md
@@ -0,0 +1,73 @@
# Interviewee UX – Proof of Concept Specification

_Last updated: {{DATE}}_

## 1. Access & Security

| Item | Decision |
|------|----------|
| Link format | `https://<domain>/i/{invite_token}` – token in path segment |
| Auth | Route `/i/*` and `/app?candidate=1` exempt from auth middleware |
| Token validity | Link works as long as associated interview status is **not** `completed` |
| Re-use | Multiple openings allowed until marked completed |
| Expiration | None for POC |

## 2. Entry Flow
1. Candidate clicks the invite link (`/i/{token}`).
2. Server resolves `invite_token` → `interview.id`.
3. If the interview status is `completed`: redirect → `/invite-completed` (future page).
   If the token is not found: redirect → `/invite-not-found` (future page).
4. Otherwise, redirect → `/app?interviewId={id}&candidate=1`.

_No additional onboarding or mic-check screens for POC._
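The entry-flow decision can be sketched as a pure function from the token lookup result to a redirect path. The `InviteLookup` shape is hypothetical; `null` stands for a token that matched no interview.

```typescript
// Hypothetical result of resolving invite_token -> interview.
interface InviteLookup {
  id: string;
  status: string; // "completed" once the interview has finished
}

// Map the lookup result to the redirect target described in the entry flow.
function inviteRedirectPath(interview: InviteLookup | null): string {
  if (interview === null) return "/invite-not-found";
  if (interview.status === "completed") return "/invite-completed";
  // URLSearchParams handles encoding of the interview id.
  const params = new URLSearchParams({ interviewId: interview.id, candidate: "1" });
  return `/app?${params.toString()}`;
}
```

In `/i/[token]/page.tsx` the returned path could be passed to `redirect()` from `next/navigation`.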

## 3. Candidate UI inside `/app`

| Element | Behaviour / Notes |
|---------|-------------------|
| Header | Minimal: "Interview Session" + product logo. No scenario/agent selectors. |
| Main area | `InterviewExperience` component reused. Shows:<br>• Current question (medium size, centred left column)<br>• Progress text: "Question N of M"<br>• Horizontal progress bar<br>• Audio-wave visualisation canvas below<br>• Status pill (Live/Connecting) |
| Agent state indicator | `Agent is speaking…` (green pulse) vs `Agent is listening…` (grey) |
| Transcript & Events panes | **Hidden** in candidate view |
| Bottom toolbar | Temporarily left visible for dev controls; will be hidden in prod. |
| Typing fallback | Not implemented for POC |

## 4. Completion & Thank-You

Trigger: Client detects assistant's **final** message + session disconnect → sets `sessionStatus = DISCONNECTED`.

Action:
* After 500 ms debounce, if `isCandidateView && isInterviewMode && sessionStatus === DISCONNECTED` → `router.push("/i/thank-you")`.
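The redirect rule and its 500 ms debounce could be sketched as below. The `SessionStatus` values and the helper names are assumptions; `navigate` stands in for the router's push.

```typescript
// Assumed status values; only DISCONNECTED matters for the redirect.
type SessionStatus = "DISCONNECTED" | "CONNECTING" | "CONNECTED";

interface CompletionState {
  isCandidateView: boolean;
  isInterviewMode: boolean;
  sessionStatus: SessionStatus;
}

// The condition from the rule above.
function shouldRedirectToThankYou(s: CompletionState): boolean {
  return s.isCandidateView && s.isInterviewMode && s.sessionStatus === "DISCONNECTED";
}

// Schedule the redirect after the debounce delay and return a cancel function.
function scheduleThankYouRedirect(
  s: CompletionState,
  navigate: (path: string) => void,
  delayMs = 500,
): () => void {
  if (!shouldRedirectToThankYou(s)) return () => {};
  const timer = setTimeout(() => navigate("/i/thank-you"), delayMs);
  return () => clearTimeout(timer);
}
```

Returning the cancel function from a React `useEffect` cleanup drops a pending redirect if the session reconnects within the debounce window.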

_New in v2 – agent-driven completion_

* When the agent reaches its `wrap_up` conversation state it calls the function `markInterviewCompleted` with `{ "interview_id": <uuid> }`.
* The front-end handles this function call (via `toolLogic`) which hits `POST /api/interviews/complete` and updates the DB.
* The subsequent disconnect triggers the existing redirect logic above.
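A hypothetical sketch of the function tool and its `toolLogic` handler. The description text, the `additionalProperties` constraint and the `{ success }` result shape are assumptions; the POST function is injected so the handler can run without a server.

```typescript
// Hypothetical tool definition; parameters mirror the { "interview_id": <uuid> } payload.
const markInterviewCompletedTool = {
  type: "function" as const,
  name: "markInterviewCompleted",
  description: "Mark the interview as completed once the wrap_up state is reached.",
  parameters: {
    type: "object",
    properties: {
      interview_id: { type: "string", description: "UUID of the interview" },
    },
    required: ["interview_id"],
    additionalProperties: false,
  },
};

type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical toolLogic entry that forwards the call to the completion endpoint.
function makeToolLogic(post: PostFn) {
  return {
    async markInterviewCompleted({ interview_id }: { interview_id: string }) {
      const res = await post("/api/interviews/complete", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ interview_id }),
      });
      // Assumed to be returned to the model as the function-call output.
      return res.ok ? { success: true } : { success: false, status: res.status };
    },
  };
}
```

In the browser, `makeToolLogic((url, init) => fetch(url, init))` would wire the handler to the real endpoint.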

### Thank-You screen (`/i/thank-you`)
* Large headline: "Thank you for your time!"
* Sub-text: "You may now close this tab or return to Volta."
* Button `Return to Volta` → `https://voltaeffect.com` (opens new tab)
* No other navigation.
* Refreshing this page keeps the user on the thank-you screen (static route).

## 5. Edge Cases / Out-of-Scope
* Mic permission failures → **not** handled (risk accepted).
* Session resume after refresh during interview → deferred.
* One-time / time-limited tokens → deferred.
* Manual "Finish" button for candidate → deferred.
* Legal/privacy blurb → delivered verbally by AI, no UI display for POC.

## 6. Implementation Notes
* `middleware.ts` updated: public routes `/i` & `/app` bypass auth.
* `/i/[token]/page.tsx` handles token resolution & redirect logic.
* Candidate mode detected via query param `candidate=1`.
* UI conditional logic in `App.tsx`:
  * Hides transcript/events
  * Hides bottom toolbar once ready for prod
  * Tracks agent-speaking state via transcript items
* Thank-You page implemented at `src/app/i/thank-you/page.tsx`.
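A sketch of the public-route check `middleware.ts` could run before enforcing auth. It follows the stricter access-table rule (`/app` is public only with `candidate=1`) rather than exempting all of `/app`; `isPublicRequest` is a hypothetical helper.

```typescript
// Decide whether a request may skip the auth middleware:
// /i and its sub-paths are public; /app only when candidate=1 is present.
function isPublicRequest(url: URL): boolean {
  const path = url.pathname;
  // Match "/i" or "/i/..." so that "/interviews" still requires auth.
  if (path === "/i" || path.startsWith("/i/")) return true;
  if (path === "/app" && url.searchParams.get("candidate") === "1") return true;
  return false;
}
```

In the middleware it could be called as `isPublicRequest(new URL(request.url))`. The `candidate=1` flag is not a secret: any visitor can add it, so the candidate view is protected only by knowledge of the interview id.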

---
**Ready for developer hand-off.**
10 changes: 9 additions & 1 deletion next.config.ts
@@ -1,7 +1,15 @@
 import type { NextConfig } from "next";
 
 const nextConfig: NextConfig = {
-  /* config options here */
+  images: {
+    remotePatterns: [
+      {
+        protocol: "https",
+        hostname: "*.supabase.co",
+        pathname: "/storage/v1/object/public/**",
+      },
+    ],
+  },
 };
 
 export default nextConfig;