
Conversation


@Harshalzarikar Harshalzarikar commented Dec 24, 2025

Summary

Cheating Daddy stopped working after Google deprecated the old Gemini model. This PR fixes the integration and adds major improvements.

What Was Broken

  • ❌ App failed to connect - Google removed the gemini-2.0-flash-exp model
  • ❌ No voice activity detection (VAD) configuration
  • ❌ High latency responses

Fixes & Improvements

🔧 Core Fixes

  • Updated to new Gemini model - gemini-2.0-flash-live-001 (the working Live API model)
  • Fixed Google GenAI SDK integration - Proper GoogleGenAI + Modality imports
  • Fixed WebSocket connection - Correct Live API handshake

🎤 Audio Improvements

  • Stereo to mono conversion - Proper 16kHz PCM format for Gemini
  • 25ms audio chunks - Low-latency streaming
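
The conversion described above can be sketched as follows. This is a minimal illustration, not the code from src/utils/renderer.js: it assumes interleaved Float32 stereo input (Web Audio style), and the function name is hypothetical.

```javascript
// Sketch: interleaved stereo Float32 samples -> mono signed 16-bit PCM.
// Hypothetical helper; the actual implementation in the PR may differ.
function stereoToMonoPcm16(interleaved) {
    const frames = interleaved.length / 2;
    const mono = new Int16Array(frames);
    for (let i = 0; i < frames; i++) {
        // Average the left and right channels into one mono sample
        const sample = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
        // Clamp to [-1, 1], then scale to the signed 16-bit range
        const clamped = Math.max(-1, Math.min(1, sample));
        mono[i] = Math.round(clamped * 32767);
    }
    return mono;
}

// At 16 kHz, one 25 ms chunk is 400 mono frames
const CHUNK_FRAMES = 16000 * 0.025;
```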

⚡ Latency Optimization

  • VAD settings tuned for speed:
    • startOfSpeechSensitivity: HIGH
    • endOfSpeechSensitivity: HIGH
    • silenceDurationMs: 100ms
    • prefixPaddingMs: 50ms
    • turnCoverage: TURN_INCLUDES_ONLY_ACTIVITY
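
These tuned settings might be shaped roughly like the following sketch. The real config lives in src/utils/gemini.js and uses the StartSensitivity/EndSensitivity enums from @google/genai; plain strings and this exact field nesting are assumptions made here to keep the snippet self-contained.

```javascript
// Hypothetical shape of the tuned VAD configuration (not the repo's exact code).
const realtimeInputConfig = {
    automaticActivityDetection: {
        startOfSpeechSensitivity: 'START_SENSITIVITY_HIGH', // trigger quickly on speech onset
        endOfSpeechSensitivity: 'END_SENSITIVITY_HIGH',     // end the turn quickly after speech
        silenceDurationMs: 100, // close the turn after 100 ms of silence
        prefixPaddingMs: 50,    // keep 50 ms of audio preceding detected speech
    },
    turnCoverage: 'TURN_INCLUDES_ONLY_ACTIVITY', // send only detected-speech audio
};
```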

Technical Changes

  • src/utils/gemini.js - Complete rewrite of Gemini integration
  • src/renderer/index.html - Settings UI
  • src/renderer/renderer.js - Profile/language handling
  • src/preload.js - IPC bridge updates

Testing

  • Text input → Fast response ✅
  • VAD detecting speech ✅
  • Audio response latency (inherent Gemini limitation)

Notes

Audio responses have ~2-3 s of latency, which is a limitation of the Gemini Live API rather than of this code. Text input responds instantly.

Summary by CodeRabbit

  • New Features

    • AI model selection added to settings with available models and persisted choice (applies next session).
  • Bug Fixes

    • Clearer, user-facing error messages for API, quota, and network issues.
    • Improved Windows system-audio capture with reliable fallbacks.
    • Real-time transcription visible during input.
  • Performance & Improvements

    • Audio processing optimized for lower latency and adjusted sampling for broader compatibility.



coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Adds an AI model selection feature (UI, storage, IPC) and integrates dynamic model choice into the Gemini client. Also changes audio capture defaults to 16 kHz with smaller chunking, improves audio capture fallbacks, and extends error handling and logging across realtime flows.

Changes

  • Submodule Update (cheating-daddy): Submodule commit hash update; no functional changes.
  • Storage, Core (src/storage.js): Added AVAILABLE_MODELS, getAvailableModels() and getSelectedModel() getters; added a selectedModel default to DEFAULT_PREFERENCES; exported the new getters.
  • IPC, Main (src/index.js): Added IPC handlers storage:get-available-models and storage:get-selected-model to expose model data with error handling.
  • UI, Customize View (src/components/views/CustomizeView.js): Added selectedModel and availableModels properties; new handleModelSelect() and renderModelSection(); added a Model tab and dropdown integration into sidebar rendering and storage sync.
  • Gemini Client / Realtime (src/utils/gemini.js): Uses getSelectedModel() for dynamic model selection; imported StartSensitivity and EndSensitivity; expanded error handling; improved realtime transcription updates, logging, and interruption handling; adjusted audio usage to align with 16 kHz.
  • Renderer / Audio & Storage API (src/utils/renderer.js): Audio constants changed (SAMPLE_RATE 24000→16000, AUDIO_CHUNK_DURATION 0.1→0.05, BUFFER_SIZE 4096→2048); improved Windows loopback/mic fallbacks; audio IPC calls made non-blocking; added async getAvailableModels() and getSelectedModel() renderer helpers; updated MIME types.
  • Git configuration (.gitignore): Added ignore rule for windows.txt.
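
Based on the summary above, the storage additions might be sketched like this. The model list entries, default id, and internal variable names here are assumptions drawn from the PR description, not the repo's exact values:

```javascript
// Sketch of the model-selection storage described for src/storage.js.
// Values below are illustrative assumptions, not the repo's exact contents.
const AVAILABLE_MODELS = [
    { id: 'gemini-2.0-flash-live-001', name: 'Gemini 2.0 Flash Live' },
];

const DEFAULT_PREFERENCES = {
    selectedModel: 'gemini-2.0-flash-live-001',
};

// Stand-in for the persisted preferences object
let preferences = { ...DEFAULT_PREFERENCES };

function getAvailableModels() {
    return AVAILABLE_MODELS;
}

function getSelectedModel() {
    // Fall back to the default if nothing has been persisted yet
    return preferences.selectedModel || DEFAULT_PREFERENCES.selectedModel;
}
```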

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant UI as CustomizeView (UI)
    participant IPC as Main IPC
    participant Storage as Storage
    participant Gemini as Gemini Client

    User->>UI: Open Customize → Model tab
    UI->>IPC: request available models (storage:get-available-models)
    IPC->>Storage: getAvailableModels()
    Storage-->>IPC: models list
    IPC-->>UI: available models
    UI->>UI: render dropdown

    User->>UI: Select model
    UI->>IPC: set-selected-model (storage:set-selected-model)
    IPC->>Storage: persist selectedModel
    Storage-->>IPC: success
    IPC-->>UI: confirm

    Note over UI,Gemini: Model takes effect next session

    User->>Gemini: Start session
    Gemini->>IPC: request selected model (storage:get-selected-model)
    IPC->>Storage: getSelectedModel()
    Storage-->>IPC: model id
    IPC-->>Gemini: model id
    Gemini->>Gemini: initialize session with selected model
    Gemini-->>User: session ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through prefs and IPC,
Picked a model just for thee,
Sixteen kHz sings soft and small,
Transcripts dance and errors fall,
Gemini waits — selection's key.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 18.18%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately reflects the main objective of the changeset: adding AI model selection functionality to restore Gemini Live API integration after deprecation.
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/utils/gemini.js (1)

364-426: Remove or extract the commented-out configuration block.

This 60+ line commented block adds noise to the codebase. If it's intended as an alternative configuration for specific scenarios, consider:

  1. Extracting it to a separate configuration file or documentation
  2. Using a config flag to switch between modes
  3. Removing it entirely if no longer needed
src/utils/renderer.js (1)

408-414: Consider logging errors in development mode instead of silently swallowing them.

The fire-and-forget pattern is good for latency, but completely silent error handling can make debugging difficult. Consider logging errors conditionally.

🔎 Proposed improvement
             // Don't await - fire and forget for minimal latency
             ipcRenderer
                 .invoke('send-mic-audio-content', {
                     data: base64Data,
                     mimeType: 'audio/pcm;rate=16000',
                 })
-                .catch(() => {}); // Silently ignore errors
+                .catch(err => {
+                    if (process.env.NODE_ENV === 'development') {
+                        console.debug('[Audio] Mic send error:', err.message);
+                    }
+                });
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae5ff68 and 70c580b.

📒 Files selected for processing (8)
  • cheating-daddy
  • src/assets/SystemAudioDump
  • src/components/views/CustomizeView.js
  • src/index.js
  • src/storage.js
  • src/utils/gemini.js
  • src/utils/renderer.js
  • windows.txt
🧰 Additional context used
🧬 Code graph analysis (3)
src/components/views/CustomizeView.js (2)
src/utils/renderer.js (2)
  • prefs (168-168)
  • prefs (1019-1019)
src/utils/gemini.js (1)
  • model (680-680)
src/utils/renderer.js (1)
src/utils/gemini.js (2)
  • SAMPLE_RATE (593-593)
  • audioBuffer (598-598)
src/index.js (1)
src/utils/renderer.js (1)
  • storage (43-132)
🪛 LanguageTool
windows.txt

[style] ~492-~492: ‘with respect to’ might be wordy. Consider a shorter alternative.
Context: ... new concept that has basically come up with respect to middleware like built-in middleware, cu...

(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)

🔇 Additional comments (13)
cheating-daddy (1)

1-1: Clarification needed: This appears to be a git submodule reference, not reviewable source code.

The file contains only a submodule commit hash (90b53a1b5b47175ed107570d658920248cd8b7e5), but the AI-generated summary references extensive changes across multiple files:

  • src/storage.js (AVAILABLE_MODELS, getAvailableModels, getSelectedModel)
  • src/index.js (IPC handlers)
  • src/utils/gemini.js (model selection integration)
  • src/utils/renderer.js (async storage API updates)
  • src/components/views/CustomizeView.js (Model section UI)

These implementation files are not provided for review.

Please clarify:

  1. Should the actual implementation files listed above be included for review?
  2. If this is an external submodule, confirm the commit hash 90b53a1b5b47175ed107570d658920248cd8b7e5 is from the intended repository.
  3. Consider providing the actual changed files from the submodule so a comprehensive code review can be performed on the features described (AI model selection, Gemini integration, audio configuration, error handling, transcription updates, and Windows audio capture improvements).
src/index.js (1)

224-241: LGTM!

The new model IPC handlers follow the established error handling pattern and response shape consistently with other storage handlers in this file.

src/storage.js (2)

32-40: LGTM - model configuration is well-documented.

The comment explaining that older models were shut down on Dec 9, 2025 provides useful context. The structure allows for easy addition of new models in the future.


460-462: LGTM - clean export pattern.

The inline arrow function exports are concise and appropriate for these simple getters.

src/utils/gemini.js (4)

218-224: LGTM - dynamic model selection integration.

Model selection is cleanly retrieved from storage and logged before use. The integration with the live.connect call is straightforward.


283-301: Good improvement: user-friendly error messages.

The enhanced error mapping translates API error codes into actionable guidance for users. This significantly improves the debugging experience.


592-593: LGTM - sample rate aligned with API requirements.

The 16kHz sample rate is consistent with the Gemini Live API requirements for PCM audio input.


1-1: No changes required. StartSensitivity and EndSensitivity are valid exports from @google/genai v1.2.0 and the imports on line 1 are correct. These enums are properly used for VAD configuration in the Live API.

src/components/views/CustomizeView.js (2)

768-781: LGTM - proper async loading with fallback defaults.

The parallel Promise.all for fetching preferences, keybinds, and available models is efficient. The fallback defaults ensure graceful degradation if storage calls fail.


1225-1251: LGTM - clean Model section implementation.

The section follows the established UI patterns and provides clear user guidance about when model changes take effect.

src/utils/renderer.js (3)

10-31: LGTM - well-documented audio configuration.

The constants are clearly documented with their purpose and the trade-offs explained. The 16kHz sample rate aligns with the Gemini Live API requirements.


123-131: LGTM - consistent storage API extension.

The new model methods follow the established pattern with proper fallback values.


299-371: Good improvement: robust Windows audio capture with fallback.

The refactored code properly handles the case when system audio isn't available and provides helpful status messages to the user. The fallback to microphone-only mode is a sensible degradation strategy.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
src/utils/gemini.js (5)

605-605: Use the SAMPLE_RATE constant in the MIME type string.

The MIME type string is hardcoded with rate=16000, but the sample rate is defined as a constant at line 531. If the sample rate changes, the MIME type must be manually updated.

🔎 Proposed fix
-                mimeType: 'audio/pcm;rate=16000',
+                mimeType: `audio/pcm;rate=${SAMPLE_RATE}`,

609-613: Consider more nuanced error handling for audio send failures.

The code silently ignores errors containing 'closed' or 'CANCELLED', which may hide legitimate issues if unexpected session closures occur. While these are often benign during normal shutdown, suppressing them entirely can complicate debugging.

🔎 Alternative approach
     } catch (error) {
-        // Only log actual errors, not routine issues
-        if (!error.message?.includes('closed') && !error.message?.includes('CANCELLED')) {
-            console.error('[Gemini] Audio send error:', error.message);
-        }
+        // Log routine closure issues at debug level
+        if (error.message?.includes('closed') || error.message?.includes('CANCELLED')) {
+            console.debug('[Gemini] Audio send on closed session:', error.message);
+        } else {
+            console.error('[Gemini] Audio send error:', error.message);
+        }
     }

284-301: Consider extracting the error message mapping to reduce duplication.

The error-to-user-message mapping logic appears in both the onerror callback (lines 284-301) and the initialization catch block (lines 378-397). Extracting this to a helper function would improve maintainability and ensure consistent error messages.

🔎 Suggested extraction
+function getErrorMessage(error) {
+    const errorStr = error?.message || String(error);
+    
+    if (errorStr.includes('not found') || errorStr.includes('NOT_FOUND')) {
+        return 'Model not available. Try updating the app or check API access.';
+    } else if (errorStr.includes('RESOURCE_EXHAUSTED') || errorStr.includes('quota')) {
+        return 'API rate limit reached. Wait a few minutes or upgrade your API plan.';
+    } else if (errorStr.includes('INVALID_ARGUMENT') || errorStr.includes('API key')) {
+        return 'Invalid API key. Please check your Gemini API key in settings.';
+    } else if (errorStr.includes('PERMISSION_DENIED')) {
+        return 'API permission denied. Ensure your API key has access to the Live API.';
+    } else if (errorStr.includes('UNAVAILABLE') || errorStr.includes('network')) {
+        return 'Connection lost. Check your internet connection.';
+    }
+    
+    return errorStr;
+}

Then use it in both locations:

 onerror: function (e) {
     console.error('Session error:', e.message, e);
-    // Parse and provide user-friendly error messages
-    let userMessage = 'Error: ' + e.message;
-    
-    if (e.message?.includes('not found') || e.message?.includes('NOT_FOUND')) {
-        userMessage = 'Model not available. Try updating the app or check API access.';
-    } else if (e.message?.includes('RESOURCE_EXHAUSTED') || e.message?.includes('quota')) {
-        userMessage = 'API rate limit reached. Wait a few minutes or upgrade your API plan.';
-    } else if (e.message?.includes('INVALID_ARGUMENT') || e.message?.includes('API key')) {
-        userMessage = 'Invalid API key. Please check your Gemini API key in settings.';
-    } else if (e.message?.includes('PERMISSION_DENIED')) {
-        userMessage = 'API permission denied. Ensure your API key has access to the Live API.';
-    } else if (e.message?.includes('UNAVAILABLE') || e.message?.includes('network')) {
-        userMessage = 'Connection lost. Check your internet connection.';
-    }
-    
+    const userMessage = 'Error: ' + getErrorMessage(e);
     sendToRenderer('update-status', userMessage);
 },

Also applies to: 378-397


322-324: Consider toning down the absolute claim in the comment.

The comment states "Never Fails," which is an overconfident claim for any system that depends on network connectivity, API availability, and user input. Consider rephrasing to something like "Optimized for Speed and Accuracy" to set more realistic expectations.

🔎 Suggested rewording
-                // ═══════════════════════════════════════════════════════════
-                // ULTIMATE INTERVIEW CONFIG - Fast, Accurate, Never Fails
-                // ═══════════════════════════════════════════════════════════
+                // ═══════════════════════════════════════════════════════════
+                // INTERVIEW CONFIG - Optimized for Speed and Accuracy
+                // ═══════════════════════════════════════════════════════════

337-350: Acknowledge that this aggressive VAD configuration is intentional, but consider whether the trade-off is right for interviews.

The "MAXIMUM SPEED CONFIG" header confirms this is by design. However, at silenceDurationMs: 100, combined with START_SENSITIVITY_HIGH and END_SENSITIVITY_HIGH, the configuration is at the aggressive end of the spectrum. Gemini's best practices recommend 500–1000ms silence duration for conversational applications where users pause to think, not 100ms.

For interviews where candidates naturally pause to formulate answers, this setup risks cutting off responses mid-thought. Verify in user testing that the speed gain justifies the risk of premature cutoffs, or make silence duration configurable based on application mode.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6be03a0 and 93f669c.

📒 Files selected for processing (1)
  • src/utils/gemini.js
🔇 Additional comments (8)
src/utils/gemini.js (8)

244-246: LGTM!

The real-time transcription feature provides good user feedback by sending live updates to the UI as speech is transcribed.


277-281: LGTM!

The interruption handling correctly clears the message buffer when the user interrupts the AI response, preventing stale content from being displayed.


231-235: LGTM!

The logging optimizations reduce console noise while preserving essential debugging information. The early return for empty text (line 252) is a good efficiency improvement.

Also applies to: 252-252, 262-262


227-228: LGTM!

The session connection logging is clear and consistent with the codebase's logging conventions.


685-698: LGTM with a note!

The IPC handler error logging improvements provide better diagnostics. The error suppression for 'closed' messages is consistent with the pattern used elsewhere (though see the earlier comment about more nuanced handling).

Also applies to: 701-714, 739-754


726-726: LGTM!

The enhanced logging for image buffer validation provides useful diagnostic information.


1-1: No action needed. The @google/genai v1.2.0 library fully supports StartSensitivity and EndSensitivity enums. The imports are correct and available for the VAD configuration.


530-531: The sample rate of 16000 Hz is correct for Gemini Live API audio input.

The Gemini Live API input audio specification requires "raw little-endian 16-bit PCM, nominally 16 kHz," and the code implements this correctly with the SAMPLE_RATE set to 16000. The comment accurately reflects the API requirements.

Comment on lines +218 to +221
// Get selected model from preferences (default to latest if not set)
const selectedModel = getSelectedModel();
console.log('Using Gemini model:', selectedModel);



🛠️ Refactor suggestion | 🟠 Major

Add validation for the selected model.

The code calls getSelectedModel() without validating the return value. If the function returns null, undefined, or an invalid model identifier, the session initialization will fail with a potentially unclear error message.

🔎 Proposed validation
 // Get selected model from preferences (default to latest if not set)
 const selectedModel = getSelectedModel();
+if (!selectedModel || typeof selectedModel !== 'string') {
+    const errorMsg = 'Invalid model selection. Please check your settings.';
+    console.error(errorMsg);
+    sendToRenderer('update-status', errorMsg);
+    isInitializingSession = false;
+    if (!isReconnect) {
+        sendToRenderer('session-initializing', false);
+    }
+    return null;
+}
 console.log('Using Gemini model:', selectedModel);
🤖 Prompt for AI Agents
In src/utils/gemini.js around lines 218 to 221, getSelectedModel() is used
without validation which can return null/undefined or an unsupported id and
break session initialization; validate the result and fallback to a known
default: check if selectedModel is truthy and matches an allowedModels list (or
a regex/enum used elsewhere), if not set selectedModel = DEFAULT_LATEST_MODEL,
log a warning indicating fallback, and use that validated value for session
initialization (update the console.log to print the final validated model).

@Harshalzarikar Harshalzarikar changed the title Fixed the issue Restore Gemini Live API integration after Google model deprecation Dec 24, 2025
@Harshalzarikar Harshalzarikar changed the title Restore Gemini Live API integration after Google model deprecation fix: Restore Gemini Live API integration after Google model deprecation Dec 24, 2025
Removed Windows output files from .gitignore
@sathwik13198

Hey, I have tested your branch and it is awesome. Here are the probable fixes needed:

  1. When I send typed messages, the server receives them and replies, and the API also provides a response, but the UI component does not display the messages.
  2. When I ask for code in Java or some other language, I get the response from the server, but it is not displayed in the UI box.

@factscosmos07-cmyk


It is only giving 40 answers; how can I increase that to unlimited?
