
Conversation


@Harshalzarikar Harshalzarikar commented Dec 24, 2025

Summary

Cheating Daddy stopped working after Google deprecated the old Gemini model. This PR fixes the integration and adds major improvements.

What Was Broken

  • ❌ App failed to connect - Google removed the gemini-2.0-flash-exp model
  • ❌ No voice activity detection (VAD) configuration
  • ❌ High latency responses

Fixes & Improvements

🔧 Core Fixes

  • Updated to new Gemini model - gemini-2.0-flash-live-001 (the working Live API model)
  • Fixed Google GenAI SDK integration - Proper GoogleGenAI + Modality imports
  • Fixed WebSocket connection - Correct Live API handshake

🎤 Audio Improvements

  • Stereo to mono conversion - Proper 16kHz PCM format for Gemini
  • 25ms audio chunks - Low-latency streaming
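
The conversion described above can be sketched as follows. This is a minimal illustration, not the code from src/utils/renderer.js: it assumes interleaved Float32 stereo input (Web Audio style), and the function name is hypothetical.

```javascript
// Sketch: interleaved stereo Float32 samples -> mono signed 16-bit PCM.
// Hypothetical helper; the actual implementation in the PR may differ.
function stereoToMonoPcm16(interleaved) {
    const frames = interleaved.length / 2;
    const mono = new Int16Array(frames);
    for (let i = 0; i < frames; i++) {
        // Average the left and right channels into one mono sample
        const sample = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
        // Clamp to [-1, 1], then scale to the signed 16-bit range
        const clamped = Math.max(-1, Math.min(1, sample));
        mono[i] = Math.round(clamped * 32767);
    }
    return mono;
}

// At 16 kHz, one 25 ms chunk is 400 mono frames
const CHUNK_FRAMES = 16000 * 0.025;
```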

⚡ Latency Optimization

  • VAD settings tuned for speed:
    • startOfSpeechSensitivity: HIGH
    • endOfSpeechSensitivity: HIGH
    • silenceDurationMs: 100ms
    • prefixPaddingMs: 50ms
    • turnCoverage: TURN_INCLUDES_ONLY_ACTIVITY
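
These tuned settings might be shaped roughly like the following sketch. The real config lives in src/utils/gemini.js and uses the StartSensitivity/EndSensitivity enums from @google/genai; plain strings and this exact field nesting are assumptions made here to keep the snippet self-contained.

```javascript
// Hypothetical shape of the tuned VAD configuration (not the repo's exact code).
const realtimeInputConfig = {
    automaticActivityDetection: {
        startOfSpeechSensitivity: 'START_SENSITIVITY_HIGH', // trigger quickly on speech onset
        endOfSpeechSensitivity: 'END_SENSITIVITY_HIGH',     // end the turn quickly after speech
        silenceDurationMs: 100, // close the turn after 100 ms of silence
        prefixPaddingMs: 50,    // keep 50 ms of audio preceding detected speech
    },
    turnCoverage: 'TURN_INCLUDES_ONLY_ACTIVITY', // send only detected-speech audio
};
```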

Technical Changes

  • src/utils/gemini.js - Complete rewrite of Gemini integration
  • src/renderer/index.html - Settings UI
  • src/renderer/renderer.js - Profile/language handling
  • src/preload.js - IPC bridge updates

Testing

  • Text input → Fast response ✅
  • VAD detecting speech ✅
  • Audio response latency (inherent Gemini limitation)

Notes

Audio responses have ~2-3 s of latency, which is a limitation of the Gemini Live API rather than of this code. Text input responds instantly.

Summary by CodeRabbit

  • New Features

    • AI model selection added to settings with available models and persisted choice (applies next session).
  • Bug Fixes

    • Clearer, user-facing error messages for API, quota, and network issues.
    • Improved Windows system-audio capture with reliable fallbacks.
    • Real-time transcription visible during input.
  • Performance & Improvements

    • Audio processing optimized for lower latency and adjusted sampling for broader compatibility.



coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Adds an AI model selection feature (UI, storage, IPC) and integrates dynamic model choice into the Gemini client. Also changes audio capture defaults to 16 kHz with smaller chunking, improves audio capture fallbacks, and extends error handling and logging across realtime flows.

Changes

  • Submodule Update (cheating-daddy): Submodule commit hash update; no functional changes.
  • Storage, Core (src/storage.js): Added AVAILABLE_MODELS, getAvailableModels() and getSelectedModel() getters; added a selectedModel default to DEFAULT_PREFERENCES; exported the new getters.
  • IPC, Main (src/index.js): Added IPC handlers storage:get-available-models and storage:get-selected-model to expose model data with error handling.
  • UI, Customize View (src/components/views/CustomizeView.js): Added selectedModel and availableModels properties; new handleModelSelect() and renderModelSection(); added a Model tab and dropdown integration into sidebar rendering and storage sync.
  • Gemini Client / Realtime (src/utils/gemini.js): Uses getSelectedModel() for dynamic model selection; imported StartSensitivity and EndSensitivity; expanded error handling; improved realtime transcription updates, logging, and interruption handling; adjusted audio usage to align with 16 kHz.
  • Renderer / Audio & Storage API (src/utils/renderer.js): Audio constants changed (SAMPLE_RATE 24000→16000, AUDIO_CHUNK_DURATION 0.1→0.05, BUFFER_SIZE 4096→2048); improved Windows loopback/mic fallbacks; audio IPC calls made non-blocking; added async getAvailableModels() and getSelectedModel() renderer helpers; updated MIME types.
  • Git configuration (.gitignore): Added ignore rule for windows.txt.
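
Based on the summary above, the storage additions might be sketched like this. The model list entries, default id, and internal variable names here are assumptions drawn from the PR description, not the repo's exact values:

```javascript
// Sketch of the model-selection storage described for src/storage.js.
// Values below are illustrative assumptions, not the repo's exact contents.
const AVAILABLE_MODELS = [
    { id: 'gemini-2.0-flash-live-001', name: 'Gemini 2.0 Flash Live' },
];

const DEFAULT_PREFERENCES = {
    selectedModel: 'gemini-2.0-flash-live-001',
};

// Stand-in for the persisted preferences object
let preferences = { ...DEFAULT_PREFERENCES };

function getAvailableModels() {
    return AVAILABLE_MODELS;
}

function getSelectedModel() {
    // Fall back to the default if nothing has been persisted yet
    return preferences.selectedModel || DEFAULT_PREFERENCES.selectedModel;
}
```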

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant UI as CustomizeView (UI)
    participant IPC as Main IPC
    participant Storage as Storage
    participant Gemini as Gemini Client

    User->>UI: Open Customize → Model tab
    UI->>IPC: request available models (storage:get-available-models)
    IPC->>Storage: getAvailableModels()
    Storage-->>IPC: models list
    IPC-->>UI: available models
    UI->>UI: render dropdown

    User->>UI: Select model
    UI->>IPC: set-selected-model (storage:set-selected-model)
    IPC->>Storage: persist selectedModel
    Storage-->>IPC: success
    IPC-->>UI: confirm

    Note over UI,Gemini: Model takes effect next session

    User->>Gemini: Start session
    Gemini->>IPC: request selected model (storage:get-selected-model)
    IPC->>Storage: getSelectedModel()
    Storage-->>IPC: model id
    IPC-->>Gemini: model id
    Gemini->>Gemini: initialize session with selected model
    Gemini-->>User: session ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through prefs and IPC,
Picked a model just for thee,
Sixteen kHz sings soft and small,
Transcripts dance and errors fall,
Gemini waits — selection's key.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 18.18%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately reflects the main objective of the changeset: adding AI model selection functionality to restore Gemini Live API integration after deprecation.
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/utils/gemini.js (1)

364-426: Remove or extract the commented-out configuration block.

This 60+ line commented block adds noise to the codebase. If it's intended as an alternative configuration for specific scenarios, consider:

  1. Extracting it to a separate configuration file or documentation
  2. Using a config flag to switch between modes
  3. Removing it entirely if no longer needed
src/utils/renderer.js (1)

408-414: Consider logging errors in development mode instead of silently swallowing them.

The fire-and-forget pattern is good for latency, but completely silent error handling can make debugging difficult. Consider logging errors conditionally.

🔎 Proposed improvement
             // Don't await - fire and forget for minimal latency
             ipcRenderer
                 .invoke('send-mic-audio-content', {
                     data: base64Data,
                     mimeType: 'audio/pcm;rate=16000',
                 })
-                .catch(() => {}); // Silently ignore errors
+                .catch(err => {
+                    if (process.env.NODE_ENV === 'development') {
+                        console.debug('[Audio] Mic send error:', err.message);
+                    }
+                });
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae5ff68 and 70c580b.

📒 Files selected for processing (8)
  • cheating-daddy
  • src/assets/SystemAudioDump
  • src/components/views/CustomizeView.js
  • src/index.js
  • src/storage.js
  • src/utils/gemini.js
  • src/utils/renderer.js
  • windows.txt
🧰 Additional context used
🧬 Code graph analysis (3)
src/components/views/CustomizeView.js (2)
src/utils/renderer.js (2)
  • prefs (168-168)
  • prefs (1019-1019)
src/utils/gemini.js (1)
  • model (680-680)
src/utils/renderer.js (1)
src/utils/gemini.js (2)
  • SAMPLE_RATE (593-593)
  • audioBuffer (598-598)
src/index.js (1)
src/utils/renderer.js (1)
  • storage (43-132)
🪛 LanguageTool
windows.txt

[style] ~492-~492: ‘with respect to’ might be wordy. Consider a shorter alternative.
Context: ... new concept that has basically come up with respect to middleware like built-in middleware, cu...

(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)

🔇 Additional comments (13)
cheating-daddy (1)

1-1: Clarification needed: This appears to be a git submodule reference, not reviewable source code.

The file contains only a submodule commit hash (90b53a1b5b47175ed107570d658920248cd8b7e5), but the AI-generated summary references extensive changes across multiple files:

  • src/storage.js (AVAILABLE_MODELS, getAvailableModels, getSelectedModel)
  • src/index.js (IPC handlers)
  • src/utils/gemini.js (model selection integration)
  • src/utils/renderer.js (async storage API updates)
  • src/components/views/CustomizeView.js (Model section UI)

These implementation files are not provided for review.

Please clarify:

  1. Should the actual implementation files listed above be included for review?
  2. If this is an external submodule, confirm the commit hash 90b53a1b5b47175ed107570d658920248cd8b7e5 is from the intended repository.
  3. Consider providing the actual changed files from the submodule so a comprehensive code review can be performed on the features described (AI model selection, Gemini integration, audio configuration, error handling, transcription updates, and Windows audio capture improvements).
src/index.js (1)

224-241: LGTM!

The new model IPC handlers follow the established error handling pattern and response shape consistently with other storage handlers in this file.

src/storage.js (2)

32-40: LGTM - model configuration is well-documented.

The comment explaining that older models were shut down on Dec 9, 2025 provides useful context. The structure allows for easy addition of new models in the future.


460-462: LGTM - clean export pattern.

The inline arrow function exports are concise and appropriate for these simple getters.

src/utils/gemini.js (4)

218-224: LGTM - dynamic model selection integration.

Model selection is cleanly retrieved from storage and logged before use. The integration with the live.connect call is straightforward.


283-301: Good improvement: user-friendly error messages.

The enhanced error mapping translates API error codes into actionable guidance for users. This significantly improves the debugging experience.


592-593: LGTM - sample rate aligned with API requirements.

The 16kHz sample rate is consistent with the Gemini Live API requirements for PCM audio input.


1-1: No changes required. StartSensitivity and EndSensitivity are valid exports from @google/genai v1.2.0 and the imports on line 1 are correct. These enums are properly used for VAD configuration in the Live API.

src/components/views/CustomizeView.js (2)

768-781: LGTM - proper async loading with fallback defaults.

The parallel Promise.all for fetching preferences, keybinds, and available models is efficient. The fallback defaults ensure graceful degradation if storage calls fail.


1225-1251: LGTM - clean Model section implementation.

The section follows the established UI patterns and provides clear user guidance about when model changes take effect.

src/utils/renderer.js (3)

10-31: LGTM - well-documented audio configuration.

The constants are clearly documented with their purpose and the trade-offs explained. The 16kHz sample rate aligns with the Gemini Live API requirements.


123-131: LGTM - consistent storage API extension.

The new model methods follow the established pattern with proper fallback values.


299-371: Good improvement: robust Windows audio capture with fallback.

The refactored code properly handles the case when system audio isn't available and provides helpful status messages to the user. The fallback to microphone-only mode is a sensible degradation strategy.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
src/utils/gemini.js (5)

605-605: Use the SAMPLE_RATE constant in the MIME type string.

The MIME type string is hardcoded with rate=16000, but the sample rate is defined as a constant at line 531. If the sample rate changes, the MIME type must be manually updated.

🔎 Proposed fix
-                mimeType: 'audio/pcm;rate=16000',
+                mimeType: `audio/pcm;rate=${SAMPLE_RATE}`,

609-613: Consider more nuanced error handling for audio send failures.

The code silently ignores errors containing 'closed' or 'CANCELLED', which may hide legitimate issues if unexpected session closures occur. While these are often benign during normal shutdown, suppressing them entirely can complicate debugging.

🔎 Alternative approach
     } catch (error) {
-        // Only log actual errors, not routine issues
-        if (!error.message?.includes('closed') && !error.message?.includes('CANCELLED')) {
-            console.error('[Gemini] Audio send error:', error.message);
-        }
+        // Log routine closure issues at debug level
+        if (error.message?.includes('closed') || error.message?.includes('CANCELLED')) {
+            console.debug('[Gemini] Audio send on closed session:', error.message);
+        } else {
+            console.error('[Gemini] Audio send error:', error.message);
+        }
     }

284-301: Consider extracting the error message mapping to reduce duplication.

The error-to-user-message mapping logic appears in both the onerror callback (lines 284-301) and the initialization catch block (lines 378-397). Extracting this to a helper function would improve maintainability and ensure consistent error messages.

🔎 Suggested extraction
+function getErrorMessage(error) {
+    const errorStr = error?.message || String(error);
+    
+    if (errorStr.includes('not found') || errorStr.includes('NOT_FOUND')) {
+        return 'Model not available. Try updating the app or check API access.';
+    } else if (errorStr.includes('RESOURCE_EXHAUSTED') || errorStr.includes('quota')) {
+        return 'API rate limit reached. Wait a few minutes or upgrade your API plan.';
+    } else if (errorStr.includes('INVALID_ARGUMENT') || errorStr.includes('API key')) {
+        return 'Invalid API key. Please check your Gemini API key in settings.';
+    } else if (errorStr.includes('PERMISSION_DENIED')) {
+        return 'API permission denied. Ensure your API key has access to the Live API.';
+    } else if (errorStr.includes('UNAVAILABLE') || errorStr.includes('network')) {
+        return 'Connection lost. Check your internet connection.';
+    }
+    
+    return errorStr;
+}

Then use it in both locations:

 onerror: function (e) {
     console.error('Session error:', e.message, e);
-    // Parse and provide user-friendly error messages
-    let userMessage = 'Error: ' + e.message;
-    
-    if (e.message?.includes('not found') || e.message?.includes('NOT_FOUND')) {
-        userMessage = 'Model not available. Try updating the app or check API access.';
-    } else if (e.message?.includes('RESOURCE_EXHAUSTED') || e.message?.includes('quota')) {
-        userMessage = 'API rate limit reached. Wait a few minutes or upgrade your API plan.';
-    } else if (e.message?.includes('INVALID_ARGUMENT') || e.message?.includes('API key')) {
-        userMessage = 'Invalid API key. Please check your Gemini API key in settings.';
-    } else if (e.message?.includes('PERMISSION_DENIED')) {
-        userMessage = 'API permission denied. Ensure your API key has access to the Live API.';
-    } else if (e.message?.includes('UNAVAILABLE') || e.message?.includes('network')) {
-        userMessage = 'Connection lost. Check your internet connection.';
-    }
-    
+    const userMessage = 'Error: ' + getErrorMessage(e);
     sendToRenderer('update-status', userMessage);
 },

Also applies to: 378-397


322-324: Consider toning down the absolute claim in the comment.

The comment states "Never Fails," which is an overconfident claim for any system that depends on network connectivity, API availability, and user input. Consider rephrasing to something like "Optimized for Speed and Accuracy" to set more realistic expectations.

🔎 Suggested rewording
-                // ═══════════════════════════════════════════════════════════
-                // ULTIMATE INTERVIEW CONFIG - Fast, Accurate, Never Fails
-                // ═══════════════════════════════════════════════════════════
+                // ═══════════════════════════════════════════════════════════
+                // INTERVIEW CONFIG - Optimized for Speed and Accuracy
+                // ═══════════════════════════════════════════════════════════

337-350: Acknowledge that this aggressive VAD configuration is intentional, but consider whether the trade-off is right for interviews.

The "MAXIMUM SPEED CONFIG" header confirms this is by design. However, at silenceDurationMs: 100, combined with START_SENSITIVITY_HIGH and END_SENSITIVITY_HIGH, the configuration is at the aggressive end of the spectrum. Gemini's best practices recommend 500–1000ms silence duration for conversational applications where users pause to think, not 100ms.

For interviews where candidates naturally pause to formulate answers, this setup risks cutting off responses mid-thought. Verify in user testing that the speed gain justifies the risk of premature cutoffs, or make silence duration configurable based on application mode.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6be03a0 and 93f669c.

📒 Files selected for processing (1)
  • src/utils/gemini.js
🔇 Additional comments (8)
src/utils/gemini.js (8)

244-246: LGTM!

The real-time transcription feature provides good user feedback by sending live updates to the UI as speech is transcribed.


277-281: LGTM!

The interruption handling correctly clears the message buffer when the user interrupts the AI response, preventing stale content from being displayed.


231-235: LGTM!

The logging optimizations reduce console noise while preserving essential debugging information. The early return for empty text (line 252) is a good efficiency improvement.

Also applies to: 252-252, 262-262


227-228: LGTM!

The session connection logging is clear and consistent with the codebase's logging conventions.


685-698: LGTM with a note!

The IPC handler error logging improvements provide better diagnostics. The error suppression for 'closed' messages is consistent with the pattern used elsewhere (though see the earlier comment about more nuanced handling).

Also applies to: 701-714, 739-754


726-726: LGTM!

The enhanced logging for image buffer validation provides useful diagnostic information.


1-1: No action needed. The @google/genai v1.2.0 library fully supports StartSensitivity and EndSensitivity enums. The imports are correct and available for the VAD configuration.


530-531: The sample rate of 16000 Hz is correct for Gemini Live API audio input.

The Gemini Live API input audio specification requires "raw little-endian 16-bit PCM, nominally 16 kHz," and the code implements this correctly with the SAMPLE_RATE set to 16000. The comment accurately reflects the API requirements.

Comment on lines +218 to +221
// Get selected model from preferences (default to latest if not set)
const selectedModel = getSelectedModel();
console.log('Using Gemini model:', selectedModel);



🛠️ Refactor suggestion | 🟠 Major

Add validation for the selected model.

The code calls getSelectedModel() without validating the return value. If the function returns null, undefined, or an invalid model identifier, the session initialization will fail with a potentially unclear error message.

🔎 Proposed validation
 // Get selected model from preferences (default to latest if not set)
 const selectedModel = getSelectedModel();
+if (!selectedModel || typeof selectedModel !== 'string') {
+    const errorMsg = 'Invalid model selection. Please check your settings.';
+    console.error(errorMsg);
+    sendToRenderer('update-status', errorMsg);
+    isInitializingSession = false;
+    if (!isReconnect) {
+        sendToRenderer('session-initializing', false);
+    }
+    return null;
+}
 console.log('Using Gemini model:', selectedModel);
🤖 Prompt for AI Agents
In src/utils/gemini.js around lines 218 to 221, getSelectedModel() is used
without validation which can return null/undefined or an unsupported id and
break session initialization; validate the result and fallback to a known
default: check if selectedModel is truthy and matches an allowedModels list (or
a regex/enum used elsewhere), if not set selectedModel = DEFAULT_LATEST_MODEL,
log a warning indicating fallback, and use that validated value for session
initialization (update the console.log to print the final validated model).

@Harshalzarikar Harshalzarikar changed the title Fixed the issue Restore Gemini Live API integration after Google model deprecation Dec 24, 2025
@Harshalzarikar Harshalzarikar changed the title Restore Gemini Live API integration after Google model deprecation fix: Restore Gemini Live API integration after Google model deprecation Dec 24, 2025
Removed Windows output files from .gitignore
@sathwik13198

Hey, I have tested your branch and it is awesome. Here are the probable fixes needed:

  1. When I send typed messages, the server receives them and replies, and the API also provides a response, but the UI component does not display the messages.
  2. When I ask for code in Java or some other language, I get the response from the server, but it is not displayed in the UI box.

@factscosmos07-cmyk


It is only giving 40 answers; how can I increase that to unlimited?
