Add audio transcription settings to MultiModalLiveClient #50

yarns123 · 2025-06-03T08:56:18Z

Updated MultiModalLiveClient to include properties for input and output audio transcription management. Modified the constructor to accept new parameters for these settings and adjusted setup configuration accordingly. Enhanced BidiGenerateContentSetup with new properties for audio transcription configuration.

Summary by CodeRabbit

New Features
- Added options to enable or disable input and output audio transcription during client setup.

Documentation
- Improved descriptions for audio transcription configuration options in setup documentation.

coderabbitai · 2025-06-03T08:56:26Z

Walkthrough

This change adds two boolean properties to the MultiModalLiveClient class, allowing optional configuration of input and output audio transcription. The constructor and setup method are updated to support these options. Additionally, XML documentation comments are added to related properties in the BidiGenerateContentSetup class.

Changes

| File(s) | Change Summary |
|---|---|
| src/GenerativeAI.Live/Models/MultiModalLiveClient.cs | Added `InputAudioTranscriptionEnabled` and `OutputAudioTranscriptionEnabled` properties and constructor parameters; updated setup payload to include audio transcription configuration. |
| src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs | Added XML documentation comments to `InputAudioTranscription` and `OutputAudioTranscription` properties. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MultiModalLiveClient
    participant BidiGenerateContentSetup

    Client->>MultiModalLiveClient: Instantiate (optionally enabling audio transcription)
    Client->>MultiModalLiveClient: Call SendSetupAsync()
    MultiModalLiveClient->>BidiGenerateContentSetup: Create setup payload
    alt Input/Output Audio Transcription Enabled
        MultiModalLiveClient->>BidiGenerateContentSetup: Set Input/OutputAudioTranscription configs
    else Not Enabled
        MultiModalLiveClient->>BidiGenerateContentSetup: Set Input/OutputAudioTranscription to null
    end
    MultiModalLiveClient-->>Client: Setup complete

Possibly related PRs

Add logging for invalid WebSocket payloads and new properties #49: Introduced properties and classes for input and output audio transcription in BidiGenerateContentSetup, directly related to this PR's configuration logic.

Poem

A toggle for transcribing sound,
Now in the client, easily found!
With booleans set, the streams align,
Audio to text, by design.
Setup grows smarter, docs more clear—
A bunny hops and gives a cheer! 🐇

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1)
122-122: Consider constructor design pattern.

The constructor parameter list is becoming quite lengthy with the addition of these new boolean parameters. Consider using a configuration object or builder pattern for better maintainability and extensibility.

While the current approach works, future parameter additions could benefit from a configuration class approach:
public class MultiModalLiveClientOptions
{
    public bool InputAudioTranscriptionEnabled { get; set; } = false;
    public bool OutputAudioTranscriptionEnabled { get; set; } = false;
    // Future options can be added here
}
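
As a minimal sketch of how such an options class could be consumed, assuming a hypothetical stand-in client named `LiveClientSketch` (not the real `MultiModalLiveClient`, whose constructor takes other parameters not shown here):

```csharp
#nullable enable

// Options class as suggested in the review comment above.
public class MultiModalLiveClientOptions
{
    public bool InputAudioTranscriptionEnabled { get; set; } = false;
    public bool OutputAudioTranscriptionEnabled { get; set; } = false;
}

// Hypothetical stand-in for the client, for illustration only.
public class LiveClientSketch
{
    public bool InputAudioTranscriptionEnabled { get; set; }
    public bool OutputAudioTranscriptionEnabled { get; set; }

    // A single optional options object replaces a growing list of
    // boolean constructor parameters.
    public LiveClientSketch(MultiModalLiveClientOptions? options = null)
    {
        // Fall back to defaults (both flags false) when no options are given.
        options ??= new MultiModalLiveClientOptions();
        InputAudioTranscriptionEnabled = options.InputAudioTranscriptionEnabled;
        OutputAudioTranscriptionEnabled = options.OutputAudioTranscriptionEnabled;
    }
}
```

A caller would then write `new LiveClientSketch(new MultiModalLiveClientOptions { InputAudioTranscriptionEnabled = true })`, and new settings could be added to the options class without changing the constructor signature.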

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22483b2 and b73b402.

📒 Files selected for processing (2)

src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (4 hunks)
src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (1)

src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (1)

AudioTranscriptionConfig (149-152)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build (9.0.x)

🔇 Additional comments (9)

src/GenerativeAI/Types/MultimodalLive/BidiGenerateContentSetup.cs (4)

47-51: LGTM! Clear and concise documentation.

The XML documentation comments clearly describe the purpose of these audio transcription configuration properties.

53-57: LGTM! Clear and concise documentation.

The XML documentation comments clearly describe the purpose of these audio transcription configuration properties.

149-152:
✅ Verification successful

Verify if the empty AudioTranscriptionConfig class is intentional.

The AudioTranscriptionConfig class is currently empty. Please confirm whether this is intentional (e.g., a marker class or future placeholder) or if properties need to be added.

🏁 Script executed:
#!/bin/bash
# Search for any usage or references to AudioTranscriptionConfig properties
# to understand if this class should have specific configuration properties

rg -A 5 -B 5 "AudioTranscriptionConfig" --type cs

Length of output: 6257

AudioTranscriptionConfig is intentionally empty

The AudioTranscriptionConfig class is only used as a placeholder/marker to toggle input/output audio transcription with default settings. No properties are referenced or required by the client, samples, or JSON serializer today, so leaving it empty is intentional.

If in the future you need to expose specific transcription settings (language, punctuation, word-level timestamps, etc.), you can add them then.

47-55: LGTM! Good documentation additions.

The XML documentation comments clearly describe the purpose of the audio transcription properties, improving code maintainability and developer experience.

src/GenerativeAI.Live/Models/MultiModalLiveClient.cs (5)

122-122: Constructor parameter integration looks good.

The constructor parameters follow existing naming conventions and have appropriate default values.

131-132: Property initialization is correct.

The properties are properly initialized from the constructor parameters.

560-561: Conditional transcription configuration logic is well implemented.

The conditional logic correctly creates AudioTranscriptionConfig instances when transcription is enabled and sets them to null when disabled, providing clean optional configuration.
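
The conditional mapping described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: `SetupSketch` and `TranscriptionSetup.Build` are hypothetical names, and `AudioTranscriptionConfig` is redeclared locally as an empty stand-in so the snippet compiles on its own.

```csharp
#nullable enable

// Local stand-in for the (currently empty) marker class.
public class AudioTranscriptionConfig { }

// Hypothetical, reduced stand-in for BidiGenerateContentSetup.
public class SetupSketch
{
    public AudioTranscriptionConfig? InputAudioTranscription { get; set; }
    public AudioTranscriptionConfig? OutputAudioTranscription { get; set; }
}

public static class TranscriptionSetup
{
    // Maps each boolean flag to a nullable config object:
    // enabled  -> a new empty marker config,
    // disabled -> null.
    public static SetupSketch Build(bool inputEnabled, bool outputEnabled)
    {
        return new SetupSketch
        {
            InputAudioTranscription = inputEnabled ? new AudioTranscriptionConfig() : null,
            OutputAudioTranscription = outputEnabled ? new AudioTranscriptionConfig() : null,
        };
    }
}
```

For example, `TranscriptionSetup.Build(true, false)` yields a setup whose input config is a non-null marker object and whose output config is null.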

131-132: LGTM! Property assignments are correct.

The constructor parameter assignments are properly implemented and follow the established pattern in the class.

560-561:
✅ Verification successful

Verify AudioTranscriptionConfig implementation.

The code conditionally creates new AudioTranscriptionConfig() instances, but according to the relevant code snippets, this class appears to be empty. This might indicate an incomplete implementation or the class may need additional configuration properties.

Please verify whether the AudioTranscriptionConfig class is intentionally empty or if it requires additional properties for proper audio transcription configuration.

🏁 Script executed:
#!/bin/bash
# Description: Check the complete implementation of AudioTranscriptionConfig class
# Expected: Find the class definition and any properties or methods it should contain

# Search for AudioTranscriptionConfig class definition and usage
ast-grep --pattern 'class AudioTranscriptionConfig {
  $$$
}'

# Also search for any references to audio transcription configuration in documentation or comments
rg -i "audio.*transcription.*config" -A 3 -B 3

Length of output: 4620

AudioTranscriptionConfig is intentionally empty.

The AudioTranscriptionConfig class currently serves as a marker/flag to enable audio transcription and does not define any configurable properties. No additional settings are required at this time.

coderabbitai · 2025-06-03T09:00:01Z

src/GenerativeAI.Live/Models/MultiModalLiveClient.cs

+    public bool InputAudioTranscriptionEnabled { get; set; } = false;
+
+    public bool OutputAudioTranscriptionEnabled { get; set; } = false;
+


🛠️ Refactor suggestion

Add XML documentation comments for consistency.

The new properties lack XML documentation comments, which breaks consistency with other properties in the class. Please add documentation similar to other boolean properties in the class.

+    /// <summary>
+    /// Gets or sets a value indicating whether input audio transcription is enabled for the session.
+    /// </summary>
     public bool InputAudioTranscriptionEnabled { get; set; } = false;
+    /// <summary>
+    /// Gets or sets a value indicating whether output audio transcription is enabled for the session.
+    /// </summary>
     public bool OutputAudioTranscriptionEnabled { get; set; } = false;

🤖 Prompt for AI Agents

In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs around lines 108 to 111, the new boolean properties InputAudioTranscriptionEnabled and OutputAudioTranscriptionEnabled lack XML documentation comments. Add XML documentation comments above each property, following the style and format used for other boolean properties in the class, to maintain consistency and clarity.

🛠️ Refactor suggestion

Add XML documentation for consistency.

The new properties lack XML documentation comments, which is inconsistent with the established pattern in this class. All other public properties have comprehensive documentation.

Add XML documentation for these properties:

+    /// <summary>
+    /// Gets or sets a value indicating whether input audio transcription is enabled for the session.
+    /// </summary>
     public bool InputAudioTranscriptionEnabled { get; set; } = false;
+    /// <summary>
+    /// Gets or sets a value indicating whether output audio transcription is enabled for the session.
+    /// </summary>
     public bool OutputAudioTranscriptionEnabled { get; set; } = false;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    public bool InputAudioTranscriptionEnabled { get; set; } = false;
-    public bool OutputAudioTranscriptionEnabled { get; set; } = false;
+    /// <summary>
+    /// Gets or sets a value indicating whether input audio transcription is enabled for the session.
+    /// </summary>
+    public bool InputAudioTranscriptionEnabled { get; set; } = false;
+    /// <summary>
+    /// Gets or sets a value indicating whether output audio transcription is enabled for the session.
+    /// </summary>
+    public bool OutputAudioTranscriptionEnabled { get; set; } = false;

🤖 Prompt for AI Agents

In src/GenerativeAI.Live/Models/MultiModalLiveClient.cs around lines 108 to 111, the new public properties InputAudioTranscriptionEnabled and OutputAudioTranscriptionEnabled lack XML documentation comments. Add XML documentation comments above each property describing their purpose and usage, following the style and detail level of the existing documented properties in the class to maintain consistency.

gunpal5 · 2025-06-03T12:29:45Z

Thanks for the PR.

I have two concerns:

1. I am not sure this approach is future-proof, as Google will probably add configuration options to the audio transcription config.
2. BidiGenerateContentSetup has additional configuration options. We would need to simplify the process of passing those configurations as well.
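
One possible direction for both concerns, sketched under assumptions only: instead of boolean flags, the client could accept a callback that edits the setup object before it is sent. All names below (`SetupSketch`, `LiveClientSketch`, `BuildSetup`) are hypothetical stand-ins and not part of the library; `AudioTranscriptionConfig` is redeclared locally so the snippet is self-contained.

```csharp
#nullable enable
using System;

// Local stand-in for the (currently empty) marker class.
public class AudioTranscriptionConfig { }

// Hypothetical, reduced stand-in for BidiGenerateContentSetup.
public class SetupSketch
{
    public AudioTranscriptionConfig? InputAudioTranscription { get; set; }
    public AudioTranscriptionConfig? OutputAudioTranscription { get; set; }
}

// Hypothetical client sketch: the caller supplies a callback that can set
// any setup property, so new options need no new constructor parameters.
public class LiveClientSketch
{
    private readonly Action<SetupSketch>? _configureSetup;

    public LiveClientSketch(Action<SetupSketch>? configureSetup = null)
    {
        _configureSetup = configureSetup;
    }

    // Builds the setup payload and lets the caller adjust it before sending.
    public SetupSketch BuildSetup()
    {
        var setup = new SetupSketch();
        _configureSetup?.Invoke(setup);
        return setup;
    }
}
```

With this shape, `new LiveClientSketch(s => s.InputAudioTranscription = new AudioTranscriptionConfig())` enables input transcription, and any properties later added to the config or setup classes could be set in the same callback.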

coderabbitai bot reviewed Jun 3, 2025

gunpal5 merged commit 6204e2a into gunpal5:main Jun 3, 2025
2 checks passed

dependabot bot mentioned this pull request Jul 28, 2025

Bump the all group with 1 update tryAGI/CSharpToJsonSchema#51

Merged

This was referenced Dec 3, 2025

Bump the all group with 24 updates tryAGI/LangChain.Providers#226

Closed

Bump Google_GenerativeAI from 2.5.10 to 3.4.1 tryAGI/LangChain.Providers#230

Open
