
Multimodal Features

Gunpal Jain edited this page Feb 16, 2025 · 3 revisions

Introduction

This page explains how to use multimodal features with the Google Generative AI C# SDK. Multimodal capabilities allow you to combine different data types, such as text and images, to generate more comprehensive and insightful responses from the model. This SDK provides easy-to-use methods to leverage these features.

Details

The GenerativeModel class provides methods to handle multimodal input. These methods allow you to include local files (images, videos, audio) directly in your requests.


Identifying Objects in an Image

This example builds a GenerateContentRequest that holds an inline image and a text prompt, then asks the model to identify the objects in the image.

// Arrange
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
var request = new GenerateContentRequest();
request.AddInlineFile("image.png"); // Add the image file
request.AddText("Identify the objects in the image."); // Add the text prompt

// Act
var result = await model.GenerateContentAsync(request);

// Use the result
var text = result.Text();
Console.WriteLine(text);

Identifying an Image with File Path

This example passes the prompt and the image file path directly to GenerateContentAsync, so there is no need to build a GenerateContentRequest by hand.

var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Identify the objects in the image.";
string imageFile = "image.png";

var result = await model.GenerateContentAsync(prompt, imageFile);

// Use the result
var text = result.Text();
Console.WriteLine(text);

Processing Video with File Path

This example demonstrates how to process a video file.

var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Describe this video.";
string videoFile = "TestData/testvideo.mp4";

var result = await model.GenerateContentAsync(prompt, videoFile);

// Use the result
var text = result.Text();
Console.WriteLine(text);

Processing Audio with File Path

This example shows how to process an audio file.

// Arrange
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Describe this audio.";
string audioFile = "TestData/testaudio.mp3";

// Act
var result = await model.GenerateContentAsync(prompt, audioFile);

// Use the result
var text = result.Text();
Console.WriteLine(text);

Streaming with Multimodal Input

Streaming also works with multimodal input: StreamContentAsync takes the same prompt and file path and yields the response in chunks.

var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
var imageFile = "image.png";
string prompt = "Identify the objects in the image.";

await foreach (var response in model.StreamContentAsync(prompt, imageFile))
{
    Console.WriteLine($"Chunk: {response.Text()}");
}
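
If the complete response is needed rather than the individual chunks, the pieces can be appended as they arrive. The following is a minimal sketch and not part of the SDK: CollectAsync is a hypothetical helper, and FakeChunks only stands in for the chunk texts that response.Text() yields in the loop above. It is written as a C# 9+ top-level program and uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK method): appends each chunk in arrival
// order and returns the joined text once the stream has ended.
static async Task<string> CollectAsync(IAsyncEnumerable<string> chunks)
{
    var builder = new StringBuilder();
    await foreach (var chunk in chunks)
    {
        builder.Append(chunk);
    }
    return builder.ToString();
}

// Stand-in for streamed chunk texts (an assumption for illustration only).
// In real use, the texts would come from response.Text() on each chunk.
static async IAsyncEnumerable<string> FakeChunks()
{
    yield return "A bowl ";
    await Task.Yield();
    yield return "of blueberries.";
}

string fullText = await CollectAsync(FakeChunks());
Console.WriteLine(fullText); // prints "A bowl of blueberries."
```

A StringBuilder avoids repeated string concatenation when a response arrives in many small chunks.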

Chat with Multimodal and Streaming

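This example combines chat, multimodal input, and streaming. The first turn streams the model's answer about the image; the follow-up question is then sent in the same chat session, so it can build on the earlier turn.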

var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
var imageFile = "image.png";
string prompt = "Identify the objects in the image.";
var chatSession = model.StartChat();

await foreach (var response in chatSession.StreamContentAsync(prompt, imageFile))
{
    Console.WriteLine($"Chunk: {response.Text()}");
}

var response2 = await chatSession.GenerateContentAsync("Can you estimate the number of blueberries?");
Console.WriteLine(response2.Text());

Important Considerations

  • File Paths: Ensure that the file paths provided are correct and accessible by the application.
  • Supported File Types: The model supports various file types for images, videos, and audio. Refer to the official documentation for a complete list of supported MIME types.
  • Model Capabilities: Multimodal capabilities depend on the specific model being used. Confirm that your chosen model supports the desired multimodal features.
  • Streaming: When streaming, remember that responses can come in chunks. You need to handle these partial responses and combine them if you need the complete output.
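
The file path and file type points above can be checked before a request is sent. The sketch below is one possible pre-flight check, not SDK functionality: TryCheckMediaFile is a hypothetical helper, and the extension map is a small illustrative sample of standard IANA media types, not the list of types the model actually accepts. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative extension-to-media-type map (standard IANA types).
// This is NOT the SDK's or the API's authoritative list of supported types.
var knownMediaTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    [".png"] = "image/png",
    [".jpg"] = "image/jpeg",
    [".jpeg"] = "image/jpeg",
    [".mp4"] = "video/mp4",
    [".mp3"] = "audio/mpeg",
};

// Hypothetical helper: returns true and the media type when the file exists
// and has a known extension; otherwise returns false and a problem message.
bool TryCheckMediaFile(string path, out string detail)
{
    if (!File.Exists(path))
    {
        detail = $"File not found: {Path.GetFullPath(path)}";
        return false;
    }

    var extension = Path.GetExtension(path);
    if (knownMediaTypes.TryGetValue(extension, out var mediaType))
    {
        detail = mediaType;
        return true;
    }

    detail = $"Unrecognized extension '{extension}'; check the supported MIME types.";
    return false;
}

// Usage: check the file before passing its path to the model.
if (TryCheckMediaFile("image.png", out var detail))
    Console.WriteLine($"OK, media type: {detail}");
else
    Console.WriteLine($"Problem: {detail}");
```

Printing the full path in the error message helps when a relative path resolves against an unexpected working directory.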

API Reference