Multimodal Features
This page explains how to use multimodal features with the Google Generative AI C# SDK. Multimodal input lets you combine several data types, such as text, images, video, and audio, in a single request so that the model can reason over all of them together. The SDK provides helper methods for attaching these inputs to a request.
The GenerativeModel class provides methods for multimodal input that let you include local files (images, videos, audio) directly in your requests.
This example demonstrates how to identify objects within an image.
// Create the client and the model
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
// Build a request that combines an image with a text prompt
var request = new GenerateContentRequest();
request.AddInlineFile("image.png"); // Attach the local image inline
request.AddText("Identify the objects in this image."); // Add the text prompt
// Send the request
var result = await model.GenerateContentAsync(request);
// Use the result
var text = result.Text();
Console.WriteLine(text);
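For intuition about what "inline" means: in the Gemini REST API, an inline file travels inside the JSON request body as an inlineData part that holds the file's MIME type and its bytes encoded as base64. The sketch below builds a part of that shape with System.Text.Json. It is an illustration only, not this SDK's implementation, and the helper BuildInlinePart is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of the SDK): builds a JSON part shaped like
// the Gemini REST API's inlineData part from raw file bytes.
static string BuildInlinePart(byte[] fileBytes, string mimeType)
{
    var part = new Dictionary<string, object>
    {
        ["inlineData"] = new Dictionary<string, string>
        {
            ["mimeType"] = mimeType,
            // Binary content is carried as base64 text inside the JSON body.
            ["data"] = Convert.ToBase64String(fileBytes),
        },
    };
    return JsonSerializer.Serialize(part);
}

// In a real program the bytes would come from File.ReadAllBytes("image.png").
byte[] sample = { 0x01, 0x02, 0x03 };
string json = BuildInlinePart(sample, "image/png");
Console.WriteLine(json);
```

Base64 grows the payload by about a third, which is one reason inline data suits small files best.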
This example passes the prompt and the image file path directly to a simplified GenerateContentAsync overload, without building a request object.
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Identify the objects in this image.";
string imageFile = "image.png";
var result = await model.GenerateContentAsync(prompt, imageFile);
// Use the result
var text = result.Text();
Console.WriteLine(text);
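Because the simplified overload takes a plain path string, a missing or misplaced file only surfaces once the call runs. A minimal pre-flight check, assuming you want to fail early with a clear message, could resolve the path and verify that the file exists first. ResolveMediaPath is a hypothetical helper, not an SDK method.

```csharp
using System;
using System.IO;

// Hypothetical pre-flight check (not part of the SDK): resolve a relative path
// against the current working directory, as .NET file APIs do, and fail early
// with a clear message if the file is missing.
static string ResolveMediaPath(string path)
{
    string fullPath = Path.GetFullPath(path);
    if (!File.Exists(fullPath))
    {
        throw new FileNotFoundException($"Media file not found: {fullPath}", fullPath);
    }
    return fullPath;
}

// Create a small temporary file so the example runs anywhere.
string tempFile = Path.Combine(Path.GetTempPath(), "multimodal-demo.png");
File.WriteAllBytes(tempFile, new byte[] { 0x89, 0x50, 0x4E, 0x47 });

string resolved = ResolveMediaPath(tempFile);
Console.WriteLine($"Using {resolved}");

// A path that does not exist is reported before any request is sent.
bool missingDetected = false;
try
{
    ResolveMediaPath("does-not-exist/image.png");
}
catch (FileNotFoundException ex)
{
    missingDetected = true;
    Console.WriteLine(ex.Message);
}

File.Delete(tempFile);
```

With the SDK you would presumably pass the checked path on, for example model.GenerateContentAsync(prompt, ResolveMediaPath(imageFile)).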
This example demonstrates how to process a video file.
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Describe this video.";
string videoFile = "TestData/testvideo.mp4";
var result = await model.GenerateContentAsync(prompt, videoFile);
// Use the result
var text = result.Text();
Console.WriteLine(text);
This example shows how to process an audio file.
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string prompt = "Describe this audio.";
string audioFile = "TestData/testaudio.mp3";
var result = await model.GenerateContentAsync(prompt, audioFile);
// Use the result
var text = result.Text();
Console.WriteLine(text);
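You can also stream the response to multimodal input. StreamContentAsync returns the response incrementally as a sequence of chunks, which you consume with await foreach.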
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string imageFile = "image.png";
string prompt = "Identify the objects in this image.";
await foreach (var response in model.StreamContentAsync(prompt, imageFile))
{
Console.WriteLine($"Chunk: {response.Text()}");
}
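Each chunk is only part of the answer. A minimal sketch of printing chunks as they arrive while also assembling the complete text with a StringBuilder follows. FakeStreamAsync is a hypothetical stand-in that yields made-up strings; with the SDK you would append response.Text() for each streamed response instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical stand-in for model.StreamContentAsync(prompt, imageFile):
// yields made-up text chunks with a short delay, like a streamed response.
static async IAsyncEnumerable<string> FakeStreamAsync()
{
    string[] chunks = { "The image shows ", "a bowl of ", "blueberries." };
    foreach (var chunk in chunks)
    {
        await Task.Delay(10);
        yield return chunk;
    }
}

// Print each chunk as it arrives and keep a running copy of the full text.
var fullText = new StringBuilder();
await foreach (var chunk in FakeStreamAsync())
{
    Console.Write(chunk);
    fullText.Append(chunk);
}
Console.WriteLine();

string completeResponse = fullText.ToString();
Console.WriteLine($"Complete response: {completeResponse}");
```

A StringBuilder avoids allocating a new string for every chunk when a long response arrives in many small pieces.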
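This example combines chat, multimodal input, and streaming: the first turn streams a response about the image, and the follow-up question relies on the chat session's history to refer back to that image.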
var googleAI = new GoogleAI(apiKey); // Replace with your actual API key
var model = googleAI.CreateGeminiModel(modelName); // Replace with your desired model name
string imageFile = "image.png";
string prompt = "Identify the objects in this image.";
var chatSession = model.StartChat();
await foreach (var response in chatSession.StreamContentAsync(prompt, imageFile))
{
Console.WriteLine($"Chunk: {response.Text()}");
}
var response2 = await chatSession.GenerateContentAsync("Can you estimate the number of blueberries?");
Console.WriteLine(response2.Text());
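The follow-up question can mention "blueberries" without attaching the image again because the Gemini generateContent API is stateless: a chat session keeps the earlier turns and presumably sends them along with each new message. The sketch below models that idea with a plain list of (role, text) turns using the Gemini roles "user" and "model". SendMessage and the placeholder replies are hypothetical; this is not the SDK's ChatSession code.

```csharp
using System;
using System.Collections.Generic;

// Conversation history: each turn records who spoke and what was said.
var history = new List<(string Role, string Text)>();

// Hypothetical stand-in for one chat turn: add the user message, capture the
// full context that would be sent for this turn, then record the reply.
List<(string Role, string Text)> SendMessage(string userText, string placeholderReply)
{
    history.Add(("user", userText));
    // The request for this turn contains every earlier turn plus the new one.
    var requestContents = new List<(string Role, string Text)>(history);
    history.Add(("model", placeholderReply));
    return requestContents;
}

SendMessage("[image.png] Identify the objects in this image.", "<first model reply>");
var secondRequest = SendMessage("Can you estimate the number of blueberries?", "<second model reply>");

// The second request still carries the image turn from the first message.
foreach (var (role, text) in secondRequest)
{
    Console.WriteLine($"{role}: {text}");
}
```

Because the whole history is resent, long multimodal chats grow in request size with every turn.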
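- File Paths: Make sure each file path is correct and readable by the application. Relative paths such as TestData/testvideo.mp4 are normally resolved against the process's current working directory, which is not always the project folder.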
- Supported File Types: The model supports various file types for images, videos, and audio. Refer to the official documentation for a complete list of supported MIME types.
- Model Capabilities: Multimodal capabilities depend on the specific model being used. Confirm that your chosen model supports the desired multimodal features.
- Streaming: A streamed response arrives as a sequence of partial chunks. If you need the complete output, concatenate the text of each chunk as it arrives.
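Every attached file is described by a MIME type. As a sketch, a lookup by file extension could look like the following. The table holds only the standard (IANA-registered) types for the formats used on this page plus JPEG, the helper GetMimeType is hypothetical, and how this SDK itself detects MIME types is not covered here; check the official Gemini documentation for the formats the model accepts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical lookup table: a small subset of standard MIME types.
var mimeTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    [".png"] = "image/png",
    [".jpg"] = "image/jpeg",
    [".jpeg"] = "image/jpeg",
    [".mp4"] = "video/mp4",
    [".mp3"] = "audio/mpeg",
};

// Returns the MIME type for a file path, or throws if the extension is unknown.
string GetMimeType(string path)
{
    string extension = Path.GetExtension(path);
    if (mimeTypes.TryGetValue(extension, out var mimeType))
    {
        return mimeType;
    }
    throw new NotSupportedException($"No MIME type mapped for '{extension}' ({path}).");
}

foreach (var mediaFile in new[] { "image.png", "TestData/testvideo.mp4", "TestData/testaudio.mp3" })
{
    Console.WriteLine($"{mediaFile} -> {GetMimeType(mediaFile)}");
}
```

Throwing on an unknown extension surfaces an unsupported file before a request is sent, instead of guessing a type.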