Welcome! Content Understanding is a solution that analyzes and comprehends various media content—including documents, images, audio, and video—and transforms it into structured, organized, and searchable data.
Content Understanding is now a Generally Available (GA) service with the release of the 2025-11-01 API version.
- The samples in this repository default to the latest GA API version: `2025-11-01`.
- We will provide more samples for new functionality in the GA API versions soon. For details on the updates in the current GA release, see the Content Understanding What's New page.
- As of November 2025, the `2025-11-01` API version is available in a broader range of regions.
- To access sample code for version `2025-05-01-preview`, check out the corresponding Git tag `2025-05-01-preview` or download it directly from the release page.
👉 If you are looking for Python samples, check out this repo.
You can run this sample in GitHub Codespaces or on your local machine.
Run this repo virtually in a cloud-based development environment.
- Click the button above to create a new Codespace.
- Select the `main` branch, your preferred region, and a 2-core machine.
- When the Codespace is ready, VS Code will automatically build the Dev Container.
- Follow the instructions in Configure Azure AI service resource.
- Use the integrated terminal to run the project:
  ```bash
  cd {ProjectName}
  dotnet build
  cd bin/Debug/net8.0/
  dotnet {ProjectName}.dll
  ```
- For an overview of the available projects in {ProjectName}, refer to the Features section.
⚠️ IMPORTANT: If you plan to use prebuilt analyzers (like in ContentExtraction), you must first run the ModelDeploymentSetup sample. See Step 4: Configure Model Deployments for details.
To run the project locally, select one of the setup options below.
For the smoothest experience, we recommend Option 2.1, which provides a hassle-free environment setup.
- Docker
  - Install Docker Desktop (available for Windows, macOS, and Linux). Docker is used to manage and run the container environment.
  - Start Docker and ensure it is running in the background.
- Visual Studio Code
  - Download and install Visual Studio Code.
- Dev Containers Extension
  - In the VS Code extension marketplace, install the extension named Dev Containers. (This extension was previously called "Remote - Containers" and has since been renamed Dev Containers.)
- Clone the repo:
  ```bash
  git clone https://github.com/Azure-Samples/azure-ai-content-understanding-dotnet.git
  cd azure-ai-content-understanding-dotnet
  ```
- Launch VS Code and open the folder:
  ```bash
  code .
  ```
- Press `F1`, then select Dev Containers: Reopen in Container.
- Wait for the setup to complete. Follow the instructions in Configure Azure AI service resource, then use the integrated terminal in Visual Studio Code to run:
  ```bash
  cd {ProjectName}
  dotnet build
  cd bin/Debug/net8.0/
  dotnet {ProjectName}.dll
  ```
- For an overview of the available projects in {ProjectName}, refer to the Features section.
⚠️ IMPORTANT: If you plan to use prebuilt analyzers (like in ContentExtraction), you must first run the ModelDeploymentSetup sample. See Step 4: Configure Model Deployments for details.
- Clone the repo using Git or from Visual Studio:
  ```bash
  git clone https://github.com/Azure-Samples/azure-ai-content-understanding-dotnet
  ```
- Open the `.sln` solution file in Visual Studio 2022+.
- Ensure the target framework is set to .NET 8.
- Follow the instructions in Configure Azure AI service resource.
- For an overview of the available projects, refer to the Features section.
  ⚠️ IMPORTANT: If you plan to use prebuilt analyzers (like in ContentExtraction), you must first run the ModelDeploymentSetup sample. See Step 4: Configure Model Deployments for details.
- Press F5 or click Start to run the console app.
First, create an Azure AI Foundry resource that will host both the Content Understanding service and the required model deployments.
- Follow the steps in the Azure Content Understanding documentation to create an Azure AI Foundry resource
- Get your Foundry resource's endpoint URL from Azure Portal:
- Go to Azure Portal
- Navigate to your Azure AI Foundry resource
- Go to Resource Management > Keys and Endpoint
- Copy the Endpoint URL (typically `https://<your-resource-name>.services.ai.azure.com/`)
After creating your Azure AI Foundry resource, you must grant yourself the Cognitive Services User role to enable API calls for setting default GPT deployments:
- Go to Azure Portal
- Navigate to your Azure AI Foundry resource
- Go to Access Control (IAM) in the left menu
- Click Add > Add role assignment
- Select the Cognitive Services User role
- Assign it to yourself (or the user/service principal that will run the samples)
Note: This role assignment is required even if you are the owner of the resource. Without this role, you will not be able to call the Content Understanding API to configure model deployments for prebuilt analyzers.
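If you want to sanity-check the role assignment from code before running the samples, a token request with `DefaultAzureCredential` should succeed once the role has propagated. This is a minimal sketch (not part of the samples) using the Azure.Identity package; the scope shown is the standard one for Azure AI services, and it assumes you are already signed in via `az login` or `azd auth login`:

```csharp
using Azure.Core;
using Azure.Identity;

// Minimal sanity check: acquire a token for Azure AI services with your own identity.
// If this throws, revisit the role assignment and your az/azd login.
var credential = new DefaultAzureCredential();
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "https://cognitiveservices.azure.com/.default" }));
Console.WriteLine($"Token acquired; expires {token.ExpiresOn:u}");
```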
The prebuilt analyzers depend on specific model deployments:
- `prebuilt-documentSearch`, `prebuilt-audioSearch`, and `prebuilt-videoSearch` require GPT-4.1-mini and text-embedding-3-large.
- Other prebuilt analyzers like `prebuilt-invoice` and `prebuilt-receipt` require GPT-4.1 and text-embedding-3-large.
- Deploy GPT-4.1:
  - In Azure AI Foundry, go to Deployments > Deploy model > Deploy base model
  - Search for and select gpt-4.1
  - Complete the deployment with your preferred settings
  - Note the deployment name (by convention, use `gpt-4.1`)
- Deploy GPT-4.1-mini:
  - In Azure AI Foundry, go to Deployments > Deploy model > Deploy base model
  - Search for and select gpt-4.1-mini
  - Complete the deployment with your preferred settings
  - Note the deployment name (by convention, use `gpt-4.1-mini`)
- Deploy text-embedding-3-large:
  - In Azure AI Foundry, go to Deployments > Deploy model > Deploy base model
  - Search for and select text-embedding-3-large
  - Complete the deployment with your preferred settings
  - Note the deployment name (by convention, use `text-embedding-3-large`)
For more information on deploying models, see Deploy models in Azure AI Foundry.
Choose one of the following options to configure your application:
Recommended: This approach uses Azure Active Directory (AAD) token authentication, which is more secure and strongly recommended for production environments. You do not need to set `AZURE_AI_API_KEY` in your `appsettings.json` file when using this method.
- Copy the sample appsettings file:
  ```bash
  cp ContentUnderstanding.Common/appsettings.example.json ContentUnderstanding.Common/appsettings.json
  ```
- Open `ContentUnderstanding.Common/appsettings.json` and fill in the required values. Replace `<your-resource-name>` with your actual resource name. If you used different deployment names in Step 2, update the deployment variables accordingly:
  ```json
  {
    "AZURE_AI_ENDPOINT": "https://<your-resource-name>.services.ai.azure.com",
    "AZURE_AI_API_KEY": null,
    "AZURE_AI_API_VERSION": "2025-11-01",
    "GPT_4_1_DEPLOYMENT": "gpt-4.1",
    "GPT_4_1_MINI_DEPLOYMENT": "gpt-4.1-mini",
    "TEXT_EMBEDDING_3_LARGE_DEPLOYMENT": "text-embedding-3-large",
    "TRAINING_DATA_SAS_URL": null,
    "TRAINING_DATA_PATH": null
  }
  ```
  Note: See the appsettings.json Configuration Reference section below for detailed explanations of each setting, since JSON files cannot contain comments.
- Log in to Azure:
  ```bash
  azd auth login
  ```
  If this does not work, try:
  ```bash
  azd auth login --use-device-code
  ```
  and follow the on-screen instructions.
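For reference, with `AZURE_AI_API_KEY` left as `null`, requests are authorized with an AAD bearer token rather than a key. The repository's `ContentUnderstanding.Common` project handles this wiring for you; the sketch below only illustrates the idea and is not the repo's actual code (replace the placeholder endpoint with your own):

```csharp
using System.Net.Http.Headers;
using Azure.Core;
using Azure.Identity;

// Illustrative only: token-based auth when AZURE_AI_API_KEY is null.
string endpoint = "https://<your-resource-name>.services.ai.azure.com"; // from appsettings.json

var credential = new DefaultAzureCredential(); // picks up az/azd login, env vars, or managed identity
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "https://cognitiveservices.azure.com/.default" }));

using var http = new HttpClient { BaseAddress = new Uri(endpoint) };
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token.Token);
```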
- Copy the sample appsettings file:
  ```bash
  cp ContentUnderstanding.Common/appsettings.example.json ContentUnderstanding.Common/appsettings.json
  ```
- Edit `ContentUnderstanding.Common/appsettings.json` and set your credentials:
  - Replace `<your-resource-name>` and `<your-azure-ai-api-key>` with your actual values. These can be found in your AI Services resource under Resource Management > Keys and Endpoint.
  - If you used different deployment names in Step 2, update the deployment variables accordingly:
  ```json
  {
    "AZURE_AI_ENDPOINT": "https://<your-resource-name>.services.ai.azure.com",
    "AZURE_AI_API_KEY": "<your-azure-ai-api-key>",
    "AZURE_AI_API_VERSION": "2025-11-01",
    "GPT_4_1_DEPLOYMENT": "gpt-4.1",
    "GPT_4_1_MINI_DEPLOYMENT": "gpt-4.1-mini",
    "TEXT_EMBEDDING_3_LARGE_DEPLOYMENT": "text-embedding-3-large",
    "TRAINING_DATA_SAS_URL": null,
    "TRAINING_DATA_PATH": null
  }
  ```
  Note: See the appsettings.json Configuration Reference section below for detailed explanations of each setting, since JSON files cannot contain comments.

⚠️ Note: If you skip the token authentication step above, you must set `AZURE_AI_API_KEY` in your `appsettings.json` file. Get your API key from Azure Portal by navigating to your Foundry resource > Resource Management > Keys and Endpoint.
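For comparison with the token flow above, key-based requests carry the key in the `Ocp-Apim-Subscription-Key` header, the standard header for Azure AI services. Again, the Common project does this for you; this sketch is only illustrative (replace the placeholders with your values):

```csharp
// Illustrative only: key-based auth. Keys are less secure; prefer token auth in production.
string endpoint = "https://<your-resource-name>.services.ai.azure.com"; // from appsettings.json
string apiKey = "<your-azure-ai-api-key>";                              // from appsettings.json

using var http = new HttpClient { BaseAddress = new Uri(endpoint) };
http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);
```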
- Ensure you have permission to grant roles under your subscription.
- Log in to Azure:
  ```bash
  azd auth login
  ```
  If this does not work, try:
  ```bash
  azd auth login --use-device-code
  ```
  and follow the on-screen instructions.
- Set up the environment, following the prompts to choose the location:
  ```bash
  azd up
  ```

Note: Unlike `.env` files, which support comments, JSON files cannot contain comments. All configuration explanations are provided in this section.
After copying `appsettings.example.json` to `appsettings.json`, configure the following settings:
- `AZURE_AI_ENDPOINT` (Required)
  - Your Azure AI Foundry resource endpoint URL
  - Format: `https://<your-resource-name>.services.ai.azure.com`
  - Get this from Azure Portal: Your Foundry resource > Resource Management > Keys and Endpoint
- `GPT_4_1_DEPLOYMENT` (Required for prebuilt analyzers like `prebuilt-invoice`, `prebuilt-receipt`)
  - The deployment name for the GPT-4.1 model in your Azure AI Foundry resource
  - Default: `gpt-4.1` (if you used this name during deployment)
  - Required along with `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` for certain prebuilt analyzers
- `GPT_4_1_MINI_DEPLOYMENT` (Required for prebuilt analyzers like `prebuilt-documentSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch`)
  - The deployment name for the GPT-4.1-mini model in your Azure AI Foundry resource
  - Default: `gpt-4.1-mini` (if you used this name during deployment)
  - Required along with `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` for search-related prebuilt analyzers
- `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` (Required for prebuilt analyzers)
  - The deployment name for the text-embedding-3-large model in your Azure AI Foundry resource
  - Default: `text-embedding-3-large` (if you used this name during deployment)
  - Required for all prebuilt analyzers that use embeddings
- `AZURE_AI_API_KEY` (Optional)
  - Your Azure AI Foundry API key for key-based authentication
  - WARNING: Keys are less secure and should only be used for testing/development
  - Leave as `null` to use DefaultAzureCredential (recommended for production)
  - Get this from Azure Portal: Your Foundry resource > Resource Management > Keys and Endpoint
  - If using DefaultAzureCredential, ensure you're logged in with `azd auth login` or `az login`
- `AZURE_AI_API_VERSION` (Optional)
  - The API version to use for Content Understanding
  - Default: `2025-11-01` (GA version)
  - Only change this if you need to use a different API version
- `AZURE_AI_USER_AGENT` (Optional)
  - The user agent string sent with HTTP requests to the Content Understanding API
  - Default: `azure-ai-content-understanding-dotnet-sample-ga`
  - The user agent is used for tracking sample usage and does not provide identity information
  - You can customize this value to any string you prefer
  - To opt out of tracking: set this to `null` or an empty string (`""`) to prevent the user agent header from being sent
  - Can be set in `appsettings.json` or as an environment variable
  - Example: `"AZURE_AI_USER_AGENT": "my-custom-user-agent"` or `"AZURE_AI_USER_AGENT": null` (see the sketch after this list)
- `TRAINING_DATA_SAS_URL` (Optional; only required for the `AnalyzerTraining` sample)
  - SAS URL for the Azure Blob container containing training data
  - Format: `https://<storage-account-name>.blob.core.windows.net/<container-name>?<sas-token>`
  - Only needed when running the analyzer training sample
  - Note: Currently, the `AnalyzerTraining` sample prompts for this value interactively at runtime. You can set it in `appsettings.json` for convenience, but the sample will still prompt if it is not provided via configuration.
  - For more information, see Set up training data
- `TRAINING_DATA_PATH` (Optional; only required for the `AnalyzerTraining` sample)
  - Folder path within the blob container where training data is stored
  - Example: `training_data/` or `labeling-data/`
  - Only needed when running the analyzer training sample
  - Note: Currently, the `AnalyzerTraining` sample prompts for this value interactively at runtime. You can set it in `appsettings.json` for convenience, but the sample will still prompt if it is not provided via configuration.
  - For more information, see Set up training data
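To make the reference above concrete, here is a minimal sketch of loading these settings and applying the user agent, roughly what the samples' `ContentUnderstanding.Common` project does for you. This is not its actual code; it assumes the Microsoft.Extensions.Configuration.Json and Microsoft.Extensions.Configuration.EnvironmentVariables packages:

```csharp
using Microsoft.Extensions.Configuration;

// Load appsettings.json, letting environment variables override JSON values.
IConfigurationRoot config = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: false)
    .AddEnvironmentVariables()
    .Build();

string endpoint = config["AZURE_AI_ENDPOINT"]
    ?? throw new InvalidOperationException("AZURE_AI_ENDPOINT is required.");
string apiVersion = config["AZURE_AI_API_VERSION"] ?? "2025-11-01";
string? apiKey    = config["AZURE_AI_API_KEY"];    // null => DefaultAzureCredential
string? userAgent = config["AZURE_AI_USER_AGENT"]; // null or "" => header not sent (opt out)

using var http = new HttpClient { BaseAddress = new Uri(endpoint) };
if (!string.IsNullOrEmpty(userAgent))
{
    // Sent only for sample-usage tracking; carries no identity information.
    http.DefaultRequestHeaders.UserAgent.ParseAdd(userAgent);
}
```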
Option 1: DefaultAzureCredential (Recommended)
- Set `AZURE_AI_API_KEY` to `null`
- Most common development scenario:
  - Install Azure CLI
  - Log in: `az login` or `azd auth login`
  - Run the application (no additional configuration needed)
- Also supports:
  - Environment variables (`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`)
  - Managed Identity (for Azure-hosted applications)
  - Visual Studio Code authentication
  - Azure PowerShell authentication
- For more info: DefaultAzureCredential documentation
Option 2: API Key (For Testing Only)
- Set `AZURE_AI_API_KEY` to your API key value
- Less secure; only recommended for local testing/development
- Get your API key from Azure Portal: Your Foundry resource > Resource Management > Keys and Endpoint
⚠️ IMPORTANT: Before running any samples that use prebuilt analyzers (like `ContentExtraction`), you must configure the model deployments. This is a one-time setup that maps your deployed models to the prebuilt analyzers.
- Run the ModelDeploymentSetup sample:
  ```bash
  cd ModelDeploymentSetup
  dotnet build
  dotnet run
  ```
- This sample will:
  - Read your deployment names from `appsettings.json`
  - Configure the default model mappings in your Azure AI Foundry resource
  - Verify that all required deployments are configured correctly
- After successful configuration, you can run other samples that use prebuilt analyzers.
Note: The configuration is persisted in your Azure AI Foundry resource, so you only need to run this once (or whenever you change your deployment names).
Azure AI Content Understanding is a new Generative AI-based Azure AI service designed to process and ingest content of any type—documents, images, audio, and video—into a user-defined output format. Content Understanding provides a streamlined way to analyze large volumes of unstructured data, accelerating time-to-value by generating output that can be integrated into automation and analytical workflows.
Documentation: Each sample includes a detailed `README.md` file with concepts, code examples, and implementation details. See the sample directories for comprehensive documentation.
| Project | Key Source File | Description |
|---|---|---|
| ModelDeploymentSetup | Program.cs | REQUIRED: This sample configures the default model deployments required for prebuilt analyzers. You must run this once before running any other samples that use prebuilt analyzers (like ContentExtraction). |
| ContentExtraction | ContentExtractionService.cs | This sample demonstrates how to extract semantic content from multimodal files—documents, audio, and video. The sample uses prebuilt analyzers to transform unstructured content into structured, machine-readable data optimized for retrieval-augmented generation (RAG) and automated workflows. |
| FieldExtraction | FieldExtractionService.cs | This sample demonstrates how to extract custom fields from multimodal files—documents, audio, and video. It shows both using prebuilt analyzers (recommended starting point) and creating custom analyzers when prebuilt options don't meet your needs. |
| Classifier | ClassifierService.cs | This sample demonstrates how to (1) create a classifier to categorize documents, (2) create a custom analyzer to extract specific fields, and (3) combine classifiers and analyzers to classify, optionally split, and analyze documents in a flexible processing pipeline. |
| AnalyzerTraining | AnalyzerTrainingService.cs | This sample demonstrates how to enhance your analyzer's performance by training it with labeled data. Labeled data consists of samples that have been tagged with one or more labels to add context or meaning, which improves the analyzer's accuracy. Note: This feature is currently available for document scenarios only. |
| Management | ManagementService.cs | This sample demonstrates how to manage analyzers in your Azure AI Content Understanding resource. You'll learn how to create custom analyzers, list all analyzers, retrieve analyzer details, and delete analyzers you no longer need. |
Here is an example of the console output from the ContentExtraction project.
```console
$ dotnet ContentExtraction.dll
Please enter a number to run sample:
[1] - Extract Document Content
[2] - Extract Audio Content
[3] - Extract Video Content
[4] - Extract Video Content With Face
1
Document Content Extraction Sample is running...
Use prebuilt-documentAnalyzer to extract document content from the file: ./data/invoice.pdf
===== Document Extraction has been saved to the following output file path =====
./outputs/content_extraction/AnalyzeDocumentAsync_20250714034618.json
===== The markdown output contains layout information, which is very useful for Retrieval-Augmented Generation (RAG) scenarios. You can paste the markdown into a viewer such as Visual Studio Code and preview the layout structure. =====
CONTOSO LTD.
# INVOICE
Contoso Headquarters
123 456th St
New York, NY, 10001
INVOICE: INV-100
INVOICE DATE: 11/15/2019
DUE DATE: 12/15/2019
CUSTOMER NAME: MICROSOFT CORPORATION
SERVICE PERIOD: 10/14/2019 - 11/14/2019
CUSTOMER ID: CID-12345
<<< Truncated for brevity >>>
```
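If you want to post-process a saved result, the output file is plain JSON. The sketch below assumes the analyze result exposes the markdown under `result.contents[*].markdown` (an assumption based on the console output above, not a documented contract; inspect a real output file and adjust the property path if it differs):

```csharp
using System.Text.Json;

// Assumption: the saved analyze result contains result.contents[*].markdown.
// Adjust the property names after inspecting one of your own output files.
string path = "./outputs/content_extraction/AnalyzeDocumentAsync_20250714034618.json";
using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(path));

foreach (JsonElement content in doc.RootElement
             .GetProperty("result")
             .GetProperty("contents")
             .EnumerateArray())
{
    if (content.TryGetProperty("markdown", out JsonElement markdown))
    {
        Console.WriteLine(markdown.GetString()); // paste into a markdown viewer to preview layout
    }
}
```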
Note: The following samples currently target Preview.2 (API version `2025-05-01-preview`) and will be updated to the GA API version (`2025-11-01`) soon.
- Azure Content Understanding Samples (Python)
- Azure Search with Content Understanding
- Azure Content Understanding with OpenAI
- Trademarks - This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
- Data Collection - The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft's privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices.