Basic Multimodal Example

This example demonstrates how to use the Atomic Agents framework to analyze images together with text instructions, focusing on extracting structured information from nutrition labels using GPT-4 Vision capabilities.

Features

- Image Analysis: Process nutrition label images using GPT-4 Vision
- Structured Data Extraction: Convert visual information into structured Pydantic models
- Multi-Image Processing: Analyze multiple nutrition labels simultaneously
- Comprehensive Nutritional Data: Extract detailed nutritional information, including:
  - Basic nutritional facts (calories, fats, proteins, etc.)
  - Serving size information
  - Vitamin and mineral content
  - Product details

Getting Started

Clone the main Atomic Agents repository:

```
git clone https://github.com/BrainBlend-AI/atomic-agents
```

Navigate to the basic-multimodal directory:

```
cd atomic-agents/atomic-examples/basic-multimodal
```

Install dependencies using Poetry:
```
poetry install
```
Set up environment variables: Create a .env file in the basic-multimodal directory with the following content:
```
OPENAI_API_KEY=your_openai_api_key
```
Replace your_openai_api_key with your actual OpenAI API key.

Run the example:

```
poetry run python basic_multimodal/main.py
```
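
If the run fails with an authentication error, a quick check like the following can confirm that the key is visible to Python. This is a minimal sketch and not part of the example itself: the helper name `get_openai_api_key` is hypothetical, and it assumes the variable has already been loaded into the process environment (`os.environ` does not read a `.env` file on its own, and how the example loads that file is not shown here).

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is unusable.

    `env` is any mapping of variable names to values; it defaults to the
    current process environment. (Hypothetical helper, for illustration.)
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Reject both a missing key and the placeholder from the .env template.
    if not key or key == "your_openai_api_key":
        raise RuntimeError(
            "OPENAI_API_KEY is missing or still set to the placeholder value"
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY is set")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```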

Components

1. Nutrition Label Schema (`NutritionLabel`)

Defines the structure for storing nutrition information, including:

- Macronutrients (fats, proteins, carbohydrates)
- Micronutrients (vitamins and minerals)
- Serving information
- Product details
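
As a rough illustration of what such a schema could look like, here is a trimmed-down sketch using plain Pydantic. The field names, units, and constraints are assumptions chosen for illustration; the actual `NutritionLabel` class in the example may define different or additional fields.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class NutritionLabel(BaseModel):
    """Illustrative schema only; field names and units are assumptions."""

    # Product details
    product_name: Optional[str] = Field(None, description="Name printed on the package")
    # Serving information
    serving_size: Optional[str] = Field(None, description="Serving size, e.g. '30 g'")
    # Macronutrients (per serving)
    calories: float = Field(..., ge=0, description="Calories per serving")
    total_fat_g: float = Field(..., ge=0, description="Total fat in grams")
    protein_g: float = Field(..., ge=0, description="Protein in grams")
    total_carbohydrates_g: float = Field(..., ge=0, description="Carbohydrates in grams")
    # Micronutrients are optional because many labels omit them
    vitamin_c_mg: Optional[float] = Field(None, ge=0, description="Vitamin C in milligrams")


# A well-formed label validates and exposes typed attributes.
label = NutritionLabel(
    product_name="Granola",
    serving_size="30 g",
    calories=120,
    total_fat_g=3.5,
    protein_g=4,
    total_carbohydrates_g=20,
)

# A negative value violates the ge=0 constraint and is rejected.
try:
    NutritionLabel(calories=-5, total_fat_g=0, protein_g=0, total_carbohydrates_g=0)
    rejected = False
except ValidationError:
    rejected = True
```

Making the micronutrient and product fields optional is one way to handle labels that omit them, at the cost of having to check for `None` downstream.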

2. Input/Output Schemas

- `NutritionAnalysisInput`: Handles input images and analysis instructions
- `NutritionAnalysisOutput`: Structures the extracted nutrition information
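
The relationship between the two schemas can be sketched as follows. This is an assumption-laden outline, not the example's code: plain Pydantic `BaseModel` and file paths are used so the sketch runs on its own, whereas the real classes likely build on the framework's own base schema and image types. The field names `image_paths`, `instruction_text`, and `analyzed_labels` are hypothetical.

```python
from typing import List

from pydantic import BaseModel, Field


class NutritionLabel(BaseModel):
    """Stand-in for the full nutrition label schema (one field only)."""

    calories: float = Field(..., description="Calories per serving")


class NutritionAnalysisInput(BaseModel):
    """What the agent receives: images plus instructions (hypothetical fields)."""

    image_paths: List[str] = Field(..., description="Paths to nutrition label images")
    instruction_text: str = Field(..., description="What to extract from the images")


class NutritionAnalysisOutput(BaseModel):
    """What the agent returns: one structured label per analyzed image."""

    analyzed_labels: List[NutritionLabel] = Field(
        ..., description="Extracted nutrition data, in input order"
    )


request = NutritionAnalysisInput(
    image_paths=["test_images/nutrition_label_1.png"],
    instruction_text="Extract all nutrition facts from each label.",
)

# Nested dictionaries are validated into NutritionLabel instances.
response = NutritionAnalysisOutput(analyzed_labels=[{"calories": 100}])
```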

3. Nutrition Analyzer Agent

A specialized agent configured with:

- GPT-4 Vision capabilities
- Custom system prompts for nutrition label analysis
- Structured data validation

Example Usage

The example includes test images in the test_images directory:

- `nutrition_label_1.png`: Example nutrition label image
- `nutrition_label_2.jpg`: Another example nutrition label image

Running the example will:

1. Load the test images
2. Process them through the nutrition analyzer
3. Display structured nutritional information for each label
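
The first of these steps, locating the images to load, might look roughly like the sketch below. The helper name `find_label_images` and the set of accepted file extensions are assumptions made for illustration; the example's own loading code may differ.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def find_label_images(directory):
    """Return image files in `directory`, sorted by path.

    Returns an empty list if the directory does not exist.
    (Hypothetical helper, for illustration.)
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(
        path
        for path in folder.iterdir()
        # Compare suffixes case-insensitively so "LABEL.PNG" is also found.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    for image_path in find_label_images("test_images"):
        print(image_path)
```

Filtering by extension also means that images added to the directory later are picked up without changing the code.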

Customization

You can modify the example by:

- Adding your own nutrition label images to the `test_images` directory
- Adjusting the `NutritionLabel` schema to capture additional information
- Modifying the system prompt to focus on specific aspects of nutrition labels

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
