This project is a proof of concept that explores how OpenAI's function calling can be integrated with external APIs to automate operations, connections, and implementations through natural language. It is an interactive application designed to generate and edit images using APIs such as Ideogram and OpenAI. It operates on the context provided by the user, enabling dynamic decision-making for image generation and editing. Additionally, it remembers previously generated or user-provided images, ensuring a seamless and intuitive workflow.
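As a rough illustration of the function-calling pattern this proof of concept relies on, the sketch below registers a hypothetical generate_image tool with the OpenAI chat API. The tool name, parameter schema, and model name are illustrative assumptions, not the project's actual definitions.

```python
# Minimal sketch: expose an image-generation tool to the model via function calling.
# Tool name, description, schema, and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "generate_image",
            "description": "Create an image from a text prompt.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    "aspect_ratio": {"type": "string", "description": "e.g. '1:1', '16:9'"},
                    "style": {"type": "string"},
                },
                "required": ["prompt"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any function-calling-capable model works here
    messages=[{"role": "user", "content": "I want a cat on a pool table."}],
    tools=tools,
    tool_choice="auto",   # let the model decide whether to call the tool
)
print(response.choices[0].message.tool_calls)
```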
-
Image Generation:
- Allows users to create images from text descriptions (prompts).
- Supports optional parameters like style, model, aspect ratio, and more for customization.
- The generated image is stored in the context for future actions.
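A minimal sketch of what the generation call might look like, assuming an Ideogram-style REST endpoint. The URL, header name, payload fields, and response shape are assumptions and should be checked against the Ideogram API documentation.

```python
# Sketch of a generation request against an assumed Ideogram-style endpoint.
# Endpoint URL, header name, payload fields, and response shape are assumptions.
import os
import requests

def generate_image(prompt: str, aspect_ratio: str = "1:1") -> str:
    resp = requests.post(
        "https://api.ideogram.ai/generate",           # assumed endpoint
        headers={"Api-Key": os.environ["IDEOGRAM_API_KEY"]},
        json={"image_request": {"prompt": prompt, "aspect_ratio": aspect_ratio}},
        timeout=60,
    )
    resp.raise_for_status()
    # Assumed response shape: {"data": [{"url": "..."}]}
    return resp.json()["data"][0]["url"]
```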
-
Image Editing:
- Users can edit previously generated images or those provided via a URL.
- A white mask covering the entire image is generated automatically, so the whole image can be modified.
- Enables users to apply styles and adjustments based on their needs.
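As a sketch of the editing step, the snippet below sends the original image plus a fully white mask to an assumed Ideogram-style edit endpoint. Which API actually handles edits, the endpoint, the multipart field names, and the mask convention (the README describes a white mask marking the whole image as editable) are all assumptions.

```python
# Sketch of an edit request against an assumed Ideogram-style edit endpoint.
# Endpoint, field names, and mask convention are assumptions taken from this README.
import os
import requests

def edit_image(image_path: str, mask_path: str, prompt: str) -> str:
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        resp = requests.post(
            "https://api.ideogram.ai/edit",               # assumed endpoint
            headers={"Api-Key": os.environ["IDEOGRAM_API_KEY"]},
            data={"prompt": prompt},
            files={"image_file": image, "mask": mask},    # assumed field names
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["data"][0]["url"]                  # assumed response shape
```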
-
Context Management:
- The application stores:
  - Last generated image: an image created by the application.
  - Last provided image: an image manually provided by the user.
- This context allows dynamic image editing, whether for the last generated or the last provided image.
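One plausible way to hold this context is a small dataclass; the attribute and method names below are illustrative, not the project's actual ones.

```python
# Illustrative context holder; attribute names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageContext:
    last_generated_image: Optional[str] = None  # URL of the last image the app created
    last_provided_image: Optional[str] = None   # URL the user supplied in a message

    def image_for_editing(self) -> Optional[str]:
        # A user-provided URL takes priority over the last generated image.
        return self.last_provided_image or self.last_generated_image
```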
-
Interactive History:
- Keeps a record of actions performed, allowing users to review the conversation flow.
- The user describes their desired action: generate, edit, or ask questions.
- If the input includes a valid URL, it is detected and stored in the context as the last provided image.
- The application uses a model (see the decision sketch after this list) to decide whether the user intends to:
  - Generate a new image.
  - Edit an existing image.
  - Take no action.
- If image generation is chosen:
  - A request is constructed with the parameters provided by the user.
  - The generated image is saved in the context as the last generated image.
  - The URL of the created image is displayed to the user.
- If image editing is chosen:
  - An image is automatically selected from the context:
    - If the user provided a URL, it takes priority.
    - Otherwise, the last image generated by the application is used.
  - A white mask is generated to cover the entire image.
  - The editing request is sent to the API with the defined parameters.
  - The newly edited image is saved in the context as the last generated image.
- Before performing any action, the parameters are presented to the user.
- The user can confirm or adjust the parameters before proceeding.
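Building on the tool definition sketched earlier, the decision and confirmation step might look like the following. The function name and the dispatch targets are illustrative assumptions.

```python
# Sketch of the decision step: inspect whether the model chose a tool call.
# Function name and dispatch targets are illustrative assumptions.
import json

def handle_model_reply(message) -> None:
    """`message` is response.choices[0].message from a chat completion with tools."""
    if message.tool_calls:
        call = message.tool_calls[0]
        params = json.loads(call.function.arguments)
        print(f"Action: {call.function.name}")
        print(f"Parameters: {params}")
        if input("Do you want to proceed with these parameters? (y/n) ").lower() == "y":
            pass  # dispatch to generate_image(...) or edit_image(...) here
    else:
        # No tool call: the model answered conversationally, so take no action.
        print(message.content)
```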
-
Start:
- The user provides an initial request.
- If a URL is included, it is automatically saved in the context.
-
Decision:
- The request is analyzed to determine if it is a generation, editing, or general inquiry action.
-
Action Execution:
- Generate:
  - A new image is created based on the prompt and optional parameters.
- Edit:
  - An image from the context (provided or previously generated) is used.
  - A mask is generated, and the editing request is sent.
- No Action:
  - The conversation continues without performing any operation.
-
History and Context:
- Every action is logged in an interactive history.
- Generated or edited images are saved in the context for future reference.
-
Image URL Detection:
- Analyzes user input to identify valid URLs.
- Detected URLs are stored as the last provided image.
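A simple way to detect URLs in free text is a regular expression; the pattern below is a rough, deliberately permissive sketch, and the project's actual detection may differ.

```python
# Rough sketch of URL detection in user input; the regex is an assumption.
import re

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

def extract_image_url(text: str):
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None
```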
-
Automatic Mask:
- Generates a fully white mask that covers the original image.
- Allows complete edits without manual intervention.
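Generating a fully white mask with the same dimensions as the source image can be done with Pillow. This is a sketch; the mask's file format and color mode are assumptions.

```python
# Sketch of building a fully white mask matching the source image's size.
# Output file name, format, and mode are assumptions.
from PIL import Image

def make_white_mask(image_path: str, mask_path: str = "mask.png") -> str:
    with Image.open(image_path) as img:
        mask = Image.new("RGB", img.size, color=(255, 255, 255))
    mask.save(mask_path)
    return mask_path
```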
-
Context Persistence:
- Remembers previously used images, enabling continuous workflows and history-dependent actions.
-
Parameter Confirmation:
- Users can review and adjust parameters before executing any action.
-
Automatic Temporary File Management:
- Downloaded images and generated masks are automatically deleted after each operation.
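Cleanup of downloaded images and generated masks could be as simple as removing them after each operation; the file paths and surrounding helpers shown here are illustrative.

```python
# Sketch of temporary-file cleanup after an operation; file names and the
# helpers referenced in the comment are illustrative assumptions.
import os

def cleanup(*paths: str) -> None:
    for path in paths:
        if path and os.path.exists(path):
            os.remove(path)

# Typical usage: always clean up, even if the edit request fails.
# try:
#     edited_url = edit_image("downloaded.png", "mask.png", prompt)
# finally:
#     cleanup("downloaded.png", "mask.png")
```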
User:
"I want a cat on a pool table."
System:
"Image generation detected.
Prompt: A cat on a pool table.
Do you want to proceed with these parameters? (y/n)"
Result:
The image is generated and saved in the context.
User:
"I want the cat to be orange."
System:
"Image editing detected.
Prompt: Change the color of the cat to orange.
Image used: URL of the last generated image.
Do you want to proceed with these parameters? (y/n)"
Result:
The image is edited, and the new version is saved in the context.
-
Dependencies:
- Install the dependencies using the following command:
  pip install -r requirements.txt
-
Environment Variables:
- Create a .env file with the following variables:
  IDEOGRAM_API_KEY=<your_ideogram_api_key>
  OPENAI_API_KEY=<your_openai_api_key>
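If the project loads these keys with python-dotenv (an assumption; it is a common companion to a .env file and may be listed in requirements.txt), the setup looks like this:

```python
# Sketch of loading the API keys, assuming python-dotenv is among the dependencies.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

IDEOGRAM_API_KEY = os.getenv("IDEOGRAM_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```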