[Bug]: Dalle-Critic not working #2510
Comments
Looks like the message output from the critic is not complete. @WaelKarkoub, do you know a possible cause of this?
@ekzhu this is new to me; maybe the API provider is limiting the number of output tokens. @nazkhan-8451 I ran your code and it works perfectly fine for me. I'm not sure how you set up yours. Here is my version:

```python
import os
from typing import List

from PIL.Image import Image

import autogen
from autogen.agentchat.contrib import img_utils
from autogen.agentchat.contrib.capabilities import generate_images

CRITIC_SYSTEM_MESSAGE = """You need to improve the prompt of the figures you saw.
How to create an image that is better in terms of color, shape, text (clarity), and other things.
Reply with the following format:
CRITICS: the image needs to improve...
PROMPT: here is the updated prompt!
If you have no critique or a prompt, just say TERMINATE
"""

config_list_gpt4 = [
    {
        "model": "gpt-4-turbo-2024-04-09",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]
config_list_gpt4_vision = config_list_gpt4
config_list_dalle = [
    {
        "model": "dall-e-3",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]

gpt_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4,
    "timeout": 300,
}
gpt_vision_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_gpt4_vision,
    "timeout": 300,
}
dalle_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0.7,
    "config_list": config_list_dalle,
    "timeout": 300,
}


def _is_termination_message(msg) -> bool:
    # Detects if we should terminate the conversation
    if isinstance(msg.get("content"), str):
        return msg["content"].rstrip().endswith("TERMINATE")
    elif isinstance(msg.get("content"), list):
        for content in msg["content"]:
            if isinstance(content, dict) and "text" in content:
                return content["text"].rstrip().endswith("TERMINATE")
    return False


def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config=gpt_vision_config,
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )


def image_generator_agent() -> autogen.ConversableAgent:
    # Create the agent
    agent = autogen.ConversableAgent(
        name="dalle",
        llm_config=gpt_vision_config,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
        is_termination_msg=lambda msg: _is_termination_message(msg),
    )
    # Add image generation ability to the agent
    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)
    image_gen_capability = generate_images.ImageGeneration(
        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config
    )
    image_gen_capability.add_to_agent(agent)
    return agent


def extract_images(sender: autogen.ConversableAgent, recipient: autogen.ConversableAgent) -> List[Image]:
    images = []
    all_messages = sender.chat_messages[recipient]
    for message in reversed(all_messages):
        # The GPT-4V format, where the content is an array of data
        contents = message.get("content", [])
        for content in contents:
            if isinstance(content, str):
                continue
            if content.get("type", "") == "image_url":
                img_data = content["image_url"]["url"]
                images.append(img_utils.get_pil_image(img_data))
    if not images:
        raise ValueError("No image data found in messages.")
    return images


###################################################
dalle = image_generator_agent()
critic = critic_agent()

img_prompt = "robot"
result = dalle.initiate_chat(critic, message=img_prompt)
```
@WaelKarkoub
@nazkhan-8451 try updating to the latest autogen version; I'm not certain that would change anything. In your …
@WaelKarkoub here is my file. I have checked the models individually; the api-key and url are correct:

```json
[
  {
    "model": "wag-gpt4-128k",
    "api_key": "api-key",
    "api_type": "azure",
    "base_url": "url",
    "api_version": "2024-02-15-preview",
    "tags": ["wag-gpt4-128k"]
  },
  {
    "model": "gpt-35-turbo-16k",
    "api_key": "api-key",
    "api_type": "azure",
    "base_url": "url",
    "api_version": "2024-02-15-preview",
    "tags": ["gpt-35"]
  },
  {
    "model": "gpt4-vision",
    "api_key": "api-key",
    "api_type": "azure",
    "base_url": "url",
    "api_version": "2023-12-01-preview",
    "tags": ["gpt-vision"]
  },
  {
    "model": "dall-e-3",
    "api_key": "api-key",
    "api_type": "azure",
    "base_url": "url/",
    "api_version": "2023-12-01-preview",
    "tags": ["dalle"]
  }
]
```
@WaelKarkoub I changed the code to `cache=None` and upgraded to the latest pyautogen. There are two problems I am seeing:
@nazkhan-8451 your config looks correct. Your …
@nazkhan-8451 I couldn't reproduce this bug; does it still happen for you?
@WaelKarkoub It does. Not sure what I am doing wrong or how to get around it.
@nazkhan-8451 check if you set hard limits in Azure; I'm not sure what that would look like. And if possible, check whether this happens with OpenAI.
@WaelKarkoub The dall-e deployment works fine because I can generate an image with this.

I don't have OpenAI dall-e to test it.
@nazkhan-8451 my concern is not the image generation part but the chat completion side of things (i.e., using the GPT models). See if you can still generate large texts with GPT.
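A minimal way to run that check is to call the chat deployment directly and look at `finish_reason`, which the API sets to `"length"` when the reply was cut off by a token cap. A sketch using the `openai` Python client; the env-var names and deployment name below are placeholders for your Azure setup:

```python
import os


def output_truncated(finish_reason: str) -> bool:
    # The chat completions API reports finish_reason == "length" when the
    # reply was cut off by a token limit rather than finishing naturally.
    return finish_reason == "length"


if __name__ == "__main__":
    from openai import AzureOpenAI  # requires openai>=1.0

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder env var
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder env var
        api_version="2024-02-15-preview",
    )
    response = client.chat.completions.create(
        model="wag-gpt4-128k",  # your Azure deployment name
        messages=[{"role": "user", "content": "Write a 500-word story about a robot."}],
    )
    choice = response.choices[0]
    print(len(choice.message.content or ""), choice.finish_reason)
    if output_truncated(choice.finish_reason):
        print("The deployment is cutting replies off at a token limit.")
```

If the printed length is small and `finish_reason` is `"length"`, the truncation is happening at the deployment level, not inside autogen.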
@nazkhan-8451 https://github.com/microsoft/autogen/blob/main/notebook/agentchat_image_generation_capability.ipynb Does this work for you? Just change the model name, API key, etc. accordingly.
@WaelKarkoub this is giving the error:

```
dalle (to critic): robot
critic (to dalle): CRITICS: the image needs to improve the depiction of the robot to make
dalle (to critic): I'm sorry for any confusion, but as an AI text-based model, I
critic (to dalle): TERMINATE
```
@nazkhan-8451 Disable the cache again by adjusting the configs; the output is the same because it's reading from your cache.
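For reference, caching shows up in two places in this setup; a sketch of both (the config values below are placeholders mirroring the script earlier in the thread):

```python
# Agent level: cache_seed=None in llm_config disables autogen's on-disk
# completion cache for that agent, so every trial hits the API fresh.
config_list_gpt4_vision = [{"model": "gpt4-vision", "api_key": "api-key"}]  # placeholder

gpt_vision_config = {
    "cache_seed": None,  # None disables the cache; an int selects a cache bucket
    "temperature": 0.7,
    "config_list": config_list_gpt4_vision,
    "timeout": 300,
}

# Chat level: caching can also be disabled per call, as was done earlier in
# this thread:
#   result = dalle.initiate_chat(critic, message=img_prompt, cache=None)
print(gpt_vision_config["cache_seed"])  # None
```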
@nazkhan-8451 just making sure: the prompt in the notebook is different from the console output you pasted in your comment. Can you run the notebook as is and see what the output looks like? Make sure you disable the cache seed as well.
@WaelKarkoub I ran the notebook as is.
@nazkhan-8451 yeah, I'm stumped; would you mind posting it on Discord? https://aka.ms/autogen-dc. If not, I can post the issue myself as well.
@WaelKarkoub I don't have Discord. If you could post it, we can continue to collaborate here. Thank you for all the help.
@nazkhan-8451 can you try using …
@WaelKarkoub Converted both of them to …
Needed to fix an error in …
Still got: …
@WaelKarkoub I figured out where the bug is. It's the code which creates … This works (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_dalle_and_gpt4v.ipynb):
@nazkhan-8451 great catch! It's interesting how this bug affected the text output for other agents; I'll have to take a look at it. Do you want to submit a PR for a fix? I don't mind doing that as well.
Please go ahead and do that. I will close this issue. Thank you.
@WaelKarkoub @nazkhan-8451 I've faced the same text output cut-off issue when testing image generation capabilities. I'm also using an AzureOpenAI deployment, and finally found it may be a limitation of the AzureOpenAI GPT-4 Turbo with Vision deployment. From the documentation, it looks like we have to set a `max_tokens` value explicitly. After adding a `max_tokens` setting:

```python
def critic_agent() -> autogen.ConversableAgent:
    return autogen.ConversableAgent(
        name="critic",
        llm_config={"config_list": config_list_gpt4v, "temperature": 0.7, "max_tokens": 400},
        system_message=CRITIC_SYSTEM_MESSAGE,
        max_consecutive_auto_reply=3,
        human_input_mode="NEVER",
    )
```
@whiskyboy
@nazkhan-8451 No, I'm not using Azure Dalle. Instead, I'm testing with HuggingFace text-to-image models (see #2599). I will try Azure Dalle later.
Describe the bug
Followed the notebook https://github.com/microsoft/autogen/blob/main/notebook/agentchat_image_generation_capability.ipynb, but I'm getting the following response:
(screenshot of the response; the original image link has expired)
Code:
Steps to reproduce
No response
Model Used
No response
Expected Behavior
No response
Screenshots and logs
No response
Additional Information