Huggingface agent #2599

Closed · wants to merge 27 commits

Conversation

@whiskyboy (Collaborator) commented May 5, 2024

Why are these changes needed?

This PR introduces a new agent, HuggingFaceAgent, which can connect to models on the Hugging Face Hub to provide several multimodal capabilities.

The agent essentially pairs an assistant agent with a user-proxy agent, both registered with the huggingface-hub model capabilities. Users can leverage its multimodal capabilities seamlessly, without manually registering toolkits for execution.

Some key changes (a rough usage sketch follows this list):

  1. Added a HuggingFaceClient class in autogen/agentchat/contrib/huggingface_utils.py: this class simplifies calling Hugging Face models locally or remotely.
  2. Added a HuggingFaceAgent class in autogen/agentchat/contrib/huggingface_agent.py: this agent uses HuggingFaceClient to provide multimodal capabilities.
  3. Added a HuggingFaceImageGenerator class in autogen/agentchat/contrib/capabilities/generate_images.py: this class enables text-based LLMs to generate images using HuggingFaceClient.
  4. Added notebook samples to demonstrate how these new classes work.
  5. Fixed some bugs.

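For reviewers unfamiliar with the new classes, here is a rough usage sketch. It is illustrative only: the constructor arguments shown (name, llm_config, human_input_mode, etc.) are assumptions made for this sketch, not the PR's confirmed API.

```python
# Illustrative only: a possible way to drive the new HuggingFaceAgent from a
# user-proxy agent. Constructor arguments below are assumptions, not the PR's
# confirmed API.
from autogen import UserProxyAgent
from autogen.agentchat.contrib.huggingface_agent import HuggingFaceAgent

hf_agent = HuggingFaceAgent(
    name="hf_assistant",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]},  # assumed parameter
)
user = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
)
user.initiate_chat(
    hf_agent,
    message="Generate an image of a cat playing chess, then describe it back to me.",
)
```
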
Related issue number

The second approach mentioned in #2577

Checks

@codecov-commenter commented May 5, 2024

Codecov Report

Attention: Patch coverage is 4.08922% with 258 lines in your changes missing coverage. Please review.

Project coverage is 19.01%. Comparing base (84c7c24) to head (9155090).
Report is 288 commits behind head on 0.2.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| autogen/oai/huggingface.py | 3.90% | 123 Missing ⚠️ |
| autogen/agentchat/contrib/huggingface_agent.py | 0.00% | 109 Missing ⚠️ |
| .../agentchat/contrib/capabilities/generate_images.py | 0.00% | 20 Missing ⚠️ |
| autogen/oai/client.py | 40.00% | 5 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##              0.2    #2599       +/-   ##
===========================================
- Coverage   33.12%   19.01%   -14.12%     
===========================================
  Files          88       96        +8     
  Lines        9518     9868      +350     
  Branches     2037     2253      +216     
===========================================
- Hits         3153     1876     -1277     
- Misses       6096     7805     +1709     
+ Partials      269      187       -82     
Flag Coverage Δ
unittests 18.97% <4.08%> (-14.16%) ⬇️


@sonichi requested a review from BeibinLi May 5, 2024 17:07
@sonichi added the multimodal (language + vision, speech etc. integration) and models (pertains to using alternate, non-GPT models, e.g., local models, llama, etc.) labels May 5, 2024
@WaelKarkoub (Contributor) commented:

@whiskyboy thanks for the PR! I had a couple of design questions and wanted your opinion on them.

Autogen has an image generation capability, which allows anyone to add text-to-image capabilities to any LLM.

class ImageGeneration(AgentCapability):

What do you think about implementing a new custom ImageGenerator that uses Hugging Face's APIs, as opposed to creating a new agent type? We have a DALL-E image generator implemented for reference.

For image-to-text, we also have a capability called VisionCapability. @BeibinLi has more information on the design choices for that capability but I just wanted to bring it up for awareness.

class VisionCapability(AgentCapability):
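For context, the existing capability is typically attached to an agent roughly as follows. This is a sketch; treat the exact parameter names as approximate rather than authoritative.

```python
# Sketch of wiring the existing ImageGeneration capability with the DALL-E
# generator mentioned above; parameter names are approximate.
from autogen import ConversableAgent
from autogen.agentchat.contrib.capabilities.generate_images import (
    DalleImageGenerator,
    ImageGeneration,
)

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}
dalle_config = {"config_list": [{"model": "dall-e-3", "api_key": "sk-..."}]}

artist = ConversableAgent(name="artist", llm_config=llm_config)
image_gen = ImageGeneration(image_generator=DalleImageGenerator(llm_config=dalle_config))
image_gen.add_to_agent(artist)  # the agent can now return generated images in its replies
```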

@whiskyboy (Collaborator, Author) commented:

@WaelKarkoub Thanks for your comment!
Yes, and in fact I drew a lot of inspiration from the design of the two capabilities you mentioned above, as well as from the MultimodalConversableAgent and LLaVAAgent, during development. Here are my thoughts:

  1. Can we achieve the same functionality within the current multimodal capability implementations?
    Certainly, we could implement a custom ImageGenerator or a custom custom_caption_func to realize text-to-image and image-to-text capabilities using Hugging Face's APIs (a minimal sketch follows this list). However, Hugging Face offers the potential for many other multimodal capabilities, such as image-to-image, audio-to-audio, etc., which go beyond the current implementations. (A full list can be found here.) For now, this draft PR serves only as a PoC to show how a Hugging Face agent works. Once we align on the design, I'll proceed with implementing additional capabilities.
  2. Should we add a new agent type, or add new multimodal capabilities that leverage Hugging Face multimodal models?
    Both designs make sense to me. Introducing a new agent type makes it easy to cover a diverse range of multimodal capabilities for general-purpose use, while registering a new capability is better suited to a specific task. (We could also have one general capability, or register multiple capabilities on a single agent, so I'm flexible and open to either approach.)
  3. Do we really need built-in support for Hugging Face multimodal models?
    The idea was inspired by Transformers Agents and JARVIS. It's appealing (to me at least) to have a non-OpenAI, out-of-the-box solution for adding multimodal capabilities to a text-only LLM in autogen. Hugging Face stands out as a suitable choice due to its diverse range of multimodal models, spanning general-purpose to domain-specific areas. Additionally, it offers a cost-effective solution.
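
To make point 1 concrete, a minimal sketch of a Hugging Face-backed generator shaped to fit the ImageGenerator protocol might look like the following. The class name, default model, and the exact protocol methods (generate_image / cache_key) are assumptions for illustration, not code from this PR.

```python
# Minimal sketch (not from this PR): a text-to-image generator backed by the
# Hugging Face Inference API, shaped to fit autogen's ImageGenerator protocol.
from PIL.Image import Image
from huggingface_hub import InferenceClient


class HFTextToImageGenerator:
    def __init__(self, token: str, model: str = "stabilityai/stable-diffusion-2-1"):
        self._client = InferenceClient(token=token)
        self._model = model

    def generate_image(self, prompt: str) -> Image:
        # The hosted endpoint returns a PIL image for text-to-image models.
        return self._client.text_to_image(prompt, model=self._model)

    def cache_key(self, prompt: str) -> str:
        # A stable key so an image-generation capability could cache results.
        return f"{self._model}:{prompt}"
```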

@WaelKarkoub (Contributor) commented:

@whiskyboy This is very cool and I appreciate your efforts! Your reasoning fits well with my current thinking. Both approaches could benefit the autogen community and could coexist: we can have standalone Hugging Face conversable agents as well as Hugging Face image generators, audio generators, etc.

I look at Autogen as a lego world where users can mix and match different useful tools (lego pieces), and the tools you've developed are valuable and versatile enough to be applicable across many areas (e.g., agent capabilities). For a concrete example, what do you think about breaking down the text-to-image functionality and implementing it as an ImageGenerator that HuggingFaceAgent could also utilize? The HuggingFaceAgent wouldn't implement it as a capability but could directly use this newly decoupled logic. We could apply a similar strategy to other modalities as well.

One last question, is the image-to-image capability the same as image editing? If so, I'm considering improving the image generator capability to allow for this.

@whiskyboy (Collaborator, Author) commented May 6, 2024

@WaelKarkoub Glad to know we are working towards the same goal!

what do you think about breaking down the text-to-image functionality and implementing it as an ImageGenerator that HuggingFaceAgent could also utilize?

Sounds like a versatile Lego block that could be used by both standalone agents and agent capabilities? I think it's a good idea, as it would improve reusability and make the code more readable and maintainable.

is the image-to-image capability the same as image editing?

Yes, some typical user scenarios include style transfer, image inpainting, etc. For instance, the timbrooks/instruct-pix2pix model can transform a dog in one image into a cat. These models are usually diffusion models that accept a source image and a text prompt as input.
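
As a rough illustration of that scenario (not part of this PR), an image edit could be requested through huggingface_hub; whether the hosted Inference API actually serves timbrooks/instruct-pix2pix is an assumption.

```python
# Illustrative sketch: image editing (image-to-image) via the Hugging Face
# Inference API. Model availability on the hosted endpoint is an assumption.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")
edited = client.image_to_image(
    "dog.png",                          # source image: path, raw bytes, or PIL image
    prompt="turn the dog into a cat",
    model="timbrooks/instruct-pix2pix",
)
edited.save("dog_as_cat.png")
```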

gitguardian bot commented May 27, 2024

⚠️ GitGuardian has uncovered 3 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb |
| 10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb |
| 10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely; learn the best practices here.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


@WaelKarkoub (Contributor) commented:

#2836: see if this PR could make sense for you as well; we want to add multimodality support for all agents, and this is the first step.

@whiskyboy (Collaborator, Author) commented Jun 3, 2024

#2836: see if this PR could make sense for you as well; we want to add multimodality support for all agents, and this is the first step.

Loving the design!

@whiskyboy (Collaborator, Author) commented Jun 17, 2024

@WaelKarkoub do you have any more comments on this PR?

@ekzhu changed the base branch from main to 0.2 October 2, 2024 18:29
@jackgerrits added the 0.2 label (Issues which are related to the pre 0.4 codebase) Oct 4, 2024
@rysweet added the awaiting-op-response label (Issue or PR has been triaged or responded to and is now awaiting a reply from the original poster) Oct 10, 2024
@rysweet (Collaborator) commented Oct 12, 2024

Hi @whiskyboy, thanks so much for this PR. We've rebased it onto the 0.2 branch. Please consider also updating it for 0.4 if you'd like, or resolving the conflicts with 0.2, and we will get someone to review it further.

@rysweet (Collaborator) commented Oct 18, 2024

Closing as stale; please reopen if you would like to update.

@rysweet closed this Oct 18, 2024
Labels
- 0.2: Issues which are related to the pre 0.4 codebase
- awaiting-op-response: Issue or PR has been triaged or responded to and is now awaiting a reply from the original poster
- models: Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.)
- multimodal: language + vision, speech etc.
6 participants