Huggingface agent #2599
**Codecov Report**

Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##              0.2    #2599       +/-  ##
===========================================
- Coverage   33.12%   19.01%   -14.12%
===========================================
  Files          88       96        +8
  Lines        9518     9868      +350
  Branches     2037     2253      +216
===========================================
- Hits         3153     1876     -1277
- Misses       6096     7805     +1709
+ Partials      269      187       -82
```
Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
@whiskyboy thanks for the PR! I had a couple of design questions and wanted your opinion on them. Autogen has an image generation capability, which allows anyone to add text-to-image capabilities to any LLM.
What do you think about implementing a new custom
For image-to-text, we also have a capability called
@WaelKarkoub Thanks for your comment!
@whiskyboy This is very cool and I appreciate your efforts! Your reasoning fits well with what I think now. Both approaches could be beneficial to the autogen community and could coexist. We can have standalone huggingface conversible agents as well as huggingface image generators, audio generators, etc. I look at Autogen as a lego world where users can mix and match different useful tools (lego pieces), and the tools you've developed are valuable and versatile enough to be applicable across many areas (e.g., agent capabilities). For a concrete example, what do you think about breaking down the text-to-image functionality and implementing it as an One last question, is the image-to-image capability the same as image editing? If so, I'm considering improving the image generator capability to allow for this. |
@WaelKarkoub I'm glad to know we are working towards the same goal!

Sounds like a versatile lego block that could be utilized by both standalone agents and agent capabilities? I think it's a good idea, as it would improve function reusability and make the code more readable and maintainable.

Yes, some typical user scenarios include style transfer, image inpainting, etc. For instance, the
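To make the "lego block" idea concrete, here is a minimal, dependency-free sketch of a HuggingFace-backed image generator in the shape of the `ImageGenerator` protocol (a `generate_image` method plus a `cache_key` method) used by autogen 0.2's `generate_images` capability module. The class and parameter names are illustrative assumptions, not the PR's actual code; the text-to-image backend is injected as a callable, so with `huggingface_hub` installed one could pass something like `InferenceClient(model=...).text_to_image`.

```python
# Illustrative sketch only: an ImageGenerator-protocol-shaped class whose
# backend callable is injected, so it works standalone or inside an agent
# capability. Not the PR's actual implementation.
from typing import Any, Callable

class HFImageGenerator:
    def __init__(self, model: str, text_to_image: Callable[[str], Any]):
        self._model = model
        # e.g. huggingface_hub.InferenceClient(model=model).text_to_image
        self._text_to_image = text_to_image

    def generate_image(self, prompt: str) -> Any:
        # Delegate generation to the injected backend.
        return self._text_to_image(prompt)

    def cache_key(self, prompt: str) -> str:
        # Key should uniquely identify generator + model + prompt so the
        # capability's cache never collides across backends.
        return f"HFImageGenerator_{self._model}_{prompt}"

# Stubbed backend keeps the sketch runnable without network access.
gen = HFImageGenerator("stabilityai/stable-diffusion-2", lambda p: f"fake-image:{p}")
print(gen.cache_key("a cat"))  # -> HFImageGenerator_stabilityai/stable-diffusion-2_a cat
```

Injecting the backend is what lets the same block serve both a standalone agent (which calls `generate_image` directly) and a capability (which also needs `cache_key`).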
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely, following best practices for secret storage.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation
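As a concrete starting point for the pre-commit suggestion, one common choice is Yelp's detect-secrets hook. The configuration below is a sketch; the `rev` value is an assumption and should be pinned to a release you have verified.

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0  # assumption: pin to the release you verified
    hooks:
      - id: detect-secrets
        args: ["--baseline", ".secrets.baseline"]
```

Generate the baseline once with `detect-secrets scan > .secrets.baseline` so that existing, already-audited findings do not block every commit.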
See #2836; that PR could make sense for you as well. We want to add multimodality support for all agents, and this is the first step.
Loving the design!
@WaelKarkoub do you have any more comments on this PR?
Hi @whiskyboy, thanks so much for this PR! We've rebased it to the 0.2 branch. Please consider also updating it for 0.4 if you want, or resolving the conflicts with 0.2, and we will get someone to review further.
Closing as stale; please reopen if you would like to update.
Why are these changes needed?
Introducing a new agent named `HuggingFaceAgent`, which can connect to models on the HuggingFace Hub to achieve several multimodal capabilities. This agent essentially consists of a pairing between an assistant and a user-proxy agent, both registered with the huggingface-hub model capabilities. Users can seamlessly access this agent to leverage its multimodal capabilities, without needing to manually register toolkits for execution.
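The assistant/user-proxy pairing described above can be sketched in a dependency-free way (all names here are illustrative, not the PR's actual code): the user proxy holds a pre-registered toolkit and executes whatever tool call the assistant proposes, which is why the end user never has to wire tools up manually.

```python
# Stdlib-only sketch of the pairing idea: an assistant proposes a tool call,
# the user proxy executes it from a pre-registered toolkit.
from typing import Any, Callable, Dict

class UserProxy:
    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        # Tools are registered once, up front, by the agent's constructor.
        self._tools[name] = fn

    def execute(self, name: str, **kwargs: Any) -> Any:
        # Executor side of the pairing: run a tool the assistant proposed.
        return self._tools[name](**kwargs)

class Assistant:
    def __init__(self, executor: UserProxy) -> None:
        self._executor = executor

    def handle(self, task: str, **kwargs: Any) -> Any:
        # In the real agent an LLM would choose the tool; here we route directly.
        return self._executor.execute(task, **kwargs)

# A hypothetical HuggingFaceAgent would pre-register hub-backed tools
# (text-to-image, image-to-text, ...) at construction time.
proxy = UserProxy()
proxy.register_tool("echo", lambda text: f"echo: {text}")
agent = Assistant(proxy)
print(agent.handle("echo", text="hi"))  # -> echo: hi
```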
Some key changes:
- `HuggingFaceClient` class in `autogen/agentchat/contrib/huggingface_utils.py`: this class simplifies calling HuggingFace models locally or remotely.
- `HuggingFaceAgent` class in `autogen/agentchat/contrib/huggingface_agent.py`: this agent utilizes `HuggingFaceClient` to achieve multimodal capabilities.
- `HuggingFaceImageGenerator` class in `autogen/agentchat/contrib/capabilities/generate_images.py`: this class enables text-based LLMs to generate images using `HuggingFaceClient`.

Related issue number
The second approach mentioned in #2577
Checks