diff --git a/docs/deployment/frameworks/autogen.md b/docs/deployment/frameworks/autogen.md
new file mode 100644
index 000000000000..ad8c167659ef
--- /dev/null
+++ b/docs/deployment/frameworks/autogen.md
@@ -0,0 +1,83 @@
+---
+title: AutoGen
+---
+[](){ #deployment-autogen }
+
+[AutoGen](https://github.com/microsoft/autogen) is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.
+
+## Prerequisites
+
+- Set up the vLLM environment
+
+- Set up the [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment
+
+```console
+pip install vllm
+
+# Install AgentChat and the OpenAI client from Extensions.
+# AutoGen requires Python 3.10 or later.
+pip install -U "autogen-agentchat" "autogen-ext[openai]"
+```
+
+## Deploy
+
+- Start the vLLM server with a supported chat completion model, e.g.:
+
+```console
+python -m vllm.entrypoints.openai.api_server \
+    --model mistralai/Mistral-7B-Instruct-v0.2
+```
+
+- Call it with AutoGen:
+
+```python
+import asyncio
+
+from autogen_core.models import ModelFamily, UserMessage
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+
+async def main() -> None:
+    # Create a model client that points at the vLLM server.
+    model_client = OpenAIChatCompletionClient(
+        model="mistralai/Mistral-7B-Instruct-v0.2",
+        base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
+        api_key="EMPTY",
+        model_info={
+            "vision": False,
+            "function_calling": False,
+            "json_output": False,
+            "family": ModelFamily.MISTRAL,
+            "structured_output": True,
+        },
+    )
+
+    messages = [UserMessage(content="Write a very short story about a dragon.", source="user")]
+
+    # Create a stream.
+    stream = model_client.create_stream(messages=messages)
+
+    # Iterate over the stream and print the responses.
+    print("Streamed responses:")
+    async for response in stream:
+        if isinstance(response, str):
+            # A partial response is a string.
+            print(response, flush=True, end="")
+        else:
+            # The last response is a CreateResult object with the complete message.
+            print("\n\n------------\n")
+            print("The complete response:", flush=True)
+            print(response.content, flush=True)
+
+    # Close the client when done.
+    await model_client.close()
+
+
+asyncio.run(main())
+```
+
+For details, see:
+
+- [Using vLLM in AutoGen](https://microsoft.github.io/autogen/0.2/docs/topics/non-openai-models/local-vllm/)
+
+- [OpenAI-compatible API examples](https://microsoft.github.io/autogen/stable/reference/python/autogen_ext.models.openai.html#autogen_ext.models.openai.OpenAIChatCompletionClient)
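+
+The example above streams tokens as they arrive. If you only need the final message, `OpenAIChatCompletionClient` also provides a non-streaming `create` call. Below is a minimal sketch that reuses the same client configuration; the host/port placeholders and the prompt are illustrative, not part of the referenced tutorials:
+
+```python
+import asyncio
+
+from autogen_core.models import ModelFamily, UserMessage
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+
+async def main() -> None:
+    # Same client configuration as above; fill in your vLLM host and port.
+    model_client = OpenAIChatCompletionClient(
+        model="mistralai/Mistral-7B-Instruct-v0.2",
+        base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
+        api_key="EMPTY",
+        model_info={
+            "vision": False,
+            "function_calling": False,
+            "json_output": False,
+            "family": ModelFamily.MISTRAL,
+            "structured_output": True,
+        },
+    )
+
+    # create() waits for the complete response instead of streaming it.
+    result = await model_client.create(
+        messages=[UserMessage(content="Write a haiku about the sea.", source="user")]
+    )
+    print(result.content)
+
+    # Close the client when done.
+    await model_client.close()
+
+
+asyncio.run(main())
+```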
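+
+The same vLLM-backed client can also drive an AgentChat agent. The following is a sketch under a few assumptions: the agent name and task are made up, and no tools are registered because `function_calling` is declared `False` in `model_info` above:
+
+```python
+import asyncio
+
+from autogen_agentchat.agents import AssistantAgent
+from autogen_core.models import ModelFamily
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+
+async def main() -> None:
+    # Same vLLM-backed client as in the examples above.
+    model_client = OpenAIChatCompletionClient(
+        model="mistralai/Mistral-7B-Instruct-v0.2",
+        base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
+        api_key="EMPTY",
+        model_info={
+            "vision": False,
+            "function_calling": False,
+            "json_output": False,
+            "family": ModelFamily.MISTRAL,
+            "structured_output": True,
+        },
+    )
+
+    # No tools are passed, matching function_calling=False above.
+    agent = AssistantAgent(name="assistant", model_client=model_client)
+
+    # run() executes the task and returns a TaskResult with the message history.
+    result = await agent.run(task="Write a one-sentence story about a dragon.")
+    print(result.messages[-1].content)
+
+    # Close the client when done.
+    await model_client.close()
+
+
+asyncio.run(main())
+```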