---
title: AutoGen
---
[](){ #deployment-autogen }

[AutoGen](https://github.com/microsoft/autogen) is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.

## Prerequisites

- Set up the vLLM environment

- Set up the [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment

```console
# Install vLLM.
pip install vllm

# Install AgentChat and the OpenAI client from Extensions.
# AutoGen requires Python 3.10 or later.
pip install -U "autogen-agentchat" "autogen-ext[openai]"
```

## Deploy

- Start the vLLM server with a supported chat completion model, e.g.

```console
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2
```
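
- Optionally, verify that the server is reachable before wiring it into AutoGen. A minimal sketch using the OpenAI-compatible `/v1/models` endpoint, assuming the server runs on the default `localhost:8000`:

```python
from openai import OpenAI

# Point the client at the vLLM server; vLLM accepts any API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# List the models the server exposes; the served model ID should appear here.
for model in client.models.list():
    print(model.id)
```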

- Call it with AutoGen:

```python
import asyncio

from autogen_core.models import ModelFamily, UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
# Create a model client
model_client = OpenAIChatCompletionClient(
model="mistralai/Mistral-7B-Instruct-v0.2",
base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
api_key="EMPTY",
model_info={
"vision": False,
"function_calling": False,
"json_output": False,
"family": ModelFamily.MISTRAL,
"structured_output": True,
},
)

messages = [UserMessage(content="Write a very short story about a dragon.", source="user")]

# Create a stream.
stream = model_client.create_stream(messages=messages)

# Iterate over the stream and print the responses.
print("Streamed responses:")
async for response in stream:
if isinstance(response, str):
# A partial response is a string.
print(response, flush=True, end="")
else:
# The last response is a CreateResult object with the complete message.
print("\n\n------------\n")
print("The complete response:", flush=True)
print(response.content, flush=True)

# Close the client when done.
await model_client.close()


asyncio.run(main())
```
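
- The same client also plugs into AgentChat agents. A minimal sketch, assuming the `AssistantAgent` defaults from `autogen-agentchat` and the same `model_client` configuration as above:

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import ModelFamily
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    # Same client configuration as in the streaming example above.
    model_client = OpenAIChatCompletionClient(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        base_url="http://{your-vllm-host-ip}:{your-vllm-host-port}/v1",
        api_key="EMPTY",
        model_info={
            "vision": False,
            "function_calling": False,
            "json_output": False,
            "family": ModelFamily.MISTRAL,
            "structured_output": True,
        },
    )

    # Wrap the client in an agent and run a one-shot task.
    agent = AssistantAgent("assistant", model_client=model_client)
    result = await agent.run(task="Say hello in one sentence.")

    # The last message in the task result is the agent's reply.
    print(result.messages[-1].content)

    # Close the client when done.
    await model_client.close()


asyncio.run(main())
```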

For more details, see:

- [Using vLLM in AutoGen](https://microsoft.github.io/autogen/0.2/docs/topics/non-openai-models/local-vllm/) (tutorial)

- [OpenAI-compatible API examples](https://microsoft.github.io/autogen/stable/reference/python/autogen_ext.models.openai.html#autogen_ext.models.openai.OpenAIChatCompletionClient) (API reference)