@pratapladhani

Pull Request: Enable Container Deployment for Python Agent Framework Sample

Summary

This PR fixes a critical networking issue that prevents the Python Agent Framework sample from running in containerized environments and adds comprehensive container deployment documentation and tooling.


Problem Statement

The Issue

When deploying the Python Agent Framework sample to containerized environments (Docker, Azure Container Apps, Kubernetes), the agent fails to receive any external traffic. The server starts successfully, but:

  • Health checks fail with "Connection refused"
  • Bot Framework messages never reach the agent
  • The agent appears healthy inside the container but is unreachable from outside

Root Cause

host_agent_server.py bound the HTTP server to localhost:

run_app(app, host="localhost", port=port, handle_signals=True)

In container networking, localhost (127.0.0.1) refers only to the loopback interface inside the container. This means:

  1. The server only accepts connections originating from within the same container
  2. External requests (from the container runtime, load balancer, or Bot Framework) are rejected
  3. Platform health probes (for example, Kubernetes or Azure Container Apps HTTP probes) also fail, because they connect through the container's network interface rather than from inside the container
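Whether a bind address is loopback-only can be checked before the server starts. The sketch below uses Python's standard ipaddress module; bind_scope and its return labels are hypothetical names for illustration, not part of the sample.

```python
import ipaddress


def bind_scope(host: str) -> str:
    """Classify a server bind address by which interfaces it listens on.

    Hypothetical helper: "localhost", 127.0.0.0/8 and ::1 are loopback-only;
    0.0.0.0 and :: mean "all interfaces".
    """
    host = host.strip()
    if host.lower() == "localhost":
        return "loopback-only"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Any other hostname: its scope depends on how it resolves.
        return "unknown"
    if addr.is_loopback:
        return "loopback-only"
    if addr.is_unspecified:
        return "all-interfaces"
    return "single-interface"


for candidate in ("localhost", "127.0.0.1", "0.0.0.0", "::", "10.0.0.5"):
    print(f"{candidate:>10} -> {bind_scope(candidate)}")
```

A startup log line or warning based on this check would have surfaced the loopback-only binding before any probe failed.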

Impact

Anyone deploying this sample to a containerized environment will encounter this issue, including:

  • Docker (local or remote)
  • Azure Container Apps
  • Azure Kubernetes Service (AKS)
  • AWS ECS/EKS
  • Google Cloud Run
  • Any Kubernetes cluster

This is a blocking issue for production deployments.


Solution

Code Change

Changed the network binding from localhost to 0.0.0.0:

# Before
run_app(app, host="localhost", port=port, handle_signals=True)

# After  
run_app(app, host="0.0.0.0", port=port, handle_signals=True)

Binding to 0.0.0.0 instructs the server to accept connections on all available network interfaces, including:

  • The container's virtual network interface (for external traffic)
  • The loopback interface (for internal health checks)
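The loopback half of this claim can be confirmed locally with the standard socket module: a listener bound to 0.0.0.0 still accepts a client connecting to 127.0.0.1. This is a minimal sketch; accepts_loopback is a hypothetical helper, and it only exercises loopback, since reaching a container's external interface needs a second network namespace.

```python
import socket


def accepts_loopback(bind_host: str) -> bool:
    """Return True if a TCP listener bound to bind_host accepts a 127.0.0.1 client."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0: the OS picks a free ephemeral port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The TCP handshake completes via the listen backlog, so no accept() is needed.
            with socket.create_connection(("127.0.0.1", port), timeout=2):
                return True
        except OSError:
            return False


print(accepts_loopback("0.0.0.0"))    # all IPv4 interfaces, loopback included
print(accepts_loopback("127.0.0.1"))  # loopback only
```

Both calls succeed, which is why switching to 0.0.0.0 does not break in-container checks that use 127.0.0.1.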

This is the standard practice for containerized applications and is safe because:

  1. Container isolation provides the security boundary
  2. Only ports that are published (for example with docker run -p) or configured as an ingress target port are reachable from outside the container
  3. Azure Container Apps and Kubernetes can add further network controls, such as ingress restrictions and network policies

Additional Files

| File | Purpose |
|------|---------|
| Dockerfile | Production-ready container image with health checks |
| .dockerignore | Excludes dev files from container builds |
| README.md (updated) | Container deployment documentation and rationale |

Why Container Deployment?

Production Readiness

Container deployment is the recommended approach for production Agent 365 workloads because it provides:

1. Environment Consistency

  • Identical runtime behavior across development, staging, and production
  • Eliminates "works on my machine" issues
  • Reproducible builds with explicit dependency versions

2. Operational Excellence

  • Health monitoring with automatic restart of unhealthy agents
  • Resource limits (CPU/memory) to prevent runaway processes
  • Centralized logging and observability

3. Scalability

  • Horizontal scaling based on demand
  • Azure Container Apps provides automatic scale-to-zero
  • Kubernetes HPA for enterprise deployments

4. Deployment Velocity

  • Zero-downtime deployments with rolling updates
  • Blue-green or canary deployment strategies
  • Fast rollback to a previous image or revision

Azure Integration

Container deployment unlocks the full Azure ecosystem:

| Service | Benefit |
|---------|---------|
| Azure Container Apps | Serverless containers with built-in autoscaling, managed certificates, and pay-per-use pricing |
| Azure Kubernetes Service | Enterprise-grade orchestration for complex multi-agent deployments |
| Azure Container Registry | Private registry with geo-replication and vulnerability scanning |
| Azure Monitor | Integrated logging, metrics, and distributed tracing |

Cost Efficiency

  • Scale-to-zero: Pay only when the agent is handling requests
  • Right-sizing: Set precise CPU/memory limits to optimize costs
  • Shared infrastructure: Run multiple agents on the same cluster

Files Changed

1. host_agent_server.py

Change: localhost → 0.0.0.0 in two locations

- print(f"🚀 Server: localhost:{port}")
+ print(f"🚀 Server: 0.0.0.0:{port}")

- run_app(app, host="localhost", port=port, handle_signals=True)
+ run_app(app, host="0.0.0.0", port=port, handle_signals=True)

2. Dockerfile (New)

Production-ready Dockerfile with:

  • Python 3.12 slim base image
  • System dependencies for native Python packages
  • Health check configuration
  • Non-root user considerations
  • Optimized layer caching

3. .dockerignore (New)

Excludes from container builds:

  • Python cache files (__pycache__/, *.pyc)
  • Virtual environments (.venv/, venv/)
  • IDE configurations (.vscode/, .idea/)
  • Local environment files (.env)
  • Git metadata

4. README.md (Updated)

Added comprehensive "Container Deployment" section covering:

  • Why container deployment is recommended
  • Production benefits
  • Azure integration advantages
  • Network binding explanation
  • Docker build and run commands
  • Azure Container Apps deployment instructions

Testing

Local Validation

# Build the container
docker build -t python-agent-test -f python/agent-framework/sample-agent/Dockerfile python/agent-framework/sample-agent/

# Run with required environment variables
docker run -p 3978:3978 \
  -e AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ \
  -e AZURE_OPENAI_API_KEY=your-key \
  -e AZURE_OPENAI_DEPLOYMENT=gpt-4o \
  python-agent-test

# Test health endpoint (from host machine)
curl http://localhost:3978/api/health
# Expected: {"status": "ok", "agent_type": "AgentFrameworkAgent", "agent_initialized": true}
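The expected response can also be checked programmatically, for example in a CI smoke test after docker run. A minimal sketch, assuming the field names in the expected output above stay stable; health_ok is a hypothetical helper, and treating agent_initialized: true as "ready" is an assumption.

```python
import json


def health_ok(body: str) -> bool:
    """Return True if a /api/health body reports status "ok" and an initialized agent.

    Field names come from the expected curl output; treating
    agent_initialized == true as "ready" is an assumption of this sketch.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # e.g. an error page or an empty response
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ok" and payload.get("agent_initialized") is True
```

Against a running container, health_ok(urllib.request.urlopen("http://localhost:3978/api/health", timeout=5).read().decode()) performs the same check as the curl command.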

Test Results

| Test | Result |
|------|--------|
| Docker build | ✅ Passed |
| Container startup | ✅ Passed |
| Health endpoint accessible from host (0.0.0.0 binding) | ✅ Passed |
| Health endpoint accessible from host (localhost binding) | ❌ Failed (expected; this is the bug) |

Discovery Context

This issue was discovered while deploying the Python Agent Framework sample to Azure Container Apps for a shared demo tenant.

Symptoms observed:

  1. Container started successfully (logs showed initialization)
  2. Azure Container Apps health probes failed repeatedly
  3. Container entered restart loop due to failed health checks
  4. Bot Framework webhook deliveries failed with connection timeout

Debugging steps:

  1. Exec'd into the container: the health endpoint responded to curl localhost:3978/api/health
  2. Checked container logs: no errors, server running normally
  3. Tested from the host machine: "Connection refused"
  4. Identified the localhost binding as the root cause

Time to resolution: ~2 hours (would have been immediate with proper documentation)


Related Resources


Checklist

  • Code follows repository conventions
  • Copyright headers present on all new files
  • No "Kairo" legacy references
  • Docker build tested locally
  • Health endpoint accessible from outside container
  • README updated with deployment documentation
  • Commit message follows conventional commits format

Problem:
The Python Agent Framework sample bound to 'localhost' which prevented
the agent from receiving external traffic when running in containers.
Health checks failed and Bot Framework messages couldn't reach the agent.

Solution:
- Changed network binding from 'localhost' to '0.0.0.0' in host_agent_server.py
- Added Dockerfile for containerized deployments
- Added .dockerignore to exclude dev files from container builds
- Added comprehensive Container Deployment section to README explaining:
  - Why container deployment is recommended for production
  - Azure integration benefits (Container Apps, AKS, ACR)
  - Network binding requirements for containerized agents
  - Build and deployment instructions for Docker and Azure Container Apps

This fix was discovered while deploying to Azure Container Apps for a
shared demo tenant. The agent worked locally but failed all health
checks when containerized due to the localhost binding limitation.
pratapladhani requested a review from a team as a code owner on February 10, 2026 04:31
Copilot AI review requested due to automatic review settings on February 10, 2026 04:31

Copilot AI left a comment

Pull request overview

Enables container-friendly deployment for the Python Agent Framework sample by adjusting the server bind address and adding Docker tooling + documentation to support running in Docker/Azure container environments.

Changes:

  • Bind the aiohttp server to 0.0.0.0 instead of localhost so the agent is reachable outside containers.
  • Add a Dockerfile and .dockerignore for building/running the sample in containers.
  • Extend the sample README with container deployment guidance (Docker + Azure Container Apps).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| python/agent-framework/sample-agent/host_agent_server.py | Updates server bind address and startup log output for container networking. |
| python/agent-framework/sample-agent/README.md | Adds container deployment documentation and example commands. |
| python/agent-framework/sample-agent/Dockerfile | Introduces a container build/run path with an HTTP healthcheck. |
| python/agent-framework/sample-agent/.dockerignore | Reduces Docker build context by excluding common local/dev files. |

Comment on lines 24 to 26
# Install Python dependencies from pyproject.toml
RUN pip install --no-cache-dir --root-user-action=ignore .

Copilot AI Feb 10, 2026

PR description mentions “Non-root user considerations”, but this Dockerfile runs everything as root (and suppresses root warnings via --root-user-action=ignore). If the intent is a production-ready container, consider adding a non-root user and switching to it before running the app (and installing deps accordingly).

Copilot uses AI. Check for mistakes.
Comment on lines 21 to 26
# Copy application code
COPY . .

# Install Python dependencies from pyproject.toml
RUN pip install --no-cache-dir --root-user-action=ignore .

Copilot AI Feb 10, 2026

For faster/cleaner image builds, consider copying only pyproject.toml (and any lock file) first, installing dependencies, then copying the rest of the source. The current “COPY . .” before dependency installation makes Docker layer caching ineffective and will reinstall deps on any source change.

Suggested change (replaces the lines quoted above):
# Copy dependency metadata first for better caching
COPY pyproject.toml .
# Install Python dependencies from pyproject.toml
RUN pip install --no-cache-dir --root-user-action=ignore .
# Copy application code
COPY . .

Comment on lines 63 to 69
```bash
docker build -t python-agent .
docker run -p 3978:3978 \
  -e AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ \
  -e AZURE_OPENAI_API_KEY=your-key \
  -e AZURE_OPENAI_DEPLOYMENT=gpt-4o \
  python-agent
```

Copilot AI Feb 10, 2026

The Docker build/run snippet assumes the current working directory is the sample-agent folder. If a reader runs this from the repo root, it will either fail or pick up the wrong context. Consider clarifying the working directory (cd into python/agent-framework/sample-agent) or using an explicit -f path + build context in the command.

Comment on lines 75 to 85
# Build and push to Azure Container Registry
az acr build --registry <your-acr> --image python-agent:latest .

# Create Container App
az containerapp create \
  --name python-agent \
  --resource-group <your-rg> \
  --environment <your-env> \
  --image <your-acr>.azurecr.io/python-agent:latest \
  --target-port 3978 \
  --ingress external

Copilot AI Feb 10, 2026

The Azure Container Apps example doesn’t show how to provide required runtime configuration (e.g., AZURE_OPENAI_ENDPOINT/API_KEY/DEPLOYMENT, and any bot/auth settings) to the container app. Without setting these env vars/secrets, the deployed container likely won’t start successfully. Consider adding an example using --env-vars / secrets, and noting prerequisites like Azure CLI login and an ACR pull identity if needed.

Suggested change (replaces the lines quoted above):
# Make sure you are logged in and using the correct subscription
az login
az account set --subscription <your-subscription-id>

# Build and push to Azure Container Registry
az acr build --registry <your-acr> --image python-agent:latest .

# Create Container App (ensure this environment can pull from your ACR)
az containerapp create \
  --name python-agent \
  --resource-group <your-rg> \
  --environment <your-env> \
  --image <your-acr>.azurecr.io/python-agent:latest \
  --target-port 3978 \
  --ingress external \
  --env-vars \
    AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/ \
    AZURE_OPENAI_API_KEY=your-key \
    AZURE_OPENAI_DEPLOYMENT=gpt-4o
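A complementary safeguard is to fail fast at startup when these values are missing, so a misconfigured container logs one clear error instead of failing later with a less obvious symptom. This is a hypothetical sketch: missing_settings is not part of the sample, and the list covers only the three Azure OpenAI variables used in the examples above (bot/auth settings are deployment-specific and omitted).

```python
import os
from collections.abc import Mapping

# The three variables passed in the docker run and az containerapp examples.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are unset or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

At startup, something like `if missing := missing_settings(): raise SystemExit(f"Missing settings: {missing}")` would stop the process with a readable message before the server binds.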

Comment on lines 1 to 15
# Python
__pycache__/
*.py[cod]
.venv/
venv/

# IDE
.vscode/
.idea/

# Local files
.env
.env.local
*.log

Copilot AI Feb 10, 2026

Consider adding common Python build artifacts to .dockerignore (e.g., .pytest_cache/, .mypy_cache/, .ruff_cache/, build/, dist/, *.egg-info/). Excluding these helps avoid accidentally copying local build outputs into the image and improves build performance.

Comment on lines 336 to 337
print(f"📚 Endpoint: http://localhost:{port}/api/messages")
print(f"❤️ Health: http://localhost:{port}/api/health\n")

Copilot AI Feb 10, 2026

The server now binds to 0.0.0.0, but the printed Endpoint/Health URLs still use http://localhost. In container/remote scenarios this is misleading (localhost from a different machine won’t reach the container). Consider printing 0.0.0.0 or deriving a display host (e.g., from an env var) to keep the startup output consistent with the actual binding.

Suggested change (replaces the lines quoted above):
print(f"📚 Endpoint: http://0.0.0.0:{port}/api/messages")
print(f"❤️ Health: http://0.0.0.0:{port}/api/health\n")
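The comment also mentions deriving the display host from an environment variable. A minimal sketch of that option, assuming a hypothetical DISPLAY_HOST variable that the sample does not currently read: the server keeps binding to 0.0.0.0, while the printed URLs use a name clients can actually reach. Falling back to localhost reflects that 0.0.0.0 is a wildcard bind address rather than a reliable destination to browse to.

```python
import os
from collections.abc import Mapping


def startup_urls(port: int, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the URLs printed at startup.

    DISPLAY_HOST is hypothetical (not read by the sample); without it the
    URLs use localhost, which is correct from the same machine.
    """
    host = env.get("DISPLAY_HOST", "").strip() or "localhost"
    base = f"http://{host}:{port}"
    return {"endpoint": f"{base}/api/messages", "health": f"{base}/api/health"}


urls = startup_urls(3978, {})
print(f"📚 Endpoint: {urls['endpoint']}")
print(f"❤️ Health: {urls['health']}\n")
```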

Comment on lines 339 to +340
try:
-     run_app(app, host="localhost", port=port, handle_signals=True)
+     run_app(app, host="0.0.0.0", port=port, handle_signals=True)

Copilot AI Feb 10, 2026

This file’s header doesn’t follow the repo’s standard Microsoft MIT header format (it’s currently a single-line “Copyright (c) Microsoft. All rights reserved.”). Since this file is being modified, please update the header to the standard two-line “Copyright (c) Microsoft Corporation.” / “Licensed under the MIT License.” at the top of the file.
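A check like this can be automated. The sketch below assumes the two header lines are written as Python # comments at the top of each .py file, with an optional shebang first; has_standard_header is a hypothetical helper, not an existing repo tool.

```python
STANDARD_HEADER = (
    "# Copyright (c) Microsoft Corporation.",
    "# Licensed under the MIT License.",
)


def has_standard_header(source: str) -> bool:
    """Return True if the first two non-blank lines are the standard MIT header.

    Assumption: a leading shebang line (#!...) may precede the header.
    """
    lines = [line.rstrip() for line in source.splitlines() if line.strip()]
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return tuple(lines[:2]) == STANDARD_HEADER
```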

Comment on lines 34 to 36
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:3978/api/health')" || exit 1

Copilot AI Feb 10, 2026

The HEALTHCHECK hard-codes port 3978 and doesn’t set a request timeout. This can fail if the app is configured to use a different PORT, or if the server ever selects a different port (host_agent_server.py can increment the port if the desired one is in use). Consider reading PORT from the environment in the healthcheck command and passing an explicit timeout to avoid hangs.
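One way to address both points without adding packages to the image is a small Python probe that reads PORT and sets an explicit timeout. A minimal sketch, assuming PORT defaults to 3978 as in the examples above; health_url and probe are hypothetical names.

```python
import http.client
import os
import urllib.request
from collections.abc import Mapping


def health_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the probe URL from PORT, defaulting to 3978."""
    port = env.get("PORT", "").strip() or "3978"
    return f"http://127.0.0.1:{port}/api/health"


def probe(url: str, timeout: float = 5.0) -> int:
    """Return an exit code: 0 if the endpoint answers 2xx within timeout, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, connection refused and timeouts are OSError subclasses.
        return 1
```

Saved as a hypothetical healthcheck.py ending in `raise SystemExit(probe(health_url()))`, it could be wired up with `HEALTHCHECK CMD ["python", "healthcheck.py"]`. It would not track a port that the server auto-increments at runtime.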

- Dockerfile: Add non-root user for security, fix layer caching order,
  use curl with PORT env var for healthcheck
- host_agent_server.py: Update copyright header to standard format,
  fix print statements to show 0.0.0.0 binding
- .dockerignore: Add Python build artifacts (.pytest_cache, .mypy_cache,
  .ruff_cache, build/, dist/, *.egg-info)
- README.md: Add cd instructions, Azure CLI login steps, and env vars
  for Container Apps deployment with secret reference example

Related to microsoft#187