Conversation

@skamenan7 (Contributor) commented on Oct 8, 2025

Implements an AWS Bedrock inference provider using the OpenAI-compatible endpoint for models available through Bedrock (GPT-OSS, Llama).

Closes: #3410

What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it inherits LiteLLM's OpenAI compatibility features. The provider also supports per-request API key overrides via the x-llamastack-provider-data header.
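
For illustration, a minimal client-side sketch of the per-request override. The base URL http://localhost:8321/v1 and the placeholder api_key are assumptions about a local deployment, not part of this PR:

import json
from openai import OpenAI

# Assumed local endpoint; "none" is a placeholder for a stack without client auth.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Hello"}],
    # Per-request Bedrock key; an empty value falls back to the provider config.
    extra_headers={
        "x-llamastack-provider-data": json.dumps({"aws_bedrock_api_key": "<your-bedrock-key>"})
    },
)
print(resp.choices[0].message.content)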

Test Plan

Tested the following scenarios:

  • Non-streaming completion - basic request/response flow
  • Streaming completion - SSE streaming with chunked responses
  • Multi-turn conversations - context retention across turns
  • Tool calling - function calling with proper tool_calls format

Bedrock OpenAI-Compatible Provider - Test Results

Model: bedrock-inference/openai.gpt-oss-20b-1:0


Test 1: Model Listing

Request:

GET /v1/models HTTP/1.1

Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
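
The same listing via the OpenAI Python client, a sketch assuming the local endpoint above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

for model in client.models.list():
    print(model.id)  # e.g. bedrock-inference/openai.gpt-oss-20b-1:0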

Test 2: Non-Streaming Completion

Request:

POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}

Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
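
As a client-side sketch (same assumed endpoint), the equivalent call and the fields this test checks:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
    stream=False,
)
print(resp.choices[0].finish_reason)    # "stop"
print(resp.choices[0].message.content)  # "...Hello from Bedrock"
print(resp.usage.total_tokens)          # prompt + completion token accounting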

Test 3: Streaming Completion

Request:

POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}

Response:

HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
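
A sketch of consuming the stream with the OpenAI Python client (endpoint assumed as above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

stream = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Count from 1 to 5"}],
    stream=True,
)

# Each SSE chunk carries a content delta; concatenating them yields the final text.
parts = []
for chunk in stream:
    if not chunk.choices:
        continue  # some servers emit a trailing usage-only chunk with no choices
    delta = chunk.choices[0].delta.content
    if delta:
        parts.append(delta)
print("".join(parts))  # "1, 2, 3, 4, 5"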

Test 4: Error Handling - Invalid Model

Request:

POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}

Response:

HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
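
With the OpenAI Python client, the 404 surfaces as a typed exception; a sketch under the same endpoint assumption:

import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

try:
    client.chat.completions.create(
        model="invalid-model-id",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.NotFoundError as err:
    print(err)  # Model 'invalid-model-id' not found. Use 'client.models.list()' ...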

Test 5: Multi-Turn Conversation

Request 1:

POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}

Response 1:

HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}

Request 2 (with history):

POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}

Response 2:

HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}

Context retained across turns
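
Since the API is stateless, the client must resend the running history on each turn. A sketch of the flow above (assumed endpoint):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint
model = "bedrock-inference/openai.gpt-oss-20b-1:0"

history = [{"role": "user", "content": "My name is Alice"}]
first = client.chat.completions.create(model=model, messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Context only survives because the full history is replayed on the next request.
history.append({"role": "user", "content": "What is my name?"})
second = client.chat.completions.create(model=model, messages=history)
print(second.choices[0].message.content)  # "...Your name is Alice."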


Test 6: System Messages

Request:

POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}

Response:

HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}

Test 7: Tool Calling

Request:

POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}

Response:

HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
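
A sketch of issuing the same tool-call request and decoding the result (assumed endpoint; get_weather is the example tool from the request above):

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
        },
    }],
)

choice = resp.choices[0]
if choice.finish_reason == "tool_calls":
    call = choice.message.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    print(call.function.name, args)             # get_weather {'location': 'San Francisco'}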

Test 8: Sampling Parameters

Request:

POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}

Response:

HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}

Test 9: Authentication Error Handling

Subtest A: Invalid API Key

Request:

POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}

Response:

HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}

Subtest B: Empty API Key (Fallback to Config)

Request:

POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}

Response:

HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}

Fell back to config key


Subtest C: Malformed Token

Request:

POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}

Response:

HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
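
With the OpenAI Python client, subtests A and C surface as BadRequestError, while the empty-key case in subtest B succeeds via the config fallback. A sketch under the same endpoint assumption:

import json
import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # assumed endpoint

try:
    client.chat.completions.create(
        model="bedrock-inference/openai.gpt-oss-20b-1:0",
        messages=[{"role": "user", "content": "Hello"}],
        extra_headers={
            "x-llamastack-provider-data": json.dumps(
                {"aws_bedrock_api_key": "invalid-fake-key-12345"}
            )
        },
    )
except openai.BadRequestError as err:
    # The adapter maps Bedrock's 401 to a 400 with a descriptive detail message.
    print(err)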

@meta-cla bot added the CLA Signed label on Oct 8, 2025
@skamenan7 force-pushed the feat-3410-bedrock-openai-compatible-provider branch 2 times, most recently from 59d4cfa to e4d71e7 on October 9, 2025 12:34
@leseb (Collaborator) left a comment:

Use this one as a reference: #3707

@skamenan7 force-pushed the feat-3410-bedrock-openai-compatible-provider branch 5 times, most recently from 14919c1 to 3a9af0c on October 9, 2025 17:44
@skamenan7 marked this pull request as ready for review on October 9, 2025 21:09
@skamenan7 requested a review from leseb on October 9, 2025 21:09
@leseb (Collaborator) left a comment:

Please report the results of tests/integration/inference/test_openai_completion.py and the other OpenAI-related tests.

Also, why has uv.lock changed?

@mattf (Collaborator) left a comment:

  • Nothing from models.py is used; please remove it.
  • Is the /v1/embeddings endpoint available? If not, add a NotImplementedError stub (a sketch of the requested shape follows this list).
  • Is the /v1/completions endpoint available? If not, add a stub there as well.
  • Great find w.r.t. telemetry and stream usage; after this PR we should consider adding that nugget to the mixin for all providers.
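
A hedged sketch of the stub shape being requested. The class and method names follow llama-stack's OpenAI-style inference API from memory and should be treated as assumptions, not the PR's actual code:

# Hypothetical, simplified adapter; the real one subclasses the stack's OpenAIMixin.
class BedrockInferenceAdapter:
    async def openai_embeddings(self, *args, **kwargs):
        # Bedrock's OpenAI-compatible endpoint does not expose /v1/embeddings.
        raise NotImplementedError("Embeddings are not supported by the Bedrock OpenAI-compatible endpoint")

    async def openai_completion(self, *args, **kwargs):
        # Legacy /v1/completions is likewise unavailable; only chat completions work.
        raise NotImplementedError("Text completions are not supported; use chat completions")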

@skamenan7 force-pushed the feat-3410-bedrock-openai-compatible-provider branch from 3a9af0c to 56bff11 on October 10, 2025 14:04
@skamenan7 requested review from leseb and mattf on October 10, 2025 14:05
@skamenan7 force-pushed the feat-3410-bedrock-openai-compatible-provider branch from 56bff11 to 7024e56 on October 13, 2025 20:19
Implements AWS Bedrock inference provider using the OpenAI-compatible endpoint
for models available through Bedrock (GPT-OSS, Llama).

Changes:
- Add BedrockInferenceAdapter using OpenAIMixin base
- Configure region-specific endpoint URLs
- Add NotImplementedError stubs for unsupported endpoints
- Implement authentication error handling with helpful messages
- Remove unused models.py file
- Add comprehensive unit tests (12 total)
- Add provider registry configuration
@skamenan7 force-pushed the feat-3410-bedrock-openai-compatible-provider branch from 7024e56 to 55aaa6e on October 13, 2025 21:04