Overview

The project implements AI DIAL API for language models from AWS Bedrock.

Supported models

Chat completion models

The following models support POST SERVER_URL/openai/deployments/DEPLOYMENT_NAME/chat/completions endpoint along with an optional support of POST /tokenize and POST /truncate_prompt endpoints:

Note that a model supports /truncate_prompt endpoint if and only if it supports max_prompt_tokens request parameter.

Vendor	Model	Deployment name	Modality	`/tokenize`	`/truncate_prompt`, `max_prompt_tokens`	tools/functions
Anthropic	Claude 3.7 Sonnet	us.anthropic.claude-3-7-sonnet-20250219-v1:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 3.5 Sonnet	[us.\|eu.]anthropic.claude-3-5-sonnet-20240620-v1:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 3.5 Sonnet 2.0	[us.]anthropic.claude-3-5-sonnet-20241022-v2:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 3 Sonnet	[us.\|eu.]anthropic.claude-3-sonnet-20240229-v1:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 3 Haiku	[us.\|eu.]anthropic.claude-3-haiku-20240307-v1:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 3.5 Haiku	[us.]anthropic.claude-3-5-haiku-20241022-v1:0	text-to-text	🟡	🟡	✅
Anthropic	Claude 3 Opus	[us.]anthropic.claude-3-opus-20240229-v1:0	(text/image)-to-text	🟡	🟡	✅
Anthropic	Claude 2.1	anthropic.claude-v2:1	text-to-text	✅	✅	✅
Anthropic	Claude 2	anthropic.claude-v2	text-to-text	✅	✅	❌
Anthropic	Claude Instant 1.2	anthropic.claude-instant-v1	text-to-text	🟡	🟡	❌
Meta	Llama 3.3 70B Instruct	meta.llama3-3-70b-instruct-v1:0	text-to-text	🟡	🟡	✅
Meta	Llama 3.2 90B Instruct	us.meta.llama3-2-90b-instruct-v1:0	(text/image)-to-text	🟡	🟡	✅
Meta	Llama 3.2 11B Instruct	us.meta.llama3-2-11b-instruct-v1:0	(text/image)-to-text	🟡	🟡	❌
Meta	Llama 3.2 3B Instruct	us.meta.llama3-2-3b-instruct-v1:0	text-to-text	🟡	🟡	❌
Meta	Llama 3.2 1B Instruct	us.meta.llama3-2-1b-instruct-v1:0	text-to-text	🟡	🟡	❌
Meta	Llama 3.1 405B Instruct	[us.]meta.llama3-1-405b-instruct-v1:0	text-to-text	🟡	🟡	✅
Meta	Llama 3.1 70B Instruct	[us.]meta.llama3-1-70b-instruct-v1:0	text-to-text	🟡	🟡	✅
Meta	Llama 3.1 8B Instruct	meta.llama3-1-8b-instruct-v1:0	text-to-text	🟡	🟡	❌
Meta	Llama 3 Chat 70B Instruct	meta.llama3-70b-instruct-v1:0	text-to-text	🟡	🟡	❌
Meta	Llama 3 Chat 8B Instruct	meta.llama3-8b-instruct-v1:0	text-to-text	🟡	🟡	❌
Stability AI	SDXL 1.0	stability.stable-diffusion-xl-v1	text-to-image	❌	🟡	❌
Stability AI	SD3 Large 1.0	stability.sd3-large-v1:0	(text/image)-to-image	❌	🟡	❌
Stability AI	Stable Image Ultra 1.0	stability.stable-image-ultra-v1:0	text-to-image	❌	🟡	❌
Stability AI	Stable Image Core 1.0	stability.stable-image-core-v1:0	text-to-image	❌	🟡	❌
Amazon	Titan Text G1 - Express	amazon.titan-tg1-large	text-to-text	🟡	🟡	❌
Amazon	Nova Pro	amazon.nova-pro-v1:0	(text/image/document)-to-text	🟡	🟡	✅
Amazon	Nova Lite	amazon.nova-lite-v1:0	(text/image/document)-to-text	🟡	🟡	✅
Amazon	Nova Micro	amazon.nova-micro-v1:0	text-to-text	🟡	🟡	❌
AI21 Labs	Jurassic-2 Ultra	ai21.j2-jumbo-instruct	text-to-text	🟡	🟡	❌
AI21 Labs	Jurassic-2 Ultra v1	ai21.j2-ultra-v1	text-to-text	🟡	🟡	❌
AI21 Labs	Jurassic-2 Mid	ai21.j2-grande-instruct	text-to-text	🟡	🟡	❌
AI21 Labs	Jurassic-2 Mid v1	ai21.j2-mid-v1	text-to-text	🟡	🟡	❌
Cohere	Command	cohere.command-text-v14	text-to-text	🟡	🟡	❌
Cohere	Command Light	cohere.command-light-text-v14	text-to-text	🟡	🟡	❌

✅, 🟡, and ❌ denote degrees of support of the given feature:

	`/tokenize`, `/truncate_prompt`, `max_prompt_token`	tools/functions
✅	Fully supported via an official tokenization algorithm	Fully supported via native tools API or official prompts to enable tools
🟡	Partially supported, because tokenization algorithm wasn't made public by the model vendor. An approximate tokenization algorithm is used instead. It conservatively counts every byte in UTF-8 encoding of a string as a single token.	Partially supported, because the model doesn't support tools natively. Prompt engineering is used instead to emulate tools, which may not be very reliable.
❌	Not supported	Not supported

Embedding models

The following models support SERVER_URL/openai/deployments/DEPLOYMENT_NAME/embeddings endpoint:

Model	Deployment name	Modality
Titan Multimodal Embeddings Generation 1 (G1)	amazon.titan-embed-image-v1	image/text-to-embedding
Amazon Titan Text Embeddings V2	amazon.titan-embed-text-v2:0	text-to-embedding
Titan Embeddings G1 – Text v1.2	amazon.titan-embed-text-v1	text-to-embedding
Cohere Embed English	cohere.embed-english-v3	text-to-embedding
Cohere Multilingual	cohere.embed-multilingual-v3	text-to-embedding

Developer environment

This project uses Python>=3.11 and Poetry>=1.6.1 as a dependency manager.

Check out Poetry's documentation on how to install it on your system before proceeding.

To install requirements:

poetry install

This will install all requirements for running the package, linting, formatting and tests.

IDE configuration

The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.

The VSCode is configured to use PEP-8 compatible formatter Black.

Alternatively you can use PyCharm.

Set-up the Black formatter for PyCharm manually or install PyCharm>=2023.2 with built-in Black support.

Run

Run the development server:

make serve

Open localhost:5001/docs to make sure the server is up and running.

Environment Variables

Copy .env.example to .env and customize it for your environment:

Variable	Default	Description
AWS_ACCESS_KEY_ID	NA	AWS credentials with access to Bedrock service
AWS_SECRET_ACCESS_KEY	NA	AWS credentials with access to Bedrock service
AWS_DEFAULT_REGION		AWS region e.g. `us-east-1`
AWS_ASSUME_ROLE_ARN		AWS assume role arn e.g. `arn:aws:iam::123456789012:role/RoleName`
LOG_LEVEL	INFO	Log level. Use DEBUG for dev purposes and INFO in prod
AIDIAL_LOG_LEVEL	WARNING	AI DIAL SDK log level
DIAL_URL		URL of the core DIAL server. If defined, images generated by Stability are uploaded to the DIAL file storage and attachments are returned with URLs pointing to the images. Otherwise, the images are returned as base64 encoded strings.
WEB_CONCURRENCY	1	Number of workers for the server
COMPATIBILITY_MAPPING	{}	A JSON dictionary that maps Bedrock deployments that aren't supported by the Adapter to the Bedrock deployments that are supported by the Adapter (see the Supported models section). Find more details in the compatibility mode section.

Running unsupported Bedrock models in the compatibility mode

The Adapter supports a predefined list of AWS Bedrock deployments. The Supported models section lists the models. These models could be accessed via /openai/deployments/{deployment_name}/(chat_completions|embeddings) endpoints. The Adapter won't recognize any other deployment name and will result in 404 error.

Now, suppose AWS Bedrock released a new version of a model, e.g. anthropic.claude-3-5-sonnet-20250210-v3:0 which is a better version of an older anthropic.claude-3-5-sonnet-20241022-v2:0 model.

Immediately after the release, the former model is unsupported by the Adapter, but the latter is supported. Therefore, the request to openai/deployments/anthropic.claude-3-5-sonnet-20250210-v3:0/chat/completions will result in 404 error.

It will take some time for the Adapter to catch up with AWS Bedrock - support the v3 model and publish the release with the fix.

What to do in the meantime? Presumably, the v3 model is backward compatible with v2, so we may try to run v3 in the compatibility mode - that is to convince the Adapter to process v3 request as if it's v2 request with the only difference that the final upstream request to AWS Bedrock will be to v3 and not v2.

The COMPATIBILITY_MAPPING env variable enables exactly this scenario.

When it's defined like this:

COMPATIBILITY_MAPPING={"anthropic.claude-3-5-sonnet-20250210-v3:0": "anthropic.claude-3-5-sonnet-20241022-v2:0"}

the Adapter will be able to handle requests to anthropic.claude-3-5-sonnet-20250210-v3:0 deployment. The requests will be processed by the same pipeline as anthropic.claude-3-5-sonnet-20241022-v2:0, but the call to AWS Bedrock will be done to anthropic.claude-3-5-sonnet-20250210-v3:0 deployment name.

Naturally, this will only work if the APIs of v2 and v3 deployments are compatible:

The requests utilizing the modalities supported by both v2 and v3 will work just fine.
However, the requests with modalities that are supported by v3 (e.g. audio) and aren't supported by v2, won't be processed correctly. You will have to wait until the Adapter supports the v3 deployment natively.

When a version of the Adapter supporting the v3 model is released, you may migrate to it and safely remove the entry from the COMPATIBILITY_MAPPING dictionary.

Note that a mapping such as this one would be ineffectual:

COMPATIBILITY_MAPPING={"anthropic.claude-3-5-sonnet-20250210-v3:0": "stability.stable-image-ultra-v1:0"}

since the APIs and capabilities of these two models are drastically different.

Load balancing

If you use DIAL Core load balancing mechanism, you can provide extraData upstream setting with different AWS account credentials/regions to use different model deployments:

{
  "upstreams": [
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_access_key_id": "key_id_1",
        "aws_secret_access_key": "access_key_1"
      }
    },
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_access_key_id": "key_id_2",
        "aws_secret_access_key": "access_key_2"
      }
    },
    {
      "extraData": {
        "region": "eu-west-1",
        "aws_assume_role_arn": "arn:aws:iam::123456789012:role/BedrockAccessAdapterRoleName"
      }
    }
  ]
}

Supported extraData fields:

region
aws_access_key_id
aws_secret_access_key
aws_assume_role_arn

Docker

Run the server in Docker:

make docker_serve

Lint

Run the linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

Run unit tests locally:

make test

Run unit tests in Docker:

make docker_test

Run integration tests locally:

make integration_tests

Clean

To remove the virtual environment and build artifacts:

make clean

Name	Name	Last commit message	Last commit date
Latest commit dependabot[bot] chore: bump jinja2 from 3.1.5 to 3.1.6 (#233 ) Mar 7, 2025 28fbf12 · Mar 7, 2025 History 177 Commits
.github	.github	chore: add GitHub issue and pull request templates [skip ci]	Feb 14, 2025
.vscode	.vscode	chore: added formatter for TOML files (#211 )	Jan 9, 2025
aidial_adapter_bedrock	aidial_adapter_bedrock	feat: add Meta models: Llama 3.3 70B Instruct, Llama 3.1 405B Instruc…	Mar 6, 2025
scripts	scripts	feat: migrated latest fixes (#23 )	Nov 10, 2023
tests	tests	feat: add Meta models: Llama 3.3 70B Instruct, Llama 3.1 405B Instruc…	Mar 6, 2025
.dockerignore	.dockerignore	feat: initial commit	Oct 10, 2023
.env.example	.env.example	feat: introduce compatibility mode for unsupported Bedrock deployment…	Dec 18, 2024
.flake8	.flake8	chore: bump black from 23.3.0 to 24.3.0 (#79 )	Mar 26, 2024
.gitignore	.gitignore	chore: added pytest-html (#206 )	Jan 7, 2025
.ort.yml	.ort.yml	feat: migrated to the latest DIAL SDK (#127 )	Jul 26, 2024
CONTRIBUTING.md	CONTRIBUTING.md	chore: added CONTRIBUTING.md (#27 )	Nov 13, 2023
Dockerfile	Dockerfile	chore: pin poetry to 1.8.5 (#210 )	Jan 9, 2025
Dockerfile.test	Dockerfile.test	chore: pin poetry to 1.8.5 (#210 )	Jan 9, 2025
LICENSE	LICENSE	feat: initial commit	Oct 10, 2023
Makefile	Makefile	chore: bump epam/ai-dial-ci from 1.10.2 to 1.11.0 (#216 )	Jan 17, 2025
README.md	README.md	feat: add Meta models: Llama 3.3 70B Instruct, Llama 3.1 405B Instruc…	Mar 6, 2025
SECURITY.md	SECURITY.md	chore: GitHub workflow update (#64 )	Feb 1, 2024
noxfile.py	noxfile.py	fix: fixed integration tests (#147 )	Oct 30, 2024
poetry.lock	poetry.lock	chore: bump jinja2 from 3.1.5 to 3.1.6 (#233 )	Mar 7, 2025
poetry.toml	poetry.toml	feat: initial commit	Oct 10, 2023
pyproject.toml	pyproject.toml	[skip ci] Update version	Mar 6, 2025
trivy.yaml	trivy.yaml	chore: cleanup untagged images (#179 )	Nov 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Supported models

Chat completion models

Embedding models

Developer environment

IDE configuration

Run

Environment Variables

Running unsupported Bedrock models in the compatibility mode

Load balancing

Docker

Lint

Test

Clean

About

Releases 31

Packages 1

Contributors 13

Languages

License

epam/ai-dial-adapter-bedrock

Folders and files

Latest commit

History

Repository files navigation

Overview

Supported models

Chat completion models

Embedding models

Developer environment

IDE configuration

Run

Environment Variables

Running unsupported Bedrock models in the compatibility mode

Load balancing

Docker

Lint

Test

Clean

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases 31

Packages 1

Contributors 13

Languages