
Feat/self hosted models #213

Open · wants to merge 43 commits into main from feat/self-hosted-models

Commits (43), changes from all commits

057e4fb  chore: Provider Unit Tests (#173) (MiNeves00, Dec 18, 2024)
2cd3f94  [fix] bump prerelease version in pyproject.toml (actions-user, Dec 18, 2024)
cdeb84f  chore: rename action (diogoncalves, Dec 18, 2024)
f9feb73  feat: added action to run tests on PR (diogoncalves, Dec 18, 2024)
95c1723  chore: comments (diogoncalves, Dec 18, 2024)
8ea43c6  fix: fix azure config tests (diogoncalves, Dec 18, 2024)
9f42faa  chore: style format (diogoncalves, Dec 18, 2024)
91f232e  fix: tests workflow (diogoncalves, Dec 18, 2024)
ee7c373  Feature/prompt management (#200) (brunoalho99, Jan 23, 2025)
549e388  [fix] bump prerelease version in pyproject.toml (actions-user, Jan 23, 2025)
c6cdad0  [bugfix] return empty prompt (brunoalho99, Jan 29, 2025)
8d83a45  [bugfix] return empty prompt (#201) (brunoalho99, Jan 29, 2025)
2a5b4fb  [fix] bump prerelease version in pyproject.toml (actions-user, Jan 29, 2025)
d330303  Update CONTRIBUTING.md (diogoncalves, Jan 29, 2025)
21e9352  Feat/ Use Openai Usage to calculate Cache and Reasoning Costs (#199) (MiNeves00, Jan 30, 2025)
c23dfae  chore: update poetry.lock (diogoncalves, Jan 30, 2025)
2660006  chore: specify python versions (diogoncalves, Jan 30, 2025)
8fbe717  chore: moving langchain integration tests to sdk (diogoncalves, Jan 30, 2025)
29e39d8  chore: format (diogoncalves, Jan 30, 2025)
39443df  feat: added support for o3-mini and updated o1-mini prices. also upda… (MiNeves00, Feb 5, 2025)
9610bcb  chore: removed duplicated code; removed duplicated integration tests (MiNeves00, Feb 10, 2025)
7b9a866  chore: updated github actions to run integration tests (MiNeves00, Feb 10, 2025)
c231e91  chore: fixing github actions (MiNeves00, Feb 10, 2025)
0503810  chore: fixing github actions again (MiNeves00, Feb 10, 2025)
30e4fa6  chore: fixing github actions again-x2 (MiNeves00, Feb 10, 2025)
91562b3  chore: fixing github actions again-x2 (MiNeves00, Feb 10, 2025)
545d990  chore: added cache of dependencies to integration-tests in githubaction (MiNeves00, Feb 10, 2025)
ddc250f  chore: updated integration-tests action to inject github secrets into… (MiNeves00, Feb 10, 2025)
83d7b55  Feat/bedrock support for Nova models through the ConverseAPI (#207) (MiNeves00, Feb 12, 2025)
74e6b4f  [fix] bump prerelease version in pyproject.toml (actions-user, Feb 12, 2025)
7840ef4  [fix] bump prerelease version in pyproject.toml (actions-user, Feb 12, 2025)
65d0f22  [fix] bump prerelease version in pyproject.toml (actions-user, Feb 12, 2025)
162f1af  Update pyproject.toml (MiNeves00, Feb 12, 2025)
10a604d  [fix] bump prerelease version in pyproject.toml (actions-user, Feb 12, 2025)
92223ea  chore: updated llmstudio sdk poetry.lock (MiNeves00, Feb 12, 2025)
b3cbc53  feat: add self hosted models (diogoncalves, Feb 26, 2025)
1cd859e  [feat] self-hosted cleaned (brunoalho99, Feb 26, 2025)
85133f7  chore: added llama from azure (diogoncalves, Feb 26, 2025)
152f4ef  [feat] cleaned self-hosted and azure providers (brunoalho99, Feb 28, 2025)
2a9ae25  Feat/converse support images (#211) (MiNeves00, Mar 3, 2025)
986e3e8  [fix] bump prerelease version in pyproject.toml (actions-user, Mar 3, 2025)
5d3b434  Merge branch 'develop' into feat/self-hosted-models (diogoncalves, Mar 4, 2025)
96e926a  fix: fix import on azure (diogoncalves, Mar 5, 2025)

examples/_config.yaml (159 additions, 7 deletions)

@@ -17,6 +17,11 @@ providers:
        max_tokens: 200000
        input_token_cost: 0.000003
        output_token_cost: 0.000015
      claude-3-sonnet:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000003
        output_token_cost: 0.000015
      claude-3-haiku-20240307:
        mode: chat
        max_tokens: 200000
@@ -66,18 +71,134 @@ providers:
        min: 0
        max: 500
        step: 1
  bedrock:
    id: bedrock
    name: Bedrock ConverseAPI
    chat: true
    embed: true
    keys:
      - BEDROCK_SECRET_KEY
      - BEDROCK_ACCESS_KEY
      - BEDROCK_REGION
    models:
      anthropic.claude-3-sonnet-20240229-v1:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000003
        output_token_cost: 0.000015
      anthropic.claude-3-5-sonnet-20240620-v1:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000003
        output_token_cost: 0.000015
      anthropic.claude-3-5-sonnet-20241022-v2:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000003
        output_token_cost: 0.000015
      anthropic.claude-3-haiku-20240307-v1:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.00000025
        output_token_cost: 0.00000125
      anthropic.claude-3-5-haiku-20241022-v1:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000001
        output_token_cost: 0.000005
      anthropic.claude-3-opus-20240229-v1:0:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.000015
        output_token_cost: 0.000075
      anthropic.claude-instant-v1:
        mode: chat
        max_tokens: 100000
        input_token_cost: 0.0000008
        output_token_cost: 0.0000024
      anthropic.claude-v2:
        mode: chat
        max_tokens: 100000
        input_token_cost: 0.000008
        output_token_cost: 0.000024
      anthropic.claude-v2:1:
        mode: chat
        max_tokens: 100000
        input_token_cost: 0.000008
        output_token_cost: 0.000024
      us.amazon.nova-pro-v1:0:
        mode: chat
        max_tokens: 300000
        input_token_cost: 0.0000008
        output_token_cost: 0.0000016
      us.amazon.nova-lite-v1:0:
        mode: chat
        max_tokens: 300000
        input_token_cost: 0.00000006
        output_token_cost: 0.00000012
      us.amazon.nova-micro-v1:0:
        mode: chat
        max_tokens: 128000
        input_token_cost: 0.000000035
        output_token_cost: 0.00000007

    parameters:
      temperature:
        name: "Temperature"
        type: float
        default: 1
        min: 0
        max: 1
        step: 0.01
      max_tokens:
        name: "Maximum tokens"
        type: float
        default: 256
        min: 1
        max: 4096
        step: 0.01
      top_p:
        name: "Top P"
        type: float
        default: 1
        min: 0
        max: 1
        step: 0.01
      top_k:
        name: "Top K"
        type: float
        default: 5
        min: 0
        max: 500
        step: 1
  self-hosted:
    id: self-hosted
    name: Self Hosted
    chat: true
    embed: true
    keys:
    models:
      deepseek-r1:1.5b:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0
        output_token_cost: 0
      deepseek-r1-tool-calling:
        mode: chat
        max_tokens: 128000
        input_token_cost: 0
        output_token_cost: 0
      llama3.2:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0
        output_token_cost: 0
      Llama-3-3-70B-Instruct-llmstudio:
        mode: chat
        max_tokens: 200000
        input_token_cost: 0.00000071
        output_token_cost: 0.00000071
    parameters:
      temperature:
        name: "Temperature"
@@ -115,6 +236,24 @@ providers:
    keys:
      - OPENAI_API_KEY
    models:
      o1-preview:
        mode: chat
        max_completion_tokens: 128000
        input_token_cost: 0.000015
        cached_token_cost: 0.0000075
        output_token_cost: 0.000060
      o1-mini:
        mode: chat
        max_completion_tokens: 128000
        input_token_cost: 0.0000011
        cached_token_cost: 0.00000055
        output_token_cost: 0.0000044
      o3-mini:
        mode: chat
        max_completion_tokens: 200000
        input_token_cost: 0.0000011
        cached_token_cost: 0.00000055
        output_token_cost: 0.0000044
      o1-preview:
        mode: chat
        max_completion_tokens: 128000
@@ -204,6 +343,18 @@ providers:
      - AZURE_API_ENDPOINT
      - AZURE_API_VERSION
    models:
      o1-preview:
        mode: chat
        max_completion_tokens: 128000
        input_token_cost: 0.0000165
        cached_token_cost: 0.00000825
        output_token_cost: 0.000066
      o1-mini:
        mode: chat
        max_completion_tokens: 128000
        input_token_cost: 0.0000033
        cached_token_cost: 0.00000165
        output_token_cost: 0.0000132
      gpt-4o-mini:
        mode: chat
        max_tokens: 128000
@@ -212,8 +363,9 @@ providers:
      gpt-4o:
        mode: chat
        max_tokens: 128000
        input_token_cost: 0.0000025
        cached_token_cost: 0.00000125
        output_token_cost: 0.00001
      gpt-4-turbo:
        mode: chat
        max_tokens: 128000
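
For reference, the *_token_cost fields above are per-token dollar rates. Below is a minimal sketch of how they would presumably combine into a per-call cost, assuming cached input tokens are billed at cached_token_cost and the remaining input at input_token_cost; the formula is inferred from the field names, and the actual accounting lives in the provider code, not in this diff.

from typing import Optional

def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                  input_token_cost: float, output_token_cost: float,
                  cached_token_cost: Optional[float] = None) -> float:
    # Cached prompt tokens fall back to the full input rate when a model
    # declares no cached_token_cost (most entries in this config).
    cached_rate = cached_token_cost if cached_token_cost is not None else input_token_cost
    return ((input_tokens - cached_tokens) * input_token_cost
            + cached_tokens * cached_rate
            + output_tokens * output_token_cost)

# Example with the updated gpt-4o rates: 1000 prompt tokens (400 cache hits)
# and 300 completion tokens.
print(estimate_cost(1000, 400, 300, 0.0000025, 0.00001, 0.00000125))  # ~0.005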
examples/core.py (69 additions, 22 deletions)

@@ -5,10 +5,12 @@
from pprint import pprint
import os
import asyncio
from dotenv import load_dotenv

load_dotenv()

def run_provider(provider, model, api_key=None, **kwargs):
    print(f"\n\n###RUNNING for <{provider}>, <{model}> ###")
    llm = LLMCore(provider=provider, api_key=api_key, **kwargs)

@@ -58,7 +60,7 @@ def run_provider(provider, model, api_key=None, **kwargs):

print("\nAsync Stream")
async def async_stream():
chat_request = build_chat_request(model, chat_input="Hello, my name is Tom Json", is_stream=True)
chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True)

response_async = await llm.achat(**chat_request)
async for p in response_async:
@@ -74,15 +76,15 @@ async def async_stream():


print("\nSync Non-Stream")
chat_request = build_chat_request(model, chat_input="Hello, my name is Alice Json", is_stream=False)
chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False)

response_sync = llm.chat(**chat_request)
pprint(response_sync)
latencies["sync (ms)"]= response_sync.metrics["latency_s"]*1000


print("\nSync Stream")
chat_request = build_chat_request(model, chat_input="Hello, my name is Mary Json", is_stream=True)
chat_request = build_chat_request(model, chat_input="Hello, my name is Mary", is_stream=True)

response_sync_stream = llm.chat(**chat_request)
for p in response_sync_stream:
@@ -126,7 +128,6 @@ def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens:
"parameters": {
"temperature": 0,
"max_tokens": max_tokens,
"response_format": {"type": "json_object"},
"functions": None,
}
}
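
Only the tail of build_chat_request is visible in this hunk. Below is a hedged reconstruction of the whole helper, assuming the request dict carries exactly the keys that run_provider unpacks into llm.chat(**chat_request) and llm.achat(**chat_request); the real signature is cut off in the hunk header above, and the folded portion of the body is not shown.

def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: int = 1000):
    # Reconstruction, not the file's actual code: everything above the
    # "parameters" block is folded out of the diff.
    chat_request = {
        "chat_input": chat_input,
        "model": model,
        "is_stream": is_stream,
        "parameters": {
            "temperature": 0,
            "max_tokens": max_tokens,
            "functions": None,
        },
    }
    return chat_request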
@@ -138,29 +139,75 @@ def multiple_provider_runs(provider:str, model:str, num_runs:int, api_key:str, *
        latencies = run_provider(provider=provider, model=model, api_key=api_key, **kwargs)
        pprint(latencies)


def run_chat_all_providers():
    # OpenAI
    multiple_provider_runs(provider="openai", model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1)
    multiple_provider_runs(provider="openai", model="o3-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1)
    #multiple_provider_runs(provider="openai", model="o1-preview", api_key=os.environ["OPENAI_API_KEY"], num_runs=1)

    # Azure
    multiple_provider_runs(provider="azure", model="gpt-4o-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"])
    #multiple_provider_runs(provider="azure", model="gpt-4o", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"])
    #multiple_provider_runs(provider="azure", model="o1-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"])
    #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"])

    #multiple_provider_runs(provider="anthropic", model="claude-3-opus-20240229", num_runs=1, api_key=os.environ["ANTHROPIC_API_KEY"])

    multiple_provider_runs(provider="vertexai", model="gemini-1.5-flash", num_runs=1, api_key=os.environ["GOOGLE_API_KEY"])

    # Bedrock
    multiple_provider_runs(provider="bedrock", model="us.amazon.nova-lite-v1:0", num_runs=1, api_key=None, region=os.environ["BEDROCK_REGION"], secret_key=os.environ["BEDROCK_SECRET_KEY"], access_key=os.environ["BEDROCK_ACCESS_KEY"])
    #multiple_provider_runs(provider="bedrock", model="anthropic.claude-3-5-sonnet-20241022-v2:0", num_runs=1, api_key=None, region=os.environ["BEDROCK_REGION"], secret_key=os.environ["BEDROCK_SECRET_KEY"], access_key=os.environ["BEDROCK_ACCESS_KEY"])


run_chat_all_providers()

import base64

def messages(img_path):
    """
    Creates a message payload with both text and image.
    Adapts format based on the provider.
    """
    with open(img_path, "rb") as f:
        image_bytes = f.read()

    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://awsmp-logos.s3.amazonaws.com/seller-zx4pk43qpmxoa/53d235806f343cec94aac3c577d81c13.png"},
                },
            ],
        }
    ]
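
The first image block above inlines the file as a base64 data URL in OpenAI's image_url format. Bedrock's Converse API instead takes raw bytes in an image content block, so the bedrock provider presumably performs a translation along these lines; this is a sketch of the documented Converse shape, not the provider's actual code.

def to_converse_image_block(data_url: str) -> dict:
    # "data:image/jpeg;base64,<payload>" -> Converse-style content block.
    header, payload = data_url.split(",", 1)
    image_format = header.split("/")[1].split(";")[0]  # e.g. "jpeg"
    return {"image": {"format": image_format, "source": {"bytes": base64.b64decode(payload)}}}

Plain https URLs, like the second image block above, would have to be fetched into bytes before any such conversion.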

def run_send_imgs():
    provider = "bedrock"
    model = "us.amazon.nova-lite-v1:0"
    chat_input = messages(img_path="./libs/llmstudio/tests/integration_tests/test_data/llmstudio-logo.jpeg")
    chat_request = build_chat_request(model=model, chat_input=chat_input, is_stream=False)
    llm = LLMCore(provider=provider, api_key=None, region=os.environ["BEDROCK_REGION"], secret_key=os.environ["BEDROCK_SECRET_KEY"], access_key=os.environ["BEDROCK_ACCESS_KEY"])
    response_sync = llm.chat(**chat_request)
    #print(response_sync)
    response_sync.clean_print()

    #for p in response_sync:
    #    if p.metrics:
    #        p.clean_print()


run_send_imgs()
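
Given the new self-hosted provider added in examples/_config.yaml, the same helpers should presumably drive a local model as well. Below is a hypothetical sketch reusing build_chat_request and LLMCore from above; whether LLMCore needs an extra kwarg to locate the local server (a host or base URL) is not shown in this diff, so none is passed here.

def run_self_hosted():
    provider = "self-hosted"
    model = "llama3.2"  # zero-cost local model from the new config
    chat_request = build_chat_request(model=model, chat_input="Hello from a local model", is_stream=False)
    llm = LLMCore(provider=provider, api_key=None)  # self-hosted entry declares no keys
    response_sync = llm.chat(**chat_request)
    response_sync.clean_print()

#run_self_hosted()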