What's New
- Support more MLX inference parameters, such as `adapter_path`, `top_k`, `min_tokens_to_keep`, `min_p`, `presence_penalty`, etc. - closes #12
Usage Examples
OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10240/v1",  # MLX Omni Server endpoint
    api_key="not-needed"
)

# Pass the fine-tuned adapter via extra_body
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "What's the weather like today?"}
    ],
    extra_body={
        "adapter_path": "path/to/your/adapter",  # Path to fine-tuned adapter
    }
)
```
curl:

```shell
curl http://localhost:10240/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather like today?"
      }
    ],
    "adapter_path": "path/to/your/adapter"
  }'
```
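The other newly supported sampling parameters are passed the same way: at the top level of the JSON request body, or via `extra_body` when using the OpenAI SDK. A minimal sketch of such a request body follows; the parameter values are illustrative placeholders, not recommended settings:

```python
import json

# Request body for POST /v1/chat/completions using the sampling
# parameters added in this release (values are placeholders).
payload = {
    "model": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "messages": [
        {"role": "user", "content": "What's the weather like today?"}
    ],
    # New MLX inference parameters:
    "top_k": 40,
    "min_p": 0.05,
    "min_tokens_to_keep": 1,
    "presence_penalty": 0.5,
}

print(json.dumps(payload, indent=2))
```

With the SDK, the same keys go into `extra_body`, exactly as shown for `adapter_path` above.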
Full Changelog: v0.3.0...v0.3.1