
InferenceClient alignment with base_url as in OpenAI client #2414

Closed
alvarobartt opened this issue Jul 24, 2024 · 0 comments · Fixed by #2418

Labels
bug Something isn't working

Comments

@alvarobartt (Member)

Describe the bug

Hi @Wauplin!

I've just been experimenting with the InferenceClient, and its base_url has to be provided as <URL> rather than <URL>/v1 as in the OpenAI client. This is not fully compatible with OpenAI, where the URL does need to include the /v1 endpoint path.

For better compatibility and a seamless migration from the OpenAI client, we could allow the base_url to be provided with the /v1 endpoint path included, stripping it when present, or something along those lines. I'm unsure about the potential issues of stripping the provided base_url, though.
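
A minimal sketch of what that stripping could look like (a hypothetical helper, assuming the client internally only needs the bare server URL without the /v1 suffix):

def _normalize_base_url(base_url: str) -> str:
    # Hypothetical normalization: accept OpenAI-style base URLs ending in
    # /v1 and strip the suffix, so the client can append its own route.
    base_url = base_url.rstrip("/")
    if base_url.endswith("/v1"):
        base_url = base_url[: -len("/v1")]
    return base_url

# Both forms would then resolve to the same server URL:
assert _normalize_base_url("http://0.0.0.0:8080/v1") == "http://0.0.0.0:8080"
assert _normalize_base_url("http://0.0.0.0:8080") == "http://0.0.0.0:8080"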

Reproduction

import os
# Instead of `from openai import OpenAI`
from huggingface_hub import InferenceClient

# Instead of `client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY"))`
client = InferenceClient(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("HF_TOKEN", "-"))

chat_completion = client.chat.completions.create(
  # Instead of `model="tgi"`
  model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Deep Learning?"},
  ],
  max_tokens=128,
)

Which can currently be worked around as follows:

- client = InferenceClient(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("HF_TOKEN", "-"))
+ client = InferenceClient(base_url="http://0.0.0.0:8080", api_key=os.getenv("HF_TOKEN", "-"))

Logs

Raises the following error:


huggingface_hub.utils._errors.HfHubHTTPError: 404 Client Error: Not Found for url: http://0.0.0.0:8080/v1/v1/chat/completions
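
The duplicated /v1/v1 segment suggests the client appends the OpenAI-style route to whatever base_url it receives (an assumption about the internals, but consistent with the URL in the error above):

# Illustration only: naive concatenation reproduces the failing URL
base_url = "http://0.0.0.0:8080/v1"
print(f"{base_url}/v1/chat/completions")
# -> http://0.0.0.0:8080/v1/v1/chat/completions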

System info

- huggingface_hub version: 0.24.1
- Platform: Linux-6.5.0-1022-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ubuntu/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: N/A
- pydantic: 2.8.2
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/ubuntu/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/ubuntu/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/ubuntu/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10