
InferenceClient alignment with base_url as in OpenAI client #2414

Closed
alvarobartt opened this issue Jul 24, 2024 · 0 comments · Fixed by #2418

Labels
bug Something isn't working

Comments

@alvarobartt (Member)

Describe the bug

Hi @Wauplin!

I've just been experimenting with the InferenceClient, and its base_url has to be provided as <URL> rather than <URL>/v1 as in the OpenAI client. This is not fully compatible with OpenAI, where the URL does need to include the /v1 endpoint path.

For better compatibility and a seamless migration from the OpenAI client, we could allow the base_url to be provided with the /v1 endpoint path included, stripping it when present, or something along those lines. I'm unsure about the potential issues of stripping the provided base_url, though.
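
A minimal sketch of what that stripping could look like (a hypothetical helper, assuming the client internally only needs the bare server URL without the /v1 suffix):

def _normalize_base_url(base_url: str) -> str:
    # Hypothetical normalization: accept OpenAI-style base URLs ending in
    # /v1 and strip the suffix, so the client can append its own route.
    base_url = base_url.rstrip("/")
    if base_url.endswith("/v1"):
        base_url = base_url[: -len("/v1")]
    return base_url

# Both forms would then resolve to the same server URL:
assert _normalize_base_url("http://0.0.0.0:8080/v1") == "http://0.0.0.0:8080"
assert _normalize_base_url("http://0.0.0.0:8080") == "http://0.0.0.0:8080"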

Reproduction

import os
# Instead of `from openai import OpenAI`
from huggingface_hub import InferenceClient

# Instead of `client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("OPENAI_API_KEY"))`
client = InferenceClient(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("HF_TOKEN", "-"))

chat_completion = client.chat.completions.create(
  # Instead of `model="tgi"`
  model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Deep Learning?"},
  ],
  max_tokens=128,
)

Which can currently be worked around as follows:

- client = InferenceClient(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("HF_TOKEN", "-"))
+ client = InferenceClient(base_url="http://0.0.0.0:8080", api_key=os.getenv("HF_TOKEN", "-"))

Logs

Raises the following error:


huggingface_hub.utils._errors.HfHubHTTPError: 404 Client Error: Not Found for url: http://0.0.0.0:8080/v1/v1/chat/completions
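
The duplicated /v1/v1 segment suggests the client appends the OpenAI-style route to whatever base_url it receives (an assumption about the internals, but consistent with the URL in the error above):

# Illustration only: naive concatenation reproduces the failing URL
base_url = "http://0.0.0.0:8080/v1"
print(f"{base_url}/v1/chat/completions")
# -> http://0.0.0.0:8080/v1/v1/chat/completions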

System info

- huggingface_hub version: 0.24.1
- Platform: Linux-6.5.0-1022-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ubuntu/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: N/A
- pydantic: 2.8.2
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/ubuntu/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/ubuntu/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/ubuntu/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10