
Fix InferenceClient for HF Nvidia NIM API #2482

Merged
merged 2 commits into main on Aug 26, 2024

Conversation

Wauplin
Contributor

@Wauplin Wauplin commented Aug 22, 2024

Fix #2480.

Two things in this PR (for chat_completion):

  • the model value was not set correctly in the payload. This is now fixed.
  • None values are no longer sent to the server; only parameters explicitly set by the user are forwarded.
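The second point can be illustrated with a small sketch. The helper below is hypothetical (not the actual huggingface_hub implementation); it shows the idea of dropping unset (None) parameters before building the request payload, so server-side defaults are not overridden:

```python
# Hypothetical sketch of the None-filtering fix; not the library's actual code.
def build_chat_payload(model, messages, **kwargs):
    """Build a chat-completion payload, dropping parameters left as None."""
    payload = {"model": model, "messages": messages, **kwargs}
    return {key: value for key, value in payload.items() if value is not None}

payload = build_chat_payload(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Count to 10"}],
    max_tokens=1024,
    temperature=None,  # left unset by the user => excluded from the payload
)
# "temperature" does not appear in the payload sent to the server
```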

With this, it is now possible to use the HF Nvidia NIM API using InferenceClient:

from huggingface_hub import InferenceClient


# instead of `client = OpenAI(...)`
client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_TOKEN"
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    # delta.content can be None on the final chunk; guard with `or ""`
    print(chunk.choices[0].delta.content or "", end="")

Here it goes:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10!
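For reference, the streamed deltas can also be accumulated into a single string instead of printed as they arrive. This self-contained sketch uses stub objects that mimic the shape of the real chunks (chunk.choices[0].delta.content, which may be None on the final chunk); the stub names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

# Stubs mirroring the shape of streamed chat-completion chunks.
@dataclass
class Delta:
    content: Optional[str]

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: list

def accumulate(chunks):
    """Join the delta contents of a chunk stream, skipping None deltas."""
    return "".join(c.choices[0].delta.content or "" for c in chunks)

stream = [
    Chunk([Choice(Delta("1, 2"))]),
    Chunk([Choice(Delta(", 3"))]),
    Chunk([Choice(Delta(None))]),  # final chunk often carries no content
]
accumulate(stream)  # -> "1, 2, 3"
```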

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@MoritzLaurer MoritzLaurer left a comment


LGTM! Installed from the branch and confirmed that it works.

Also confirmed that it works directly with client.chat_completion, just like client.chat.completions.create.
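The OpenAI-style client.chat.completions.create path is a thin delegation to chat_completion. A minimal, illustrative sketch of how such an alias can be wired (this is not huggingface_hub's actual code; MiniClient and the helper classes are hypothetical):

```python
# Hypothetical sketch of wiring an OpenAI-style alias onto a client method.
class _Completions:
    def __init__(self, client):
        self._client = client

    def create(self, **kwargs):
        # Delegate to the underlying chat_completion method.
        return self._client.chat_completion(**kwargs)

class _Chat:
    def __init__(self, client):
        self.completions = _Completions(client)

class MiniClient:
    def __init__(self):
        self.chat = _Chat(self)

    def chat_completion(self, **kwargs):
        return ("chat_completion called", kwargs)

client = MiniClient()
result = client.chat.completions.create(model="meta-llama/Meta-Llama-3-8B-Instruct")
# identical to calling client.chat_completion(model=...) directly
```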

@Wauplin
Contributor Author

Wauplin commented Aug 26, 2024

Let's get this merged! Thanks for the reviews :)

@Wauplin Wauplin merged commit 6e9e4e4 into main Aug 26, 2024
15 of 17 checks passed
@Wauplin Wauplin deleted the 2480-fix-inference-client-for-nim branch August 26, 2024 12:45
Successfully merging this pull request may close these issues.

How to use the HF Nvidia NIM API with the HF inference client?