Support asyncio with AsyncInferenceClient #1524

Merged · 47 commits into main · Jul 4, 2023

Conversation

@Wauplin (Contributor) commented Jun 21, 2023

Discussed in Slack (internal link). Adds an AsyncInferenceClient for async calls to the Inference endpoints. The file is generated automatically by a script. The goal is to start with this and reassess in the future (maybe drop the generation script at some point if it's too much work to maintain for too little benefit).

import asyncio
from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient()

    print("\nText to image")
    image = await client.text_to_image("Astronaut riding a horse")
    print(image)

    print("\nText generation (100 tokens)")
    text = await client.text_generation(
        "Number between 1 and 1000: 1, 2, ", max_new_tokens=100, model="bigcode/starcoder"
    )
    print(text)

    print("\nText generation (5 tokens, stream + details)")
    async for token in await client.text_generation(
        "Number between 1 and 1000: 1, 2, ", max_new_tokens=5, model="bigcode/starcoder", stream=True, details=True
    ):
        print(token, flush=True)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
Output:

Text to image
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x512 at 0x7F4E739D5AB0>

Text generation (100 tokens)
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,

Text generation (5 tokens, stream + details)
TextGenerationStreamResponse(token=Token(id=37, text='3', logprob=-0.14343262, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=30, text=',', logprob=-0.17370605, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=225, text=' ', logprob=-0.4597168, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=38, text='4', logprob=-0.1583252, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=30, text=',', logprob=-0.13146973, special=False), generated_text='3, 4,', details=StreamDetails(finish_reason=<FinishReason.Length: 'length'>, generated_tokens=5, seed=None))
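
For reference, the explicit event loop above is equivalent to the more idiomatic single-call entry point:

asyncio.run(main())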

TODO:

  • Add some tests. Ideally the same ones as the VCR-ed ones, but with asyncio? Do we need everything?
  • Documentation
  • Find a solution to tests with VCR + aiohttp + stream=True (2 tests impacted; see the sketch below for the rough shape of the non-streaming case)
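
A minimal sketch of an async VCR test for the non-streaming case, assuming pytest-asyncio and VCR.py's aiohttp support are available; the cassette path is a placeholder:

import pytest
import vcr

from huggingface_hub import AsyncInferenceClient

@pytest.mark.asyncio
@vcr.use_cassette("cassettes/test_async_text_generation.yaml")
async def test_async_text_generation():
    # Replays the recorded aiohttp interaction instead of calling the production endpoint.
    client = AsyncInferenceClient()
    text = await client.text_generation("Number between 1 and 1000: 1, 2, ", max_new_tokens=5)
    assert isinstance(text, str)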

@HuggingFaceDocBuilderDev commented Jun 21, 2023

The documentation is not available anymore as the PR was closed or merged.

@codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage: 71.92% and project coverage change: -0.71% ⚠️

Comparison is base (618bb23) 82.71% compared to head (b1ee580) 82.00%.

❗ Current head b1ee580 differs from pull request most recent head adb970d. Consider uploading reports for the commit adb970d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1524      +/-   ##
==========================================
- Coverage   82.71%   82.00%   -0.71%     
==========================================
  Files          58       60       +2     
  Lines        6357     6591     +234     
==========================================
+ Hits         5258     5405     +147     
- Misses       1099     1186      +87     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/utils/__init__.py 100.00% <ø> (ø)
src/huggingface_hub/inference/_async_client.py 58.24% <58.24%> (ø)
src/huggingface_hub/utils/_runtime.py 56.00% <60.00%> (+0.13%) ⬆️
src/huggingface_hub/inference/_client.py 78.33% <85.18%> (-4.75%) ⬇️
src/huggingface_hub/inference/_common.py 92.17% <92.17%> (ø)
src/huggingface_hub/inference/_text_generation.py 96.25% <100.00%> (ø)


@Wauplin changed the title from "[WIP] Support asyncio with AsyncInferenceClient" to "Support asyncio with AsyncInferenceClient" Jun 28, 2023
@Wauplin marked this pull request as ready for review June 28, 2023 11:54
@osanseviero self-requested a review June 28, 2023 12:13
@Wauplin requested a review from LysandreJik June 28, 2023 16:28
@Wauplin mentioned this pull request Jun 29, 2023
@LysandreJik (Member) left a comment
Overall I understand why you chose to go this way with your implementation. I don't have any problems with the generation of the async file from the sync file.

I have a small issue with the way it is generated, in that it seems very specific to the existing sync client. If you add a method to that client, you'll likely need to get your hands dirty with the script to adapt the file generation. This seems harder than just editing the async client file to ensure that the two files have the same methods. I understand that this is not the case for the task methods (_make_tasks_methods_async), so maybe I misunderstand and you already set it up this way.

In any case, happy to have this for now and up to you whether you want to update it down the line to have something different after maintaining it for a bit.

As said in the comments, I'd be clearer about the async file not being editable by contributors as it's being generated.
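
For illustration, a minimal sketch of the kind of textual sync-to-async rewriting such a generation script can perform; the substitutions below are hypothetical simplifications, not the actual rules in utils/generate_async_inference_client.py:

import re
from pathlib import Path

SYNC_PATH = Path("src/huggingface_hub/inference/_client.py")
ASYNC_PATH = Path("src/huggingface_hub/inference/_async_client.py")

def generate_async_client(code: str) -> str:
    # Rename the class so both clients can coexist.
    code = code.replace("class InferenceClient:", "class AsyncInferenceClient:")
    # Turn every public method into a coroutine (simplistic: the real rules must
    # also rewrite HTTP calls to an async backend and await internal helpers).
    code = re.sub(r"\n    def (?!_)", "\n    async def ", code)
    return code

if __name__ == "__main__":
    ASYNC_PATH.write_text(generate_async_client(SYNC_PATH.read_text()))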

Review thread on utils/generate_async_inference_client.py (outdated, resolved)
Comment on lines 16 to 19
# WARNING
# This entire file has been generated automatically based on `src/huggingface_hub/inference/_client.py`.
# To re-generate it, run `make style` or `python ./utils/generate_async_inference_client.py --update`.
# WARNING
@LysandreJik (Member):

I think I'd either add a pretty visible disclaimer that any code written in this file will be overwritten by make style, or I would move this file entirely to a _generated folder to ensure that it's not being modified by someone (with the correct links for imports as well).

@Wauplin (Contributor, Author):

I like the idea of a _generated/ folder! Far better than just a tiny disclaimer that no one will read. Thanks for the suggestion :)
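
A sketch of how a _generated/ layout could keep the public import path stable; the exact module layout here is an assumption:

# src/huggingface_hub/inference/_generated/__init__.py (hypothetical)
# Re-export so `from huggingface_hub import AsyncInferenceClient` keeps
# working after the generated file moves.
from ._async_client import AsyncInferenceClient

__all__ = ["AsyncInferenceClient"]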

@@ -0,0 +1,138 @@
"""Contains tests for AsyncInferenceClient.
@LysandreJik (Member):

This file should have the copyright header as well.

@LysandreJik (Member):

Should we test that both the async and sync clients output the exact same things, but with one being synchronous and the other being asynchronous? (maybe that's already the case).

If you intend on the two clients having the same methods, with the same signatures, would it make sense to include a test for that?

@Wauplin (Contributor, Author):

> Should we test that both the async and sync clients output the exact same things, but with one being synchronous and the other being asynchronous?

Yes, that's the case for the text_generation and sentence_similarity tests. I chose to test those two since text_generation is by far the "most complicated" code. I don't think it's worth testing every single task as long as we know they are generated via the exact same code.

About exact values, that's kinda what we are doing. But since the tests are using VCR (i.e. not calling the production endpoint), it's not really a complete test anyway. I tried making VCR load the same yaml file for both a sync and an async test but I didn't manage to make it work with VCR.py. In the end I don't think it's worth investigating too much into this.

> If you intend on the two clients having the same methods, with the same signatures, would it make sense to include a test for that?

I can add such a test!
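
Such a test could look roughly like this sketch, assuming both clients are importable from the package root; a real version may need to tolerate legitimately different return annotations on streaming methods:

import inspect

from huggingface_hub import AsyncInferenceClient, InferenceClient

def test_clients_expose_same_methods():
    # Public method names must match one-to-one.
    sync_methods = {
        name
        for name, _ in inspect.getmembers(InferenceClient, inspect.isfunction)
        if not name.startswith("_")
    }
    async_methods = {
        name
        for name, _ in inspect.getmembers(AsyncInferenceClient, inspect.isfunction)
        if not name.startswith("_")
    }
    assert sync_methods == async_methods

    # Parameters must match as well (return annotations may legitimately differ).
    for name in sync_methods:
        sync_params = inspect.signature(getattr(InferenceClient, name)).parameters
        async_params = inspect.signature(getattr(AsyncInferenceClient, name)).parameters
        assert sync_params == async_params, f"Parameter mismatch in {name}"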

@Wauplin (Contributor, Author) commented Jul 4, 2023

Thanks for the review and the feedback @LysandreJik.

As discussed on Slack, the plan is to be pragmatic with the generation script:

  • IMO it makes sense at the moment to quickly get an AsyncInferenceClient without too much manually duplicated work. In particular, I expect that adding new methods should not require updating the generation script.
  • I (reasonably) hope that we won't get too much maintenance on InferenceClient once all tasks have been implemented (same as InferenceAPI, which went almost unchanged for 2 years).
  • If at any point we realize maintaining a script is harder than manually maintaining the async code, I am completely fine with going fully manual. Generating the code a first time was necessary for consistency, but once it's there, future changes should hopefully be minimal.
  • Let's keep an extra eye open when adding new tasks! The script should be ok, but let's be careful on the next PRs :)

Since we are aligned on this, let's move forward and merge this PR :)

@Wauplin merged commit 6527e07 into main Jul 4, 2023
@Wauplin deleted the inference-async-client branch July 4, 2023 14:21
@julien-c (Member) commented Jul 4, 2023

Yay! 🎉
