Support asyncio with AsyncInferenceClient #1524

Merged · 47 commits into main · Jul 4, 2023

Conversation

@Wauplin (Contributor) commented Jun 21, 2023

Discussed in Slack (internal link). Adds an AsyncInferenceClient for async calls to the Inference endpoints. The file is generated automatically by a script. The goal is to start with this and reassess in the future (maybe drop the generation script at some point if it's too much work to maintain for too little benefit).

import asyncio
from huggingface_hub import AsyncInferenceClient

async def main():
    client = AsyncInferenceClient()

    print("\nText to image")
    image = await client.text_to_image("Astronaut riding a horse")
    print(image)

    print("\nText generation (100 tokens)")
    text = await client.text_generation(
        "Number between 1 and 1000: 1, 2, ", max_new_tokens=100, model="bigcode/starcoder"
    )
    print(text)

    print("\nText generation (5 tokens, stream + details)")
    async for token in await client.text_generation(
        "Number between 1 and 1000: 1, 2, ", max_new_tokens=5, model="bigcode/starcoder", stream=True, details=True
    ):
        print(token, flush=True)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
Output:

Text to image
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x512 at 0x7F4E739D5AB0>

Text generation (100 tokens)
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,

Text generation (5 tokens, stream + details)
TextGenerationStreamResponse(token=Token(id=37, text='3', logprob=-0.14343262, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=30, text=',', logprob=-0.17370605, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=225, text=' ', logprob=-0.4597168, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=38, text='4', logprob=-0.1583252, special=False), generated_text=None, details=None)
TextGenerationStreamResponse(token=Token(id=30, text=',', logprob=-0.13146973, special=False), generated_text='3, 4,', details=StreamDetails(finish_reason=<FinishReason.Length: 'length'>, generated_tokens=5, seed=None))
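
For reference, the explicit event loop above is equivalent to the more idiomatic single-call entry point:

asyncio.run(main())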

TODO:

  • Add some tests. Ideally the same ones as the VCR-ed ones, but with asyncio? Do we need everything?
  • Documentation
  • Find a solution to tests with VCR + aiohttp + stream=True (2 tests impacted; see the sketch below for the rough shape of the non-streaming case)
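
A minimal sketch of an async VCR test for the non-streaming case, assuming pytest-asyncio and VCR.py's aiohttp support are available; the cassette path is a placeholder:

import pytest
import vcr

from huggingface_hub import AsyncInferenceClient

@pytest.mark.asyncio
@vcr.use_cassette("cassettes/test_async_text_generation.yaml")
async def test_async_text_generation():
    # Replays the recorded aiohttp interaction instead of calling the production endpoint.
    client = AsyncInferenceClient()
    text = await client.text_generation("Number between 1 and 1000: 1, 2, ", max_new_tokens=5)
    assert isinstance(text, str)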

@HuggingFaceDocBuilderDev commented Jun 21, 2023

The documentation is not available anymore as the PR was closed or merged.

@codecov bot commented Jun 27, 2023

Codecov Report

Patch coverage: 71.92% and project coverage change: -0.71% ⚠️

Comparison is base (618bb23) 82.71% compared to head (b1ee580) 82.00%.

❗ Current head b1ee580 differs from pull request most recent head adb970d. Consider uploading reports for the commit adb970d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1524      +/-   ##
==========================================
- Coverage   82.71%   82.00%   -0.71%     
==========================================
  Files          58       60       +2     
  Lines        6357     6591     +234     
==========================================
+ Hits         5258     5405     +147     
- Misses       1099     1186      +87     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/utils/__init__.py 100.00% <ø> (ø)
src/huggingface_hub/inference/_async_client.py 58.24% <58.24%> (ø)
src/huggingface_hub/utils/_runtime.py 56.00% <60.00%> (+0.13%) ⬆️
src/huggingface_hub/inference/_client.py 78.33% <85.18%> (-4.75%) ⬇️
src/huggingface_hub/inference/_common.py 92.17% <92.17%> (ø)
src/huggingface_hub/inference/_text_generation.py 96.25% <100.00%> (ø)


@Wauplin changed the title from "[WIP] Support asyncio with AsyncInferenceClient" to "Support asyncio with AsyncInferenceClient" Jun 28, 2023
@Wauplin marked this pull request as ready for review June 28, 2023 11:54
@osanseviero self-requested a review June 28, 2023 12:13
@Wauplin requested a review from LysandreJik June 28, 2023 16:28
@Wauplin mentioned this pull request Jun 29, 2023
@LysandreJik (Member) left a comment
Overall I understand why you chose to go this way with your implementation. I don't have any problems with the generation of the async file from the sync file.

I have a small issue with the way it is generated, in that it seems very specific to the existing sync client. If you add a method to that client, you'll likely need to get your hands dirty with the script to adapt the file generation. This seems harder than just editing the async client file to ensure that the two files have the same methods. I understand that this is not the case for the task methods (_make_tasks_methods_async), so maybe I misunderstand and you already set it up this way.

In any case, happy to have this for now and up to you whether you want to update it down the line to have something different after maintaining it for a bit.

As said in the comments, I'd be clearer about the async file not being editable by contributors as it's being generated.
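
For illustration, a minimal sketch of the kind of textual sync-to-async rewriting such a generation script can perform; the substitutions below are hypothetical simplifications, not the actual rules in utils/generate_async_inference_client.py:

import re
from pathlib import Path

SYNC_PATH = Path("src/huggingface_hub/inference/_client.py")
ASYNC_PATH = Path("src/huggingface_hub/inference/_async_client.py")

def generate_async_client(code: str) -> str:
    # Rename the class so both clients can coexist.
    code = code.replace("class InferenceClient:", "class AsyncInferenceClient:")
    # Turn every public method into a coroutine (simplistic: the real rules must
    # also rewrite HTTP calls to an async backend and await internal helpers).
    code = re.sub(r"\n    def (?!_)", "\n    async def ", code)
    return code

if __name__ == "__main__":
    ASYNC_PATH.write_text(generate_async_client(SYNC_PATH.read_text()))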

Review thread on utils/generate_async_inference_client.py (outdated, resolved)
Comment on lines 16 to 19
# WARNING
# This entire file has been generated automatically based on `src/huggingface_hub/inference/_client.py`.
# To re-generate it, run `make style` or `python ./utils/generate_async_inference_client.py --update`.
# WARNING
@LysandreJik (Member):

I think I'd either add a pretty visible disclaimer that any code written in this file will be overwritten by make style, or I would move this file entirely to a _generated folder to ensure that it's not being modified by someone (with the correct links for imports as well).

@Wauplin (Contributor, Author):

I like the idea of a _generated/ folder! Far better than just a tiny disclaimer that no one will read. Thanks for the suggestion :)
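
A sketch of how a _generated/ layout could keep the public import path stable; the exact module layout here is an assumption:

# src/huggingface_hub/inference/_generated/__init__.py (hypothetical)
# Re-export so `from huggingface_hub import AsyncInferenceClient` keeps
# working after the generated file moves.
from ._async_client import AsyncInferenceClient

__all__ = ["AsyncInferenceClient"]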

@@ -0,0 +1,138 @@
"""Contains tests for AsyncInferenceClient.
@LysandreJik (Member):

This file should have the copyright header as well.

@LysandreJik (Member):

Should we test that both the async and sync clients output the exact same things, but with one being synchronous and the other being asynchronous? (maybe that's already the case).

If you intend on the two clients having the same methods, with the same signatures, would it make sense to include a test for that?

@Wauplin (Contributor, Author):

> Should we test that both the async and sync clients output the exact same things, but with one being synchronous and the other being asynchronous?

Yes, that's the case for the text_generation and sentence_similarity tests. I chose to test those two since text_generation is by far the "most complicated" code. I don't think it's worth testing every single task as long as we know they are generated via the exact same code.

About exact values, that's kinda what we are doing. But since the tests are using VCR (i.e. not calling the production endpoint), it's not really a complete test anyway. I tried making VCR load the same yaml file for both a sync and an async test but I didn't manage to make it work with VCR.py. In the end I don't think it's worth investigating too much into this.

> If you intend on the two clients having the same methods, with the same signatures, would it make sense to include a test for that?

I can add such a test!
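
Such a test could look roughly like this sketch, assuming both clients are importable from the package root; a real version may need to tolerate legitimately different return annotations on streaming methods:

import inspect

from huggingface_hub import AsyncInferenceClient, InferenceClient

def test_clients_expose_same_methods():
    # Public method names must match one-to-one.
    sync_methods = {
        name
        for name, _ in inspect.getmembers(InferenceClient, inspect.isfunction)
        if not name.startswith("_")
    }
    async_methods = {
        name
        for name, _ in inspect.getmembers(AsyncInferenceClient, inspect.isfunction)
        if not name.startswith("_")
    }
    assert sync_methods == async_methods

    # Parameters must match as well (return annotations may legitimately differ).
    for name in sync_methods:
        sync_params = inspect.signature(getattr(InferenceClient, name)).parameters
        async_params = inspect.signature(getattr(AsyncInferenceClient, name)).parameters
        assert sync_params == async_params, f"Parameter mismatch in {name}"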

@Wauplin (Contributor, Author) commented Jul 4, 2023

Thanks for the review and the feedback @LysandreJik.

As discussed on Slack, the plan is to be pragmatic with the generation script:

  • IMO it makes sense at the moment to quickly get an AsyncInferenceClient without too much manually duplicated work. In particular, I expect that adding new methods should not require updating the generation script.
  • I (reasonably) hope that we won't get too much maintenance on InferenceClient once all tasks have been implemented (same as InferenceAPI, which went almost unchanged for 2 years).
  • If at any point we realize maintaining a script is harder than manually maintaining the async code, I am completely fine with going fully manual. Generating the code a first time was necessary for consistency, but once it's there, future changes should hopefully be minimal.
  • Let's keep an extra eye open when adding new tasks! The script should be ok, but let's be careful on the next PRs :)

Since we are aligned on this, let's move forward and merge this PR :)

@Wauplin merged commit 6527e07 into main Jul 4, 2023
@Wauplin deleted the inference-async-client branch July 4, 2023 14:21
@julien-c (Member) commented Jul 4, 2023

Yay! 🎉
