[HF][streaming][4/n] Image2Text (no streaming, but lots of fixing) #855

rossdanlm · 2024-01-10T09:14:35Z


This model parser does not support streaming (surprising!):

TypeError: ImageToTextPipeline._sanitize_parameters() got an unexpected keyword argument 'streamer'
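A minimal sketch of how a caller could avoid this kind of `TypeError`: inspect a callable's signature before passing `streamer` to it. The two `*_sanitize` functions are hypothetical stand-ins, not the Transformers methods, and `accepts_keyword` is a helper name invented for illustration.

```python
import inspect
from typing import Any, Callable


def accepts_keyword(func: Callable[..., Any], name: str) -> bool:
    """Return True if `func` can be called with the keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        # A **kwargs parameter accepts any keyword.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical stand-ins for two pipelines' parameter handling.
def text_sanitize(max_new_tokens=None, streamer=None):
    return {"max_new_tokens": max_new_tokens, "streamer": streamer}


def image_sanitize(max_new_tokens=None):
    return {"max_new_tokens": max_new_tokens}


for fn in (text_sanitize, image_sanitize):
    print(fn.__name__, accepts_keyword(fn, "streamer"))
```

A check like this only helps on a function with an explicit parameter list; anything that forwards `**kwargs` reports `True` and fails later, as in the traceback above.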

In general, I mostly fixed things up so that the parser works as expected. Things I fixed:

1. Multiple images now work properly (they were accepted before, but the responses weren't processed per image; the entire response was put into a single output)
2. Responses are now constructed as pure text output
3. Specified the completion params that are supported (only 2: https://github.com/huggingface/transformers/blob/701298d2d3d5c7bde45e71cce12736098e3f05ef/src/transformers/pipelines/image_to_text.py#L97-L102C13)

In the next diff I will add support for base64-encoded image input; we need to convert it to a PIL image, see https://github.com/huggingface/transformers/blob/701298d2d3d5c7bde45e71cce12736098e3f05ef/src/transformers/pipelines/image_to_text.py#L83
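A rough sketch of the decoding half of that base64 conversion, using only the standard library. The function name `base64_to_image_buffer` and the data-URI handling are assumptions for illustration; the follow-up diff may do this differently.

```python
import base64
import binascii
import io


def base64_to_image_buffer(data: str) -> io.BytesIO:
    """Decode a base64 string (optionally a data URI) into an in-memory buffer."""
    # Drop a leading "data:image/png;base64," style prefix if present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    try:
        raw = base64.b64decode(data, validate=True)
    except binascii.Error as err:
        raise ValueError("input is not valid base64") from err
    return io.BytesIO(raw)


# With Pillow installed, the buffer can then be opened as an image:
#   from PIL import Image
#   image = Image.open(base64_to_image_buffer(encoded))
```

Returning a buffer rather than an image keeps the sketch free of third-party imports; the commented lines show where PIL would take over.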

Test Plan

Rebase onto and test it: 5f3b667.

Follow the README from AIConfig Editor https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev, then run these commands:

aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path

Then in AIConfig Editor run the prompt (streaming not supported for this model so just took screenshots)

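These are the images I tested

![fox_in_forest](https://github.com/lastmile-ai/aiconfig/assets/151060367/ca7d1723-9e12-4cc8-9d8d-41fa9f466919)
![trex](https://github.com/lastmile-ai/aiconfig/assets/151060367/2f556ead-a808-4aea-9378-a2537c715e1f)

Before

<img width="1268" alt="Screenshot 2024-01-10 at 04 00 22" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/4426f2b9-0b83-48e2-8af1-865f157ae12c">

After

<img width="1277" alt="Screenshot 2024-01-10 at 04 02 01" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/2ed172a8-ed26-4c1b-9a9e-5c240376a278">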

Stack created with Sapling. Best reviewed with ReviewStack.

TSIA

Adding streaming functionality to the text summarization model parser.

## Test Plan

Rebase onto and test it with 11ace0a.

Follow the README from AIConfig Editor https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev, then run these commands:

```bash
aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
```

Then in AIConfig Editor run the prompt (it will be in streaming format by default):

https://github.com/lastmile-ai/aiconfig/assets/151060367/e91a1d8b-a3e9-459c-9eb1-2d8e5ec58e73

TSIA

Adding streaming output support for the text translation model parser. I also fixed a bug where we didn't pass the `"translation"` key into the pipeline.

## Test Plan

Rebase onto and test it: 5b74344.

Follow the README from AIConfig Editor https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev, then run these commands:

```bash
aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
```

With Streaming

https://github.com/lastmile-ai/aiconfig/assets/151060367/d7bc9df2-2993-4709-bf9b-c5b7979fb00f

Without Streaming

https://github.com/lastmile-ai/aiconfig/assets/151060367/71eb6ab3-5d6f-4c5d-8b82-f3daf4c5e610

…completion params)

Ok this one is weird. Today, streaming is only supported for text outputs in the Transformers library. See `BaseStreamer` here: https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20BaseStreamer&type=code

In the future it may support other formats, but not yet. For example, OpenAI supports streaming for text-to-speech: https://community.openai.com/t/streaming-from-text-to-speech-api/493784

Anyways, all I did here was update some docs to clarify why the completion params were null. Jonathan and I synced about this briefly offline, but I forgot again, so I wanted to capture it here so no one forgets.
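To illustrate the `put`/`end` shape that `BaseStreamer` subclasses implement, here is a small stand-in that does not import Transformers. `QueueTextStreamer` and `fake_generate` are hypothetical names, and the stand-in receives plain strings, whereas a real streamer is handed token IDs that it has to decode first.

```python
import queue
import threading
from typing import Iterator


class QueueTextStreamer:
    """Stand-in with a put/end interface; takes plain strings, not token IDs."""

    _SENTINEL = object()

    def __init__(self) -> None:
        self._queue: "queue.Queue[object]" = queue.Queue()

    def put(self, value: str) -> None:
        # Called by the producer each time a new chunk of text is ready.
        self._queue.put(value)

    def end(self) -> None:
        # Called once by the producer when generation is finished.
        self._queue.put(self._SENTINEL)

    def __iter__(self) -> Iterator[str]:
        # Block on the queue and yield chunks until the sentinel arrives.
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            yield str(item)


def fake_generate(streamer: QueueTextStreamer) -> None:
    """Hypothetical producer standing in for a text-generation call."""
    for chunk in ["Hello", ", ", "world"]:
        streamer.put(chunk)
    streamer.end()


if __name__ == "__main__":
    streamer = QueueTextStreamer()
    worker = threading.Thread(target=fake_generate, args=(streamer,))
    worker.start()
    for text in streamer:
        print(text, end="", flush=True)
    worker.join()
```

The interface is built around incrementally produced text, which is consistent with the note above that non-text outputs have no streaming path yet.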


saqadri · 2024-01-10T14:53:05Z

...sions/HuggingFace/python/src/aiconfig_extension_hugging_face/local_inference/image_2_text.py

@@ -93,10 +103,11 @@ async def deserialize(
 await aiconfig.callback_manager.run_callbacks(CallbackEvent("on_deserialize_start", __name__, {"prompt": prompt, "params": params}))

 # Build Completion data
- completion_params = self.get_model_settings(prompt, aiconfig)
+ model_settings = self.get_model_settings(prompt, aiconfig)
+ completion_params = refine_completion_params(model_settings)


Good catch!
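The body of `refine_completion_params` is not shown in this excerpt. A minimal sketch of what such an allowlist filter might look like follows; the key names in `SUPPORTED_KEYS` are assumed for illustration, so check the linked pipeline source for the parameters that are actually supported.

```python
from typing import Any, Dict

# Assumed key names, for illustration only; the real supported set is
# whatever the linked ImageToTextPipeline source documents.
SUPPORTED_KEYS = {"max_new_tokens", "timeout"}


def refine_completion_params(model_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the settings the pipeline is assumed to accept."""
    return {
        key: value
        for key, value in model_settings.items()
        if key in SUPPORTED_KEYS
    }


settings = {"model": "some-model", "max_new_tokens": 20, "temperature": 0.5}
print(refine_completion_params(settings))
```

Filtering before the call means unsupported settings in the config are dropped rather than surfacing as a `TypeError` from the pipeline.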

saqadri · 2024-01-10T14:53:52Z

...sions/HuggingFace/python/src/aiconfig_extension_hugging_face/local_inference/image_2_text.py

- prompt.outputs = [output]
- await aiconfig.callback_manager.run_callbacks(CallbackEvent("on_run_complete", __name__, {"result": prompt.outputs}))
+ prompt.outputs = outputs
+ print(f"{prompt.outputs=}")


Remove print?

Fixed in #862
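The switch from `[output]` to `outputs` in the diff above reflects building one output per image. A sketch of that idea, assuming the pipeline returns one entry per input image and each entry is a dict (or list of dicts) with a `generated_text` key; the plain dicts here stand in for the output objects the parser actually constructs, and `build_text_outputs` is a hypothetical name.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]


def build_text_outputs(
    response: List[Union[Result, List[Result]]]
) -> List[Dict[str, Any]]:
    """Turn a per-image response into one pure-text output per image."""
    outputs: List[Dict[str, Any]] = []
    for index, item in enumerate(response):
        # Normalize so each image's results are always a list.
        results = item if isinstance(item, list) else [item]
        text = "\n".join(result["generated_text"] for result in results)
        outputs.append(
            {
                "output_type": "execute_result",
                "data": text,
                "execution_count": index,
            }
        )
    return outputs


response = [
    [{"generated_text": "a fox in a forest"}],
    [{"generated_text": "a dinosaur"}],
]
for output in build_text_outputs(response):
    print(output["execution_count"], output["data"])
```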

saqadri · 2024-01-10T14:56:32Z

...sions/HuggingFace/python/src/aiconfig_extension_hugging_face/local_inference/image_2_text.py

+ # HuggingFace Text summarization does not support function
+ # calls so shouldn't get here, but just being safe


nit: Hugging Face image-to-text...

Fixed in #862

[HF][5/n] Image2Text: Allow base64 inputs for images

Before, we didn't allow base64 inputs, only URIs (local, http, or https). This is good because our text2Image model parser outputs base64, so this will allow us to chain model prompts!

## Test Plan

Rebase and test on 0d7ae2b.

Follow the README from AIConfig Editor https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev, then run these commands:

```bash
aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
```

Then in AIConfig Editor run the prompt (streaming not supported so just took screenshots).

These are the images I tested (with the bear in base64 format):

![fox_in_forest](https://github.com/lastmile-ai/aiconfig/assets/151060367/ca7d1723-9e12-4cc8-9d8d-41fa9f466919)
![bear-eating-honey](https://github.com/lastmile-ai/aiconfig/assets/151060367/a947d89e-c02a-4c64-8183-ff1c85802859)

<img width="1281" alt="Screenshot 2024-01-10 at 04 57 44" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/ea60cbc5-e6ab-4bf2-82e7-17f3182fdc5c">

---

Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/856).

* __->__ #856
* #855
* #854
* #853
* #851


HF transformers: Small fixes nits

Small fixes from comments from Sarmad + me from these diffs:

- #854
- #855
- #821

Main things I did

- rename `refine_chat_completion_params` --> `chat_completion_params`
- edit `get_text_output` to not check for `OutputDataWithValue`
- sorted the init file to be alphabetical
- fixed some typos/print statements
- made some error messages a bit more intuitive with prompt name
- sorted some imports
- fixed old class name `HuggingFaceAutomaticSpeechRecognition` --> `HuggingFaceAutomaticSpeechRecognitionTransformer`

## Test Plan

These are all small nits and shouldn't change functionality

rossdanlm linked an issue Jan 10, 2024 that may be closed by this pull request

Fast Follows for image2Text HF model parser #835

Closed

rossdanlm marked this pull request as ready for review January 10, 2024 09:25

rossdanlm requested review from saqadri, rholinshead, suyoglastmileai, Ankush-lastmile and jonathanlastmileai as code owners January 10, 2024 09:25

rossdanlm force-pushed the pr855 branch from f85a0f0 to a5a26aa Compare January 10, 2024 09:40

rossdanlm mentioned this pull request Jan 10, 2024

[HF][5/n] Image2Text: Allow base64 inputs for images #856

Merged

rossdanlm force-pushed the pr855 branch from a5a26aa to 8df750b Compare January 10, 2024 10:02

Rossdan Craig rossdan@lastmileai.dev added 4 commits January 10, 2024 05:08

rossdanlm force-pushed the pr855 branch from 8df750b to 19d7844 Compare January 10, 2024 10:09

saqadri approved these changes Jan 10, 2024


saqadri merged commit 19d7844 into main Jan 10, 2024

rossdanlm deleted the pr855 branch January 10, 2024 18:38

rossdanlm mentioned this pull request Jan 10, 2024

HF transformers: Small fixes nits #862

Merged

This was referenced Jan 11, 2024

Add model setting completion params for Image2Text prompt schema #875

Merged

Investigate ASR being slow and hanging #897

Closed
