[hf inference] ASR remote inference model parser impl #1020

Merged
merged 1 commit into from
Jan 25, 2024

Ankush-lastmile
Member

@Ankush-lastmile Ankush-lastmile commented Jan 25, 2024

[hf inference] ASR remote inference model parser impl

Implements the HuggingFaceAutomaticSpeechRecognition model parser, which uses the inference endpoint to run inference. The Python API accepts bytes as well as a path; binary input is skipped for now.

Very similar to #1018

Testplan

Screenshot 2024-01-24 at 10 37 05 PM
  1. Temporarily add the model parser to the Gradio Cookbook model parser registry.
    asr = HuggingFaceAutomaticSpeechRecognitionRemoteInference()
    AIConfigRuntime.register_model_parser(
        asr, asr.id()
    )
  2. Run AIConfig Edit on the Gradio example:

python3 -m 'aiconfig.scripts.aiconfig_cli' edit --aiconfig-path=cookbooks/Gradio/huggingface.aiconfig.json --parsers-module-path=cookbooks/Gradio/hf_model_parsers.py --server-mode=debug_servers

Contributor

@rholinshead rholinshead left a comment

Some minor changes, mostly from copy/paste

Comment on lines 216 to 219
if len(inputs) > 1:
    raise ValueError(
        f"Multiple audio inputs are not supported for the HF Automatic Speech Recognition Inference api. Please specify a single audio input attachment."
    )
Contributor

Ah, ok, that's what I was thinking. Instead of doing this, we should just make the validate_and_retrieve function return a single value, not an array.

Member Author

Updated. The refactored validate_and_retrieve now returns a single value, not an array.


# HuggingFace Automatic Speech Recognition outputs should only ever be string
# format so shouldn't get here, but just being safe
return json.dumps(output_data, indent=2)
Contributor

Based on your comment on the image_2_text one, maybe this should raise a ValueError instead?

Member Author

Yep, updated. I had previously copy-pasted what you had for consistency.
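
For illustration, the change agreed on in this thread (raise instead of falling back to `json.dumps`) could look roughly like the sketch below. The function name `get_transcription_text` is hypothetical and is not taken from the PR; only the idea that ASR output should always be a string comes from the code comment under review.

```python
from typing import Any


def get_transcription_text(output_data: Any) -> str:
    """Return the transcription text, rejecting anything that is not a string.

    Hypothetical helper: the PR's actual function and its signature may differ.
    """
    if isinstance(output_data, str):
        return output_data
    # ASR outputs are expected to be plain strings, so any other type signals
    # a bug upstream. Fail loudly rather than serializing the value.
    raise ValueError(
        f"Expected a string transcription, got {type(output_data).__name__}."
    )
```

Raising here surfaces an unexpected output type at the point where it occurs, rather than passing a JSON dump along as if it were a transcription.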

@Ankush-lastmile
Member Author

  • Fixed a couple of nits on comments
  • Renamed all references from image to audio
  • Refactored validate_and_retrieve_audio_from_attachments to also validate only one audio attachment
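
A minimal sketch of what the single-attachment refactor might look like, assuming a simplified `Attachment` type. Both the `Attachment` dataclass and the function name `validate_and_retrieve_audio` are stand-ins for illustration; the real aiconfig types and the PR's `validate_and_retrieve_audio_from_attachments` signature are not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Attachment:
    """Hypothetical stand-in for an aiconfig attachment."""

    data: Union[str, bytes]  # a path/URI string or raw audio bytes
    mime_type: Optional[str] = None


def validate_and_retrieve_audio(
    attachments: Sequence[Attachment],
) -> Union[str, bytes]:
    """Validate that exactly one attachment is present and return its data.

    Returning a single value (not a list) means callers no longer need
    their own `len(inputs) > 1` check.
    """
    if len(attachments) != 1:
        raise ValueError(
            "Expected exactly one audio attachment, "
            f"got {len(attachments)}. Please specify a single audio input."
        )
    return attachments[0].data
```

With the count check inside the helper, the caller can pass the returned value straight to the inference call.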

Testplan:

Same as in the original PR description. The output was the same, so I'm omitting the screenshot.

f"Attachment has no mime type. Specify the audio mimetype in the aiconfig"
)

if not attachment.mime_type.startswith("audio/"):
Contributor

Should be "audio" without the trailing slash, since we default to just "audio" and this check would reject that default. Alternatively, we could default to "audio/*", but I'm not sure that would work the same way.

Member Author

Ah, I see what you mean. Technically this doesn't break anything yet, but the change will be needed.

Updated, thanks for catching
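
To make the trailing-slash issue concrete, here is a small sketch. The helper name `is_audio_mime_type` is hypothetical; only the `startswith` prefix check comes from the snippet under review.

```python
def is_audio_mime_type(mime_type: str) -> bool:
    """Accept both the bare default "audio" and full types like "audio/wav"."""
    return mime_type.startswith("audio")


# The original check used the prefix "audio/", which rejects the bare default.
rejected_by_old_check = not "audio".startswith("audio/")

print(rejected_by_old_check)            # True
print(is_audio_mime_type("audio"))      # True
print(is_audio_mime_type("audio/wav"))  # True
print(is_audio_mime_type("image/png"))  # False
```

A bare prefix match is permissive: it would also accept a made-up type such as "audiobook/x". A stricter variant could test `mime_type == "audio" or mime_type.startswith("audio/")`, though nothing in this thread says that is what landed.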

Contributor

@rholinshead rholinshead left a comment

Accepting to unblock, but please update the mimetype validation before landing

@Ankush-lastmile
Member Author

Updated the mimetype validation to check for "audio", not "audio/".

@Ankush-lastmile Ankush-lastmile merged commit 50ac544 into main Jan 25, 2024