Enable new models in audio-to-text #163

Open · wants to merge 10 commits into main

Conversation

@eliteprox (Collaborator) commented on Aug 17, 2024

This change adds support for two new Whisper models, distil-whisper/distil-large-v3 and openai/whisper-medium.

A FLOAT16 optimization flag is also added as an option to reduce memory usage and speed up inference. Example pipeline configuration:

{
	"pipeline": "audio-to-text",
	"model_id": "openai/whisper-medium",
	"price_per_unit": 999,
	"warm": true,
	"optimization_flags": {
		"FLOAT16": true
	}
}
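For illustration, here is a minimal sketch of how a flag like FLOAT16 could feed into the pipeline's dtype selection. The environment-variable plumbing, the model_id value, and the pipeline wiring below are assumptions made for the example, not necessarily how this PR implements it:

import os

import torch
from transformers import pipeline

# Assumed plumbing: the FLOAT16 flag from the "optimization_flags" config
# above is delivered to the worker as an environment variable.
float16_enabled = os.environ.get("FLOAT16", "").lower() in ("1", "true")

model_id = "openai/whisper-medium"
kwargs = {}
if float16_enabled and torch.cuda.is_available():
    # Run inference in half precision to cut memory use and improve speed.
    kwargs["torch_dtype"] = torch.float16

asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    **kwargs,
)

# Example usage (hypothetical audio file):
# print(asr("sample.wav")["text"])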

Credit to @ad-astra-video for initially exploring these models and optimizations.

logger.info("AudioToTextPipeline using float16 precision for %s", model_id)
kwargs["torch_dtype"] = torch.float16

if bfloat16_enabled:
    logger.info("AudioToTextPipeline using bfloat16 precision for %s", model_id)
Collaborator commented:

@eliteprox, thanks for the pull request! 🚀 It looks good overall. However, please keep in mind that the default models openai/whisper-large-v3 and distil-whisper/distil-large-v3 already ship their weights in float16 or bfloat16 format; the torch_dtype parameter primarily controls the precision used for computation at runtime. You can verify this by checking the model files in these repositories: Hugging Face - distil-large-v3. Notice the presence of files with the .fp32.safetensors extension, indicating the format being used.

If the standard .safetensors (fp16) format meets your needs, you might consider removing the FLOAT16 environment variable and instead switching based on the model file extension. This approach was implemented by Yondon in this commit. I will leave that decision to you based on your research 👍🏻. Feel free to merge when you think this pull request is done 🚀.
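For reference, a rough sketch of that extension-based idea; the helper name and the file-naming heuristic below are assumptions for illustration, not the code from the linked commit:

from pathlib import Path

import torch

def choose_torch_dtype(model_dir: str) -> torch.dtype:
    # Illustrative heuristic: repos such as distil-whisper/distil-large-v3
    # ship their main weights in half precision and publish separate
    # *.fp32.safetensors variants, so the presence of an fp32 variant
    # suggests the default weights are already fp16.
    names = [p.name for p in Path(model_dir).glob("*.safetensors")]
    has_fp32_variant = any(n.endswith(".fp32.safetensors") for n in names)
    return torch.float16 if has_fp32_variant else torch.float32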

@eliteprox (Collaborator, Author) replied:

Thanks for the tip! I updated the logic to load the recommended float precision for each model, and tested that the models download and load correctly.
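A minimal sketch of what per-model precision selection could look like; the mapping and fallback below are illustrative only, and the actual recommended values should come from each model card rather than this table:

import torch

# Illustrative mapping of model IDs to a recommended runtime precision.
RECOMMENDED_DTYPE = {
    "openai/whisper-large-v3": torch.float16,
    "distil-whisper/distil-large-v3": torch.float16,
    "openai/whisper-medium": torch.float16,
}

def recommended_dtype(model_id: str) -> torch.dtype:
    # Fall back to full precision for models without a documented recommendation.
    return RECOMMENDED_DTYPE.get(model_id, torch.float32)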
