xenova commented Dec 14, 2024

This PR adds support for Moonshine, a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. The models are well suited to real-time, on-device applications such as live transcription and voice command recognition, which makes them a good fit for in-browser usage. The PR relies on a dev branch of transformers by @eustlb (huggingface/transformers#34784) and a dev branch of Optimum for ONNX conversion.

Example usage:

With pipeline API:

import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline("automatic-speech-recognition", "onnx-community/moonshine-tiny-ONNX");
const output = await transcriber("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav");
console.log(output);
// { text: 'And so my fellow Americans ask not what your country can do for you as what you can do for your country.' }
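
For live transcription, the audio usually comes from a microphone or a decoded buffer rather than a URL. The pipeline also accepts raw samples as a `Float32Array`, so a small conversion step may be needed first. The sketch below is an illustration only: `pcm16ToFloat32` is a hypothetical helper (not part of `@huggingface/transformers`), and it assumes the input is mono, signed 16-bit PCM that has already been resampled to 16 kHz.

```javascript
// Hypothetical helper, not part of @huggingface/transformers.
// Converts signed 16-bit PCM samples to floats in the range [-1, 1).
// Assumption: the samples are mono and already at 16 kHz.
function pcm16ToFloat32(pcm) {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; ++i) {
    out[i] = pcm[i] / 32768; // Int16 values span [-32768, 32767]
  }
  return out;
}

// Possible usage with the transcriber created above (not executed here):
// const samples = pcm16ToFloat32(int16Samples);
// const output = await transcriber(samples);
```

Passing raw samples skips the download and decode step, which is typically what a real-time loop wants.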

Without pipeline API:

import { MoonshineForConditionalGeneration, AutoProcessor, read_audio } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/moonshine-tiny-ONNX";
const model = await MoonshineForConditionalGeneration.from_pretrained(model_id, {
    dtype: "q4",
});
const processor = await AutoProcessor.from_pretrained(model_id);

// Load audio and prepare inputs
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav", 16000);
const inputs = await processor(audio);

// Generate outputs
const outputs = await model.generate({ ...inputs, max_new_tokens: 100 });

// Decode outputs
const decoded = processor.batch_decode(outputs, { skip_special_tokens: true });
console.log(decoded[0]);
// And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.
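
The fixed `max_new_tokens: 100` above works for a short clip, but for variable-length audio it can help to scale the limit with the clip duration so that generation cannot run far past the spoken content. The sketch below shows one way to do that. `maxNewTokensFor` is a hypothetical helper, and the default of 6.5 tokens per second of audio is an assumed heuristic, not a value stated in this PR, so treat it as tunable.

```javascript
// Assumed heuristic: roughly 6.5 output tokens per second of speech.
// This constant is NOT taken from this PR; tune it for your data.
const TOKENS_PER_SECOND = 6.5;

// Hypothetical helper: derives a generation limit from the number of samples.
function maxNewTokensFor(numSamples, samplingRate = 16000, tokensPerSecond = TOKENS_PER_SECOND) {
  if (!(samplingRate > 0)) {
    throw new RangeError("samplingRate must be positive");
  }
  const seconds = numSamples / samplingRate;
  // Always allow at least one token so generation can terminate cleanly.
  return Math.max(1, Math.floor(seconds * tokensPerSecond));
}

// Possible usage with the objects defined above (not executed here):
// const outputs = await model.generate({
//   ...inputs,
//   max_new_tokens: maxNewTokensFor(audio.length),
// });
```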

closes #990

xenova commented Dec 14, 2024

The model works with WebGPU too, and I've adapted this real-time demo to work with it. It is significantly faster than the Whisper version. 🔥
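
As a rough sketch of how WebGPU could be selected in practice: Transformers.js takes a `device` option when loading a pipeline or model, and a page can fall back to WASM when WebGPU is unavailable. `pickDevice` below is a hypothetical helper. It only checks whether `navigator.gpu` exists, which is an assumption that this is a sufficient signal; a stricter check would also await `navigator.gpu.requestAdapter()`.

```javascript
// Hypothetical helper: prefer WebGPU when the environment exposes
// `navigator.gpu`, otherwise fall back to the WASM backend.
function pickDevice(nav = globalThis.navigator) {
  return nav?.gpu ? "webgpu" : "wasm";
}

// Possible usage (not executed here):
// const transcriber = await pipeline(
//   "automatic-speech-recognition",
//   "onnx-community/moonshine-tiny-ONNX",
//   { device: pickDevice() },
// );
```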

@xenova xenova merged commit aa60302 into main Dec 15, 2024
4 checks passed
@xenova xenova deleted the add-moonshine branch December 15, 2024 14:36