U2 updates #25
Conversation
@@ -35,18 +36,28 @@ classifier = pipeline(
 )
 ```

-This pipeline expects the audio data as a NumPy array. All the preprocessing of the raw audio data will be conveniently
-handled for us by the pipeline. Let's pick an example to try it out:
+All the preprocessing of the raw audio data will be conveniently handled for us by the pipeline, including any resampling.
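As a side note to the diff above, a minimal sketch (not part of the PR) of the input format being described: the pipeline consumes a plain 1-D float32 NumPy waveform. Here a synthetic sine tone stands in for a real recording; the checkpoint-specific feature extraction, padding, and resampling are left to the pipeline, as the new wording says:

```python
import numpy as np

# Synthetic stand-in for a real recording: one second of a 440 Hz tone
# sampled at 16 kHz, the rate wav2vec2-style checkpoints expect.
sampling_rate = 16_000
t = np.arange(sampling_rate) / sampling_rate
waveform = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# This 1-D float32 array is the shape of input the classification
# pipeline accepts directly, e.g. `classifier(waveform)`; everything
# else (feature extraction, padding, resampling) then happens inside
# the pipeline.
```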
(Once huggingface/transformers#23445 is merged)
```py
minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```
Think it's cleaner if we don't have to do any data pre-/post-processing and let the pipeline handle this
Same as above, IMO it is good to reinforce the idea of sampling rates.
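The idea of sampling rates the reviewers want to reinforce can be illustrated with a toy resampler (an illustration only, not what `datasets` actually does — the `Audio(sampling_rate=16_000)` cast resamples properly, with filtering):

```python
import numpy as np

def resample_linear(waveform, orig_sr, target_sr):
    """Toy linear-interpolation resampler, only meant to show what
    changing the sampling rate does to the sample buffer."""
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.arange(len(waveform)) / orig_sr
    new_t = np.arange(n_target) / target_sr
    return np.interp(new_t, old_t, waveform).astype(np.float32)

# One second of audio at 8 kHz becomes 16 000 samples at 16 kHz,
# while still representing the same one second of sound.
audio_8k = np.random.default_rng(0).standard_normal(8_000).astype(np.float32)
audio_16k = resample_linear(audio_8k, 8_000, 16_000)
```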
```py
example = minds[0]
example["transcription"]
"ich möchte gerne Geld auf mein Konto einzahlen"
```
-Find a pre-trained ASR model for German language on the 🤗 Hub, instantiate a pipeline, and transcribe the example:
+Next, we can find a pre-trained ASR model for German language on the 🤗 Hub, instantiate a pipeline, and transcribe the example.
+Here, we'll use the checkpoint [maxidl/wav2vec2-large-xlsr-german](https://huggingface.co/maxidl/wav2vec2-large-xlsr-german):
I personally like having links to the checkpoints on the Hub so that I can look at the model cards
Lovely, how about we swap the community checkpoint to an official DE checkpoint for XLSR: https://huggingface.co/facebook/wav2vec2-large-xlsr-53-german
 the right format for a model
 - if the result isn't ideal, this still gives you a quick baseline for future fine-tuning
 - once you fine-tune a model on your custom data and share it on Hub, the whole community will be able to use it quickly
-and effortlessly via the `pipeline()` method making AI more accessible.
+and effortlessly via the `pipeline()` method, making AI more accessible
Consistency with previous bullet points
Very cool! Two small nits about the sampling rate and the XLSR model used for transcription.
Think it makes sense to use the official checkpoints where possible, helps build credibility IMO.
```py
minds = load_dataset("PolyAI/minds14", name="de-DE", split="train")
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```
Personal preference: it makes sense to explicitly resample here, just to reinforce the idea of sampling rates to the attendee. We can later on explicitly write that `sampling_rate` handles different rates automagically. WDYT?
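A small illustration (not from the PR) of why the explicit resample the comment above argues for matters: the same buffer of samples means a different duration at a different rate, so feeding 8 kHz audio to a model that assumes 16 kHz silently halves the apparent duration (and doubles the apparent pitch):

```python
# One second of speech recorded at 8 kHz is 8 000 samples.
n_samples = 8_000

# Interpreted correctly at 8 kHz, the buffer lasts one second...
duration_at_8k = n_samples / 8_000

# ...but a 16 kHz model reading the same buffer "hears" only half a
# second of sped-up audio, which is what the explicit
# Audio(sampling_rate=16_000) cast (or the pipeline's own resampling)
# prevents.
duration_at_16k = n_samples / 16_000
```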
```py
example = minds[0]
example["transcription"]
"ich möchte gerne Geld auf mein Konto einzahlen"
```
LGTM! Sorry about delay. Feel free to merge
Small updates to U2 (some formatting, some updating of the code samples)