
U2 updates #25

Closed · wants to merge 3 commits

Conversation

sanchit-gandhi (Contributor)

Small updates to U2 (some formatting, some updating of the code samples)

sanchit-gandhi added 2 commits May 18, 2023 09:38
@@ -35,18 +36,28 @@ classifier = pipeline(
)
```

-This pipeline expects the audio data as a NumPy array. All the preprocessing of the raw audio data will be conveniently
-handled for us by the pipeline. Let's pick an example to try it out:
+All the preprocessing of the raw audio data will be conveniently handled for us by the pipeline, including any resampling.
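
As a minimal sketch of what "including any resampling" means in practice: passing the audio column's dict straight to the pipeline lets it resample on the fly whenever the stored rate differs from the model's expected rate (the resampling path relies on torchaudio being installed). The checkpoint below is an assumption for illustration only, since the model argument is not visible in this hunk:

```py
from datasets import load_dataset
from transformers import pipeline

# NOTE: hypothetical checkpoint, chosen only for illustration; the actual
# model used in the course section is not shown in this diff hunk.
classifier = pipeline("audio-classification", model="anton-l/xtreme_s_xlsr_300m_minds14")

minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")

# The audio column is a dict holding the raw array plus its original
# sampling_rate, so the pipeline can resample to the model's expected rate.
classifier(minds[0]["audio"])
```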


minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")
-minds = minds.cast_column("audio", Audio(sampling_rate=16_000))

sanchit-gandhi (Contributor, Author):

Think it's cleaner if we don't have to do any data pre-/post-processing and let the pipeline handle this.

Vaibhavs10 (Member):

Same as above, IMO it is good to reinforce the idea of sampling rates.

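For reference, a minimal sketch of the explicit-resampling variant being discussed, using only the `load_dataset` and `cast_column` calls shown in the diff:

```py
from datasets import Audio, load_dataset

minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")
print(minds[0]["audio"]["sampling_rate"])  # 8000 (MINDS-14 recordings are 8 kHz)

# Explicitly resample to the 16 kHz rate most speech checkpoints expect
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
print(minds[0]["audio"]["sampling_rate"])  # 16000
```

The PR instead drops the cast and relies on the pipeline to resample from the rate stored in each audio dict.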

```py
example = minds[0]
example["transcription"]
"ich möchte gerne Geld auf mein Konto einzahlen"
```

-Find a pre-trained ASR model for German language on the 🤗 Hub, instantiate a pipeline, and transcribe the example:
+Next, we can find a pre-trained ASR model for German language on the 🤗 Hub, instantiate a pipeline, and transcribe the example.
+Here, we'll use the checkpoint [maxidl/wav2vec2-large-xlsr-german](https://huggingface.co/maxidl/wav2vec2-large-xlsr-german):

sanchit-gandhi (Contributor, Author):

I personally like having links to the checkpoints on the Hub so that I can look at the model cards.

Vaibhavs10 (Member):

Lovely, how about we swap the community checkpoint to an official DE checkpoint for XLSR: https://huggingface.co/facebook/wav2vec2-large-xlsr-53-german
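
A rough sketch of the instantiate-and-transcribe step under discussion, assuming the de-DE split used later in this review; the checkpoint is the one linked in the diff, with the suggested official alternative noted in a comment:

```py
from datasets import load_dataset
from transformers import pipeline

minds = load_dataset("PolyAI/minds14", name="de-DE", split="train")

# Checkpoint linked in the diff above; the review suggests swapping in the
# official facebook/wav2vec2-large-xlsr-53-german checkpoint instead.
asr = pipeline("automatic-speech-recognition", model="maxidl/wav2vec2-large-xlsr-german")

# The audio dict carries its original sampling_rate, so the pipeline can
# resample before transcribing; the call returns a dict with a "text" key.
asr(minds[0]["audio"])
```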

the right format for a model
- if the result isn't ideal, this still gives you a quick baseline for future fine-tuning
- once you fine-tune a model on your custom data and share it on Hub, the whole community will be able to use it quickly
-and effortlessly via the `pipeline()` method making AI more accessible.
+and effortlessly via the `pipeline()` method, making AI more accessible

sanchit-gandhi (Contributor, Author):

Consistency with previous bullet points.

MKhalusova self-requested a review May 19, 2023 15:49

Vaibhavs10 (Member) left a comment:

Very cool! 2 small nits about the sampling rate and the XLSR model used for transcription.
Think it makes sense to use the official checkpoints where possible, helps build credibility IMO.


minds = load_dataset("PolyAI/minds14", name="de-DE", split="train")
-minds = minds.cast_column("audio", Audio(sampling_rate=16_000))

Vaibhavs10 (Member):

Personal preference: It makes sense to explicitly resample here, just to reinforce the idea of sampling rates to the attendee.

We can later on explicitly write that sampling_rate handles different rates automagically.

WDYT?


MKhalusova (Contributor) left a comment:

LGTM! Sorry about the delay. Feel free to merge.

MKhalusova closed this Aug 3, 2023