about the output of the apply #23
Comments
I will try to give some leads, although I did not participate in the development, so take my comments with a grain of salt. For your first question, I suggest you look into the description of the RTTM file format (Annex A, page 12) if you need details about the exact meaning of each column, but as an overview it should be:
SPEAKER : the model identified speech from somebody (all lines here will be SPEAKER)
2b533492-bfa4-49f3-b01a-32546f6044bf_2 : name of the audio file where speech was found
1 : channel (it should always be 1 here, as the model works with mono audio)
3.473 : timecode in seconds of where the speech starts
4.134 : duration of the speech in seconds
A : label of the speaker; it should always be A here, as I think the model is not trained to differentiate between speakers
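For illustration, here is a minimal sketch of pulling those columns out of your line in Python (the variable names are my own, nothing defined by brouhaha; full RTTM lines can also carry extra `<NA>` placeholder columns, but the positions used here match the overview above):

```python
# Parse the RTTM line from your example output (only the columns discussed above).
line = "SPEAKER 2b533492-bfa4-49f3-b01a-32546f6044bf_2 1 3.473 4.134 A"

fields = line.split()
file_id = fields[1]          # name of the audio file the segment belongs to
channel = int(fields[2])     # always 1 here (mono audio)
start = float(fields[3])     # segment start, in seconds
duration = float(fields[4])  # segment duration, in seconds

print(f"speech in {file_id}.wav from {start:.3f}s to {start + duration:.3f}s")
```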
So your line tells you speech was detected in file 2b533492-bfa4-49f3-b01a-32546f6044bf_2.wav from time 3.473s to time 7.607s (3.473 + 4.134); the rest is not really relevant. You can read .npy files by using numpy in Python:

```python
import numpy as np
snr = np.load('detailed_snr_labels/2b533492-bfa4-49f3-b01a-32546f6044bf_2.npy')
```

The content should be SNR values for each frame. Frames have a duration of 16.875 ms (see #14 (comment)).
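And a small follow-up sketch, assuming that 16.875 ms frame duration, to line the SNR frames up with the segment from your RTTM line (the path is the one from your example):

```python
import numpy as np

# Frame duration of the model output, as mentioned in #14 (taken as an assumption here).
FRAME_DURATION_S = 0.016875

# Per-frame SNR values written by `apply` (path from your example).
snr = np.load('detailed_snr_labels/2b533492-bfa4-49f3-b01a-32546f6044bf_2.npy')

# Convert the RTTM segment (start 3.473 s, duration 4.134 s) into frame indices.
start_frame = int(3.473 / FRAME_DURATION_S)
end_frame = int((3.473 + 4.134) / FRAME_DURATION_S)

segment = snr[start_frame:end_frame]
print(f"{snr.shape[0]} frames in total; mean SNR over the speech segment: {segment.mean():.2f}")
```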
Got it! Thanks for your reply!
I got "SPEAKER 2b533492-bfa4-49f3-b01a-32546f6044bf_2 1 3.473 4.134 A " by run the command of "python brouhaha/main.py apply", but I can't understand the meaning of all the columns, so can you give me some message about them?
What's more, how can I read the .npy file?
Sincerely waiting for your reply.