about the output of the apply #23

Open
dalaolili opened this issue Jul 23, 2024 · 2 comments
Comments

@dalaolili

I got "SPEAKER 2b533492-bfa4-49f3-b01a-32546f6044bf_2 1 3.473 4.134 A " by running the command "python brouhaha/main.py apply", but I don't understand what the columns mean. Could you explain them?
Also, how can I read the .npy file?
Sincerely waiting for your reply.

@LoannPeurey

LoannPeurey commented Aug 6, 2024

I will try to give some leads, although I did not participate in the development so take my comments with a grain of salt.

For your first question, see the description of the RTTM file format (Annex A, page 12) if you need the exact meaning of each column. As an overview, it should be:

  • SPEAKER : the model identified speech from somebody (all lines here will be SPEAKER)
  • 2b533492-bfa4-49f3-b01a-32546f6044bf_2 : name of the audio file where speech was found
  • 1 : channel (here it should always be 1 as the model works with mono channel)
  • 3.473 : start time of the detected speech, in seconds
  • 4.134 : duration of the speech, in seconds
  • A : label of the speaker; it should always be A here, as I think the model is not trained to differentiate between speakers

So your line tells you that speech was detected in file 2b533492-bfa4-49f3-b01a-32546f6044bf_2.wav from 3.473 s to 7.607 s (3.473 + 4.134). The rest is not really relevant.
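To make the column layout concrete, here is a minimal sketch that splits one such line into named fields and computes the end time. The names `Segment` and `parse_rttm_line` are hypothetical helpers written for this reply, not part of brouhaha. It also assumes that a complete RTTM line may carry `<NA>` placeholder fields, which are simply skipped here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One speech segment read from an RTTM line (hypothetical helper type)."""
    file_id: str
    channel: int
    start: float      # onset, in seconds
    duration: float   # length, in seconds
    label: str

    @property
    def end(self) -> float:
        # End time is onset plus duration.
        return self.start + self.duration


def parse_rttm_line(line: str) -> Segment:
    """Parse a single SPEAKER line; assumes whitespace-separated fields."""
    # Assumption: "<NA>" placeholders, if present, carry no information here.
    fields = [f for f in line.split() if f != "<NA>"]
    if len(fields) < 6 or fields[0] != "SPEAKER":
        raise ValueError(f"not a SPEAKER line: {line!r}")
    _, file_id, channel, start, duration, label = fields[:6]
    return Segment(file_id, int(channel), float(start), float(duration), label)


seg = parse_rttm_line(
    "SPEAKER 2b533492-bfa4-49f3-b01a-32546f6044bf_2 1 3.473 4.134 A"
)
print(f"{seg.file_id}: {seg.start:.3f}s to {seg.end:.3f}s, label {seg.label}")
```

To process a whole .rttm file, the same function can be called on each non-empty line.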

You can read .npy files by using numpy in python:

import numpy as np
snr = np.load('detailed_snr_labels/2b533492-bfa4-49f3-b01a-32546f6044bf_2.npy')

The content should be SNR values for each frame. Frames have a duration of 16.875 ms (see #14 (comment)).
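If you want to relate the frames to the times in the RTTM output, something like the sketch below could work. It assumes the array holds one value per frame along its first axis and that frame i starts at i × 16.875 ms with no offset; I have not verified either point against the code. The function names are hypothetical, not part of brouhaha.

```python
import numpy as np

FRAME_DURATION_S = 0.016875  # 16.875 ms per frame, as mentioned above


def frame_start_times(n_frames: int,
                      frame_duration: float = FRAME_DURATION_S) -> np.ndarray:
    """Start time in seconds of each frame (assumes frame 0 starts at t = 0)."""
    return np.arange(n_frames) * frame_duration


def values_in_segment(values, start: float, duration: float,
                      frame_duration: float = FRAME_DURATION_S) -> np.ndarray:
    """Keep the frames whose start time falls in [start, start + duration)."""
    values = np.asarray(values)
    times = frame_start_times(len(values), frame_duration)
    mask = (times >= start) & (times < start + duration)
    return values[mask]
```

For example, after loading the array as `snr` with `np.load`, `values_in_segment(snr, 3.473, 4.134).mean()` would give the average value over the segment from your RTTM line, under the assumptions above.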

@dalaolili

dalaolili commented Aug 8, 2024 via email
