snr and c50 detailed arrays are in an unexpected length #14
Comments
Hi to both of you,
These are the parameters that determine the output range. For the SNR, it is bounded between -15 and 80 dB. Best,
Awesome! Please let me know of your progress! I feel like the reason behind this strange behavior is obvious, but I can't see it yet!
Closing for now! Feel free to re-open if you're still struggling to figure out what the model returns :)
Hi, thanks for your work. I ran brouhaha on a file of length 3:37:57.326, which is 13077.326 seconds. I examined the c50 and detailed_snr_labels `.npy` files, and their shape was `(756644,)`. I expected that 756644 * 16 / 1000 would equal the length of the clip (16 ms per frame, as per the paper), but that is not the case. The ratio between the length of the audio file and the length of the arrays came out to 17.28 ms per frame. I manually verified this by graphing the SNR and seeing that it lines up with speech starting and ending only when I used 17.28/1000 as the conversion factor from frames to seconds. Where does the number come from? It doesn't come out to a whole number of samples at 16 kHz (it's around 276.5 samples per frame, though maybe padding can explain the .5?)
An interesting side note is that the `.rttm` file has correct timings, so it's not that everything is off.
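For anyone who wants to check the arithmetic above, here is a minimal sketch in plain Python. It uses only the numbers quoted in this issue; the variable names and the `frame_to_seconds` helper are hypothetical and are not part of brouhaha. It shows the observed frame step, not where that step comes from.

```python
# Numbers reported in this issue (not read from any file).
audio_duration_s = 13077.326   # 3:37:57.326 expressed in seconds
n_frames = 756644              # shape of the c50 / detailed_snr_labels arrays
sample_rate_hz = 16000         # assumption: 16 kHz audio
nominal_frame_ms = 16.0        # frame step as read from the paper

# Observed frame step, derived from duration / number of frames.
observed_frame_ms = audio_duration_s / n_frames * 1000
samples_per_frame = audio_duration_s / n_frames * sample_rate_hz

# Duration the arrays would cover if each frame really were 16 ms.
covered_at_nominal_s = n_frames * nominal_frame_ms / 1000


def frame_to_seconds(frame_index, frame_ms=observed_frame_ms):
    """Hypothetical helper: convert a frame index to a time in seconds."""
    return frame_index * frame_ms / 1000


print(f"observed frame step: {observed_frame_ms:.2f} ms")
print(f"samples per frame:   {samples_per_frame:.1f}")
print(f"covered at 16 ms:    {covered_at_nominal_s:.3f} s of {audio_duration_s} s")
```

With a 16 ms step the arrays would cover only about 12106 s of the 13077 s file, which is why the plots line up only with the empirically derived conversion factor.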