I have reproduced the same results on the TIMIT dataset. I have now converted the model to ONNX and then to TFLite for d-vector computation and speaker identification on mobile.
I verified the SincNet TFLite model in Python and it worked for me, but now I have to run the same inference on a mobile device.
So I am trying to convert raw audio into a tensor and replicate the same NumPy computation in C/C++.
I have not found any direct way to convert audio into a tensor, since there is no torchaudio implementation for mobile, so I am looking at computing MFCCs from the audio and then converting them into a tensor of the same dimensions.
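(For context, since SincNet consumes the raw waveform directly, the "tensor" is really just the normalized float signal. A minimal C++ sketch of what I have in mind, assuming the wav file has already been decoded to 16-bit PCM samples, and assuming peak-amplitude normalization matches what the Python data loader does, which I would still need to verify against the training pipeline:)

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert decoded 16-bit PCM samples into the normalized float vector that a
// raw-waveform model like SincNet expects as input. The peak-amplitude
// normalization step is an assumption and should be checked against the
// Python preprocessing used at training time.
std::vector<float> pcm16_to_tensor(const std::vector<int16_t>& pcm) {
    std::vector<float> out(pcm.size());
    // Scale int16 samples into [-1.0, 1.0), matching how soundfile reads
    // a wav file as float.
    for (size_t i = 0; i < pcm.size(); ++i)
        out[i] = static_cast<float>(pcm[i]) / 32768.0f;
    // Normalize by the peak absolute amplitude.
    float peak = 0.0f;
    for (float v : out) peak = std::max(peak, std::fabs(v));
    if (peak > 0.0f)
        for (float& v : out) v /= peak;
    return out;
}
```

The resulting buffer could then be copied into the TFLite interpreter's input tensor directly, with no MFCC step at all.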
I can compute MFCC features using a C++ library or my own C++ code. Now I want to know: have you tried computing the d-vector (or training the speaker-ID model) using MFCCs instead of the raw-waveform torch tensor (i.e. soundfile audio converted to a torch tensor)?
I found that you did something like an MFCC comparison, as mentioned in pytorch/audio#328.
So can you please confirm:
What if I directly compute MFCC features from the audio using a C++ library and then load the SincNet model on mobile for the final d-vector calculation? Will that change my d-vector values, or lead to a major difference in final speaker-identification accuracy compared to using the torch tensor?
Can you suggest a method to convert raw audio to a tensor on a mobile device, similar to what has been done here?
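(One way I could check the first question empirically is to run the same utterance through both preprocessing paths and compare the resulting d-vectors with cosine similarity; values near 1.0 would mean the pipelines agree. A small sketch, with `cosine_similarity` being my own hypothetical helper name:)

```cpp
#include <cmath>
#include <vector>

// Cosine similarity between two d-vectors of equal length. Useful for
// measuring how far the embedding produced by a C++ preprocessing path
// drifts from the reference Python pipeline on the same utterance.
float cosine_similarity(const std::vector<float>& a,
                        const std::vector<float>& b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    // Small epsilon guards against division by zero for all-zero vectors.
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
}
```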
I hope my question is clear.
Thanks a lot.