Hello! I read the infer.py file and, as I understand it, it first divides the complete audio into chunks and feeds each chunk into the model. At the end, it stacks all the chunk outputs to build the RTTM file:
```python
out_chunks.append(ys[0].data)  # per-chunk diarization posteriors
# ...
# Zero-pad each chunk to max_n_speakers columns, then stack along time.
out_chunks = [np.insert(o, o.shape[1], np.zeros((max_n_speakers - o.shape[1], o.shape[0])), axis=1) for o in out_chunks]
outdata = np.vstack(out_chunks)
```
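To make sure I'm reading the stacking step right, here is a minimal self-contained sketch of what I understand it to do; the chunk shapes below are invented for illustration:

```python
import numpy as np

# Dummy per-chunk outputs: EDA can emit a different number of speaker
# columns per chunk (the frame counts and speaker counts are made up).
out_chunks = [np.random.rand(500, 2), np.random.rand(500, 3), np.random.rand(500, 1)]

max_n_speakers = max(o.shape[1] for o in out_chunks)

# Pad every chunk with all-zero speaker columns up to max_n_speakers,
# then concatenate all chunks along the time axis.
out_chunks = [np.insert(o, o.shape[1], np.zeros((max_n_speakers - o.shape[1], o.shape[0])), axis=1)
              for o in out_chunks]
outdata = np.vstack(out_chunks)
print(outdata.shape)  # (1500, 3)
```

If that reading is right, column k of outdata just means "the k-th attractor of whichever chunk we are in", which leads to my question below.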
I'm a little confused about how you make sure the speaker order of each chunk is consistent for the EDA model. Because the attractors in EDA are generated dynamically from each chunk, the same speaker may land in a different column from chunk to chunk, or disappear entirely in another chunk of the same audio.
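To make the concern concrete, I would have expected some alignment step between neighboring chunks before stacking, roughly like the sketch below. This is not code from infer.py; `align_to_previous` and the `overlap` assumption are hypothetical, and the sketch assumes neighboring chunks share the same speaker count for simplicity:

```python
import itertools
import numpy as np

def align_to_previous(prev_tail, cur_head):
    """Hypothetical helper: pick the column permutation of the current chunk
    that best agrees with the previous chunk over an overlap region.
    Both arguments are (overlap_frames, n_speakers) posteriors; assumes the
    two chunks have the same number of speaker columns."""
    n = cur_head.shape[1]
    best_perm, best_score = list(range(n)), -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.sum(prev_tail * cur_head[:, list(perm)])  # frame-wise agreement
        if score > best_score:
            best_perm, best_score = list(perm), score
    return best_perm

# Usage sketch (assumes chunks were extracted with `overlap` shared frames,
# which infer.py's chunking does not appear to do):
# aligned = [out_chunks[0]]
# for prev, cur in zip(out_chunks, out_chunks[1:]):
#     perm = align_to_previous(prev[-overlap:], cur[:overlap])
#     aligned.append(cur[:, perm])
```

Is something along these lines happening somewhere that I missed, or is the inter-chunk permutation simply left unresolved?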