Question about infer.py for EDA #30

Open
Achronferry opened this issue Oct 12, 2021 · 1 comment
Achronferry commented Oct 12, 2021

Hello! I read the infer.py file and, as I understand it, it first divides the complete audio into chunks and feeds these chunks into the model. At the end, it stacks all the outputs to make the RTTM file:
```python
out_chunks.append(ys[0].data)
# ...
out_chunks = [np.insert(o, o.shape[1],
                        np.zeros((max_n_speakers - o.shape[1], o.shape[0])),
                        axis=1)
              for o in out_chunks]
outdata = np.vstack(out_chunks)
```
I'm a little confused about how you can make sure the speaker order of each chunk is consistent for the EDA model. Since the attractors in EDA are dynamically generated from each chunk, couldn't a speaker disappear in another chunk of the same audio?
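
(For illustration, here is a minimal sketch of why that stacking step bakes in a speaker-order assumption. The shapes and values below are hypothetical; only the padding/stacking logic mirrors the snippet above.)

```python
import numpy as np

# Hypothetical per-chunk outputs with shape (frames, n_speakers):
# chunk 1 detected 2 speakers, chunk 2 detected only 1.
out_chunks = [np.random.rand(500, 2), np.random.rand(500, 1)]
max_n_speakers = 2

# Pad each chunk's output with zero columns up to max_n_speakers,
# then stack along the time axis. Column i of every chunk is
# implicitly treated as the same speaker, which only holds if the
# attractor order is consistent across chunks.
out_chunks = [np.insert(o, o.shape[1],
                        np.zeros((max_n_speakers - o.shape[1], o.shape[0])),
                        axis=1)
              for o in out_chunks]
outdata = np.vstack(out_chunks)
print(outdata.shape)  # (1000, 2)
```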

@shota-horiguchi (Contributor) commented

Chunking during inference is not expected. Please make sure that the chunk size is large enough so that the recording is not split during inference.
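
(A hedged sketch of what that advice means in practice: the chunk size must cover at least the longest recording so it is never split. The frame shift and subsampling factor below are assumptions, not the repository's defaults; check your own feature config for the actual values.)

```python
# Back-of-the-envelope check under assumed feature settings:
# with a 10 ms frame shift and a subsampling factor of 10, one output
# frame covers 0.1 s, so a 10-minute recording is about 6000 frames.
frame_shift_sec = 0.01    # assumption
subsampling = 10          # assumption
longest_recording_sec = 600

min_chunk_size = int(longest_recording_sec / (frame_shift_sec * subsampling))
print(f"chunk_size should be >= {min_chunk_size}")  # 6000
```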
