split utterances with several speakers #23

dasch124 · 2024-02-01T12:44:29Z

In several texts (e.g. Urfa-107_Cotton_Business) one ELAN segment contains utterances of several speakers. It would be good to separate those:

- manually split the ELAN segments so that each one contains a single speaker
- replace the speaker initial ("A:") at the beginning of each segment with the speaker ID from the recordings list

We can then transform this into @who attributes.

If the original context should be restored, curators can afterwards add <annotationBlock> elements around the separated <u> elements after tokenization.
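The conversion of a speaker-prefixed segment into a `<u>` element with a `@who` attribute could look roughly like the sketch below. It is only an illustration under several assumptions that are not stated in this issue: that each split segment starts with the speaker ID followed by a colon, that `@who` should point to the ID with a leading `#`, and that namespaces can be ignored for the moment. The function name `segment_to_u` and the example ID are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed convention: a split segment looks like "<speaker id>: <utterance text>".
SPEAKER_PREFIX = re.compile(r"^\s*(?P<who>[^\s:]+):\s*(?P<text>.*)$", re.DOTALL)


def segment_to_u(segment_text):
    """Turn one speaker-prefixed segment into a <u who="#..."> element.

    Hypothetical helper: the real pipeline may use different element
    namespaces and a different ID scheme.
    """
    match = SPEAKER_PREFIX.match(segment_text)
    if match is None:
        # Segments without a speaker prefix need manual curation first.
        raise ValueError(f"no speaker prefix found in segment: {segment_text!r}")
    u = ET.Element("u", {"who": "#" + match.group("who")})
    u.text = match.group("text").strip()
    return u


# Example with a made-up speaker ID.
example = segment_to_u("Spk1: merhaba")
print(ET.tostring(example, encoding="unicode"))
```

Segments that still lack a speaker prefix raise an error instead of being converted silently, so they can be collected and fixed by hand.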


miriamaltawil · 2024-02-09T13:43:43Z

Dear Daniel and Veronika,
I tried to separate all the segments that contained different speakers and added the speaker IDs. I pushed the ELAN file to GitHub.
I hope it is fine now. A problem occurs when two people speak at the same time and their voices overlap. In those cases (I think it happens twice), I could not separate the segments, and for the moment I have left them together, even though this is not a good solution either. Before moving on to solve this issue, I would first like to know whether what I have done so far looks good to you.

dasch124 added the data-processing and data curation labels Feb 1, 2024

dasch124 changed the title ~~split utterances with speaker changes~~ split utterances with several speakers Feb 1, 2024

dasch124 assigned miriamaltawil and VeronikaEngler Feb 8, 2024
