split utterances with several speaker #23

Open
dasch124 opened this issue Feb 1, 2024 · 1 comment
Comments

@dasch124
Member

dasch124 commented Feb 1, 2024

In several texts (e.g. Urfa-107_Cotton_Business) one ELAN segment contains utterances of several speakers. It would be good to separate those:

  • manually split ELAN segments
  • replace the speaker initial ("A:") at the beginning of each segment with the speaker ID from the recordings list

We can then transform this into @who attributes.
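
As a rough illustration of that transformation, here is a minimal sketch. It assumes that, after the manual step, each segment's text starts with the speaker ID followed by a colon (e.g. "SPK1: ..."), and that @who should hold a "#"-prefixed pointer. The function name `segment_to_u` and the ID "SPK1" are hypothetical, and TEI namespaces are left out for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Assumed segment format: "<speaker id>: <utterance text>".
# The ID is taken to be the first token, with no whitespace or colon in it.
PREFIX = re.compile(r"^\s*(?P<who>[^\s:]+)\s*:\s*(?P<text>.*)$", re.DOTALL)

def segment_to_u(segment_text):
    """Turn one ELAN segment string into a <u> element (hypothetical helper)."""
    u = ET.Element("u")
    match = PREFIX.match(segment_text)
    if match is None:
        # No speaker prefix found: keep the text, leave @who unset for review.
        u.text = segment_text.strip()
        return u
    u.set("who", "#" + match.group("who"))
    u.text = match.group("text").strip()
    return u

if __name__ == "__main__":
    print(ET.tostring(segment_to_u("SPK1: example text"), encoding="unicode"))
```

Segments without a recognisable prefix come back without @who, so they can be found and checked by hand rather than being silently assigned to a speaker.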

If the original context needs to be restored, curators can add <annotationBlock> elements around the separated <u> elements after tokenization.
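
The regrouping could look roughly like the sketch below. It assumes the separated <u> elements are adjacent children of one parent element and ignores TEI namespaces; `wrap_in_annotation_block` is a hypothetical helper, not part of any existing tooling in this project.

```python
import xml.etree.ElementTree as ET

def wrap_in_annotation_block(parent, first, last):
    """Move the children parent[first..last] into a new <annotationBlock>."""
    block = ET.Element("annotationBlock")
    members = list(parent)[first:last + 1]
    for child in members:
        parent.remove(child)
        block.append(child)
    # Put the block where the first wrapped child used to be.
    parent.insert(first, block)
    return block

if __name__ == "__main__":
    body = ET.fromstring(
        '<body><u who="#A">x</u><u who="#B">y</u><u who="#A">z</u></body>'
    )
    wrap_in_annotation_block(body, 0, 1)
    print(ET.tostring(body, encoding="unicode"))
```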

@dasch124 dasch124 changed the title split utterances with speaker changes split utterances with several speaker Feb 1, 2024
@miriamaltawil
Collaborator

Dear Daniel and Veronika,
I tried to separate all the segments that contained different speakers and added the speaker IDs. I pushed the ELAN file to GitHub.
I hope it is fine now. A problem occurs when two people speak at the same time and their voices overlap. In those cases (I think it happens twice), I could not separate the segments, so for the moment I left them together, even though this is not a good solution. Before moving on to solving this issue, I would first like to know whether what I have done so far looks good to you.
