feat: Add audacity marker file support and creating annotation from rttm file #92

yojul · 2023-06-30T16:05:43Z

Problem

When evaluating speaker diarization pipelines, one often wants to create Annotation objects from RTTM (or other format) files, and to serialize or write annotations back to RTTM (or another format).
The Audacity marker track feature is a very convenient (and free) way to create ground-truth segmentations for speaker diarization. The format is a .txt file very similar to the already supported LAB format, but tab-separated.

Solution

Refactor methods for various file format support

Create a generic _serialize method to replace the multiple to_{format} methods.
Create a generic _write method to replace the multiple write_{format} methods.
to_{format} and write_{format} are now partial methods derived from these generic methods.

Currently supported formats are:

RTTM: annotation.to_rttm() and annotation.write_rttm(file).
Audacity: annotation.to_audacity() and annotation.write_audacity(file).
LAB: annotation.to_lab() and annotation.write_lab(file).

Therefore, to add a new format, one only needs to implement an _iter_{format} method, similar to _iter_rttm or _iter_lab.
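The refactoring described above can be illustrated with a minimal sketch. This is not the PR's actual code: MiniAnnotation is a hypothetical stand-in for Annotation that stores plain (start, end, label) tuples, and the file_format parameter name is an assumption. It only shows how functools.partialmethod can derive to_{format} and write_{format} from two generic methods.

```python
import io
from functools import partialmethod
from typing import Iterator, List, TextIO, Tuple


class MiniAnnotation:
    """Hypothetical stand-in for Annotation: a flat list of (start, end, label)."""

    def __init__(self, segments: List[Tuple[float, float, str]], uri: str = "audio"):
        self.segments = list(segments)
        self.uri = uri

    # One line generator per format; supporting a new format means adding one of these.
    def _iter_rttm(self) -> Iterator[str]:
        for start, end, label in self.segments:
            yield (
                f"SPEAKER {self.uri} 1 {start:.3f} {end - start:.3f} "
                f"<NA> <NA> {label} <NA> <NA>\n"
            )

    def _iter_lab(self) -> Iterator[str]:
        for start, end, label in self.segments:
            yield f"{start:.3f} {end:.3f} {label}\n"

    def _iter_audacity(self) -> Iterator[str]:
        # Same fields as LAB, but tab-separated.
        for start, end, label in self.segments:
            yield f"{start:.3f}\t{end:.3f}\t{label}\n"

    # Generic methods: look up the line generator by format name.
    def _serialize(self, file_format: str) -> str:
        iter_lines = getattr(self, f"_iter_{file_format}")
        return "".join(iter_lines())

    def _write(self, file: TextIO, file_format: str) -> None:
        iter_lines = getattr(self, f"_iter_{file_format}")
        for line in iter_lines():
            file.write(line)

    # Public per-format methods are partial applications of the generic ones.
    to_rttm = partialmethod(_serialize, file_format="rttm")
    to_lab = partialmethod(_serialize, file_format="lab")
    to_audacity = partialmethod(_serialize, file_format="audacity")
    write_rttm = partialmethod(_write, file_format="rttm")
    write_lab = partialmethod(_write, file_format="lab")
    write_audacity = partialmethod(_write, file_format="audacity")


annotation = MiniAnnotation([(0.0, 1.5, "A"), (1.5, 3.0, "B")])
text = annotation.to_audacity()

buffer = io.StringIO()
annotation.write_rttm(buffer)
```

With this layout, the per-format logic lives only in the _iter_{format} generators, and the two generic methods never need to change when a format is added.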

Creating annotation from audacity or rttm

Similarly to the from_df class method, I added from_audacity and from_rttm class methods to easily create annotations from those file formats.

Usage:

with open('file.rttm') as f:
    annotation = Annotation.from_rttm(f)
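For readers unfamiliar with the Audacity format, a label track is exported as plain text with one start<TAB>end<TAB>label line per region. The sketch below parses such lines into tuples. It is independent of the PR's from_audacity implementation; parse_audacity_labels is a hypothetical helper name, and the sample labels are made up for illustration.

```python
import io
from typing import Iterable, List, Tuple


def parse_audacity_labels(lines: Iterable[str]) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: parse 'start<TAB>end<TAB>label' lines into tuples."""
    segments = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on tabs only, so labels may contain spaces.
        start, end, label = line.split("\t", 2)
        segments.append((float(start), float(end), label))
    return segments


# Illustrative input, written the way Audacity exports a label track.
sample = io.StringIO(
    "0.000000\t1.500000\tspeaker_A\n"
    "1.500000\t3.250000\tspeaker_B\n"
)
segments = parse_audacity_labels(sample)
```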


hbredin · 2023-07-16T14:07:00Z

Thanks for this PR.

Note that RTTM files may contain annotations for multiple audio files (hence the second uri field) in which case I am not sure what the Annotation.from_rttm method should do:

raise an error?
return a {uri: Annotation} mapping?

One okayish solution could be to add an option as_dict: bool = False to force returning a dict (second option) and raise an error if set to False and RTTM file contains multiple audio files...
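A rough sketch of the as_dict option, under stated assumptions: load_rttm is a hypothetical free function rather than the Annotation.from_rttm class method, segments are plain (start, end, label) tuples instead of Annotation objects, and non-SPEAKER records are skipped. It relies only on the standard RTTM field order (type, file, channel, onset, duration, ..., speaker name in the eighth field).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, Union

Segment = Tuple[float, float, str]  # (start, end, speaker label)


def load_rttm(
    lines: Iterable[str], as_dict: bool = False
) -> Union[List[Segment], Dict[str, List[Segment]]]:
    """Hypothetical loader illustrating the as_dict option."""
    per_uri: Dict[str, List[Segment]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # assumption: ignore blank lines and non-SPEAKER records
        uri = fields[1]
        start, duration = float(fields[3]), float(fields[4])
        per_uri[uri].append((start, start + duration, fields[7]))

    if as_dict:
        # Second option from the comment: always return a {uri: segments} mapping.
        return dict(per_uri)
    if len(per_uri) > 1:
        raise ValueError(
            f"RTTM contains {len(per_uri)} audio files; pass as_dict=True."
        )
    # Zero or one uri: return its segments directly.
    return next(iter(per_uri.values()), [])


sample = [
    "SPEAKER file1 1 0.0 1.5 <NA> <NA> A <NA> <NA>",
    "SPEAKER file2 1 2.0 1.0 <NA> <NA> B <NA> <NA>",
]
by_uri = load_rttm(sample, as_dict=True)
```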

yojul · 2023-07-17T09:23:38Z

Thank you for your feedback.

For consistency with the other "from" methods and with the Annotation object itself, I suggest that from_rttm work as follows:

if no uri is specified, the default_uri is taken from the first line of the RTTM file. Then, if the file contains more than one uri, an exception is raised asking the caller to specify a uri.
if uri is passed as a parameter, from_rttm only reads the lines with that uri.

This ensures consistency between the Annotation uri and the RTTM uri, and from_rttm still creates a single Annotation object (like the other similar methods).

I also added a condition to only read lines starting with "SPEAKER", which correspond to speech segments.
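The behaviour proposed in this comment could look roughly like the sketch below. It is not the code from the PR: read_rttm_segments is a hypothetical function returning the selected uri and plain (start, end, label) tuples rather than an Annotation, and the exception type and message are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker label)


def read_rttm_segments(
    lines: Iterable[str], uri: Optional[str] = None
) -> Tuple[Optional[str], List[Segment]]:
    """Hypothetical reader: one uri per call, SPEAKER lines only."""
    records = []
    for line in lines:
        fields = line.split()  # tolerant of inconsistent whitespace
        if not fields or fields[0] != "SPEAKER":
            continue  # only SPEAKER lines describe speech segments
        records.append((fields[1], float(fields[3]), float(fields[4]), fields[7]))

    if uri is None:
        uris = {record[0] for record in records}
        if len(uris) > 1:
            raise ValueError(
                f"RTTM contains several uris {sorted(uris)}; please specify one."
            )
        # Default uri comes from the first SPEAKER line, if any.
        uri = records[0][0] if records else None

    segments = [
        (start, start + duration, label)
        for file_uri, start, duration, label in records
        if file_uri == uri
    ]
    return uri, segments


sample = [
    "SPEAKER file1 1 0.0 1.5 <NA> <NA> A <NA> <NA>",
    "SPEAKER file2 1 2.0 1.0 <NA> <NA> B <NA> <NA>",
]
selected_uri, selected = read_rttm_segments(sample, uri="file2")
```

Keeping a single uri per call means the result always maps onto one Annotation, at the cost of reading the file once per audio file when it contains several.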

Jules SINTES added 4 commits June 30, 2023 16:53

refactor: Rewrite to_rttm/lab and write_rttm/lab as partial methods from generic _serialize and _write methods

732329a

feat: Add method to support audacity marker file serialization

7aa2fb0

feat: Add classmethods to create annotation from audacity marker and rttm files

e6d8634

fix: Handle whitespace unconsitancy when creating annotation from rttm or audacity

1c65829

hbredin mentioned this pull request Jul 16, 2023

add from_rttm method to the Annotation class #93

Closed

fix : add exception in from_rttm to handle multiple uri in rttm file

fb603b7
