feat: Add audacity marker file support and creating annotation from rttm file #92
Problem
When evaluating speaker diarization pipelines, one may want to work with Annotation objects: creating an annotation from an RTTM file (or another format), and serializing or writing an annotation back to RTTM (or another format).
Audacity's marker track feature is a very convenient (and free) way to create ground-truth segmentation for speaker diarization. The format is a .txt file, very similar to the already implemented LAB format but tab-separated.
Solution
Refactor methods for various file format support
- `_serialize` method to replace multiple `to_{format}` methods.
- `_write` method to replace multiple `write_{format}` methods.
- `to_<format>` and `write_<format>` are now partial methods derived from these generic methods.
Currently supported formats are:
- `annotation.to_rttm()` and `annotation.write_rttm(file)`.
- `annotation.to_audacity()` and `annotation.write_audacity(file)`.
- `annotation.to_lab()` and `annotation.write_lab(file)`.
Therefore, to add a new format, one only needs to implement an `_iter_{format}` method, similarly to `_iter_rttm` or `_iter_lab`.
Creating annotation from audacity or rttm
Similarly to the `from_df` class method, I created `from_audacity` and `from_rttm` class methods to easily create annotations from those file formats.
Usage:
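As a rough, self-contained illustration of the pattern described in this PR (not the PR's actual code), the sketch below uses a hypothetical `MiniAnnotation` class that stores plain `(start, end, label)` tuples. The class name, the `fmt` parameter, the `uri` default and the three-decimal formatting are all assumptions made for the example. Only the method names `_serialize`, `_write`, `_iter_{format}`, `to_*`, `write_*`, `from_audacity` and `from_rttm` come from the description above.

```python
import io
from functools import partialmethod


class MiniAnnotation:
    """Hypothetical toy stand-in: a list of (start, end, label) segments."""

    def __init__(self, uri="audio"):
        self.uri = uri
        self.segments = []  # (start, end, label) tuples, times in seconds

    def add(self, start, end, label):
        self.segments.append((float(start), float(end), str(label)))
        return self

    # One line generator per format; a new format only needs one of these.
    def _iter_rttm(self):
        for start, end, label in self.segments:
            yield (f"SPEAKER {self.uri} 1 {start:.3f} {end - start:.3f} "
                   f"<NA> <NA> {label} <NA> <NA>\n")

    def _iter_audacity(self):
        # Audacity label tracks: start, end and label separated by tabs.
        for start, end, label in self.segments:
            yield f"{start:.3f}\t{end:.3f}\t{label}\n"

    # Generic helpers that look up the generator from the format name.
    def _serialize(self, fmt):
        return "".join(getattr(self, f"_iter_{fmt}")())

    def _write(self, file, fmt):
        for line in getattr(self, f"_iter_{fmt}")():
            file.write(line)

    # Format-specific methods are just partial applications of the helpers.
    to_rttm = partialmethod(_serialize, fmt="rttm")
    write_rttm = partialmethod(_write, fmt="rttm")
    to_audacity = partialmethod(_serialize, fmt="audacity")
    write_audacity = partialmethod(_write, fmt="audacity")

    @classmethod
    def from_audacity(cls, file, uri="audio"):
        annotation = cls(uri=uri)
        for line in file:
            if line.strip():
                start, end, label = line.rstrip("\n").split("\t")
                annotation.add(start, end, label)
        return annotation

    @classmethod
    def from_rttm(cls, file):
        annotation = cls()
        for line in file:
            fields = line.split()
            if fields:
                # RTTM stores start and duration, not start and end.
                start, duration = float(fields[3]), float(fields[4])
                annotation.uri = fields[1]
                annotation.add(start, start + duration, fields[7])
        return annotation


reference = MiniAnnotation(uri="meeting").add(0.0, 1.5, "alice").add(1.5, 4.0, "bob")

# Round trip through the Audacity label format using an in-memory file.
buffer = io.StringIO()
reference.write_audacity(buffer)
buffer.seek(0)
restored = MiniAnnotation.from_audacity(buffer, uri="meeting")
```

In this sketch, supporting another format would mean adding one `_iter_<format>` generator and two `partialmethod` lines, which is the property the refactoring aims for.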