.ctm in data simulator annotator compliant with RT-09 specification #8004
NeMo contributors, please do not merge this PR until I make sure all the CTM in NeMo is following the official CTM format.
It seems that this is also not compliant (it uses the speaker ID instead of the channel):
This one, by contrast, is roughly compliant (but some fields are missing):
@erastorgueva-nv Elena, I have found a line that renders CTM and I replaced it with the
@stevehuang52 We are also making slight changes to the data simulator. Please review and approve.
    if type(beg_time) != float:
        beg_time = round(float(beg_time), output_precision)
    if type(duration) != float:
        duration = round(float(duration), output_precision)
Currently, beg_time and duration do not get rounded if they are floats already. Please remove the if-statements, I don't think they are necessary.
Updated to always round the number. Also checking whether beg_time is either float or string containing floating point number.
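The behavior described in this reply could be sketched roughly as follows. This is only an illustration, not the code that was merged: `to_rounded_float` is a hypothetical helper name, and the default precision of 2 is taken from the commit note about tests using a decimal rounding of 2.

```python
def to_rounded_float(value, output_precision=2):
    """Hypothetical helper: always round a time value to a fixed precision.

    Accepts a float (or int) or a string containing a floating point
    number, and rounds in every case, including when the input is
    already a float.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"Expected a number or numeric string, got {type(value).__name__}")
    try:
        number = float(value)
    except ValueError as err:
        raise ValueError(f"Cannot interpret {value!r} as a floating point number") from err
    return round(number, output_precision)


# Both inputs end up rounded, whatever their original type.
beg_time = to_rounded_float(1.23456)        # float input is rounded too
duration = to_rounded_float("2.71828", 3)   # numeric string is converted first
```

Dropping the type check around the rounding is what addresses the review comment: a value that arrives as a float no longer bypasses `round`.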
Approving this, since this PR went through several rounds of review and feedback.
jenkins
@tango4j / reviewers, merge when ready. Also, a reminder: NeMo devs need to explicitly write "jenkins" in order to execute the CI.
Oh, when did the protocol change?
jenkins
Approving changes
…VIDIA#8004)
* .ctm fix for data simulation
  Signed-off-by: popcornell <cornellsamuele@gmail.com>
* .ctm fix, channel should be 1 not 0
  Signed-off-by: popcornell <cornellsamuele@gmail.com>
* .ctm fix, only two na, type and confidence
  Signed-off-by: popcornell <cornellsamuele@gmail.com>
* Revised all the parts in NeMo touching CTM files
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Updated tutorial, nemo-docs and tests for CTM formats
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fixed the docstrings in create_alignment_manifest.py
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Some missing refactored variables for type_of_token
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Another un-fixed part in data_simulation_utils.py
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Reflected comments from PR
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Reflected another precision related comments from PR
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Updated tests to use decimal rounding of 2
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Changed beg_time to start_time and fixed unit tests
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fixed typos and errors in manifest_utils.py
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Resolved another merge conflict
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Fixed the test errors
  Signed-off-by: Taejin Park <tango4j@gmail.com>
* Fixed the missed commented lines
  Signed-off-by: Taejin Park <tango4j@gmail.com>
---------
Signed-off-by: popcornell <cornellsamuele@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Redid this PR from #7999
An attempt to fix #7445 so that the .ctm files produced by the data simulator are compliant with the RT-09 specification (see https://web.archive.org/web/20170119114252/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf).
For fields whose value is unknown I have put a placeholder (the commit notes mention "na" for the type and confidence fields).
This also makes it easy to use the generated sessions with https://github.com/lhotse-speech/lhotse
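To make the target format concrete, here is a minimal sketch of writing one CTM line. It assumes the eight space-separated fields of the RT-09 CTM layout (source, channel, begin time, duration, token, confidence, type, speaker). The function name `format_ctm_line`, its parameter names, and the `"NA"` placeholder spelling are illustrative assumptions, not the API that NeMo ships.

```python
def format_ctm_line(
    source,
    channel,
    start_time,
    duration,
    token,
    speaker,
    conf=None,
    type_of_token=None,
    na_token="NA",          # assumed placeholder spelling for unknown fields
    output_precision=2,
):
    """Hypothetical sketch: build one CTM line with eight fields.

    Assumed field order:
    SOURCE CHANNEL BEG-TIME DURATION TOKEN CONF TYPE SPEAKER
    """
    # Times may arrive as floats or numeric strings; always round them.
    start = round(float(start_time), output_precision)
    dur = round(float(duration), output_precision)
    fields = [
        str(source),
        str(channel),  # a channel number, not a speaker ID
        f"{start:.{output_precision}f}",
        f"{dur:.{output_precision}f}",
        str(token),
        na_token if conf is None else str(conf),
        na_token if type_of_token is None else str(type_of_token),
        str(speaker),
    ]
    return " ".join(fields) + "\n"


# Example: confidence and type are unknown, so they get the placeholder.
line = format_ctm_line("session0", 1, "3.14159", 0.5, "hello", speaker="spk1")
```

Here only the confidence and type fields fall back to the placeholder, which mirrors the commit note about "only two na", and the channel is given as 1, following the note that the channel should be 1 rather than 0.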