
cleaners always return [everyvoice.utils.lower, everyvoice.utils.collapse_whitespace, everyvoice.utils.nfc_normalize] regardless of user selection #321

Closed
MENGZHEGENG opened this issue Mar 8, 2024 · 3 comments
Labels
bug Something isn't working

@MENGZHEGENG
Collaborator

Regardless of the user's selection in the wizard,
the cleaners entry in everyvoice-shared-text.yaml is always set to
[everyvoice.utils.lower, everyvoice.utils.collapse_whitespace, everyvoice.utils.nfc_normalize]
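
For illustration, here is a minimal sketch of the expected behaviour. It assumes a wizard that maps menu options to dotted cleaner paths. The option labels, AVAILABLE_CLEANERS and cleaners_from_selection are hypothetical and not part of EveryVoice. The bug behaves as if the selection were ignored and the full list were always written.

```python
# Hypothetical mapping from wizard menu labels to the dotted cleaner paths
# written to everyvoice-shared-text.yaml (labels are illustrative only).
AVAILABLE_CLEANERS = {
    "lowercase": "everyvoice.utils.lower",
    "collapse whitespace": "everyvoice.utils.collapse_whitespace",
    "NFC normalization": "everyvoice.utils.nfc_normalize",
}


def cleaners_from_selection(selected: list[str]) -> list[str]:
    """Return cleaner paths for exactly the options the user selected.

    The reported bug behaves as if `selected` were ignored and every
    entry of AVAILABLE_CLEANERS were always returned.
    """
    unknown = [name for name in selected if name not in AVAILABLE_CLEANERS]
    if unknown:
        raise ValueError(f"Unknown cleaner option(s): {unknown}")
    # Keep only the selected options, in the order they were given.
    return [AVAILABLE_CLEANERS[name] for name in selected]


print(cleaners_from_selection(["collapse whitespace", "NFC normalization"]))
# ['everyvoice.utils.collapse_whitespace', 'everyvoice.utils.nfc_normalize']
```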

@MENGZHEGENG
Collaborator Author

This is a particular problem for SENĆOŦEN, since F is part of its IPA representation: automatically applying everyvoice.utils.lower turns F into f and corrupts that representation.
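
The following standalone sketch shows why lowercasing is destructive here. The three cleaners are simplified stand-ins written for this example (assumptions, not EveryVoice's implementations), and the input string is hypothetical. Once F has been lowercased, nothing downstream can distinguish it from f.

```python
import re
import unicodedata


# Simplified stand-ins for the three cleaners named in this issue
# (assumptions for illustration, not EveryVoice's actual code).
def lower(text: str) -> str:
    return text.lower()


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text)


def nfc_normalize(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def apply_cleaners(text: str, cleaners) -> str:
    # Cleaners run in list order, each one receiving the previous output.
    for cleaner in cleaners:
        text = cleaner(text)
    return text


sample = "aF  Fa"  # hypothetical input in which "F" is a distinct symbol
print(apply_cleaners(sample, [lower, collapse_whitespace, nfc_normalize]))  # af fa
print(apply_cleaners(sample, [collapse_whitespace, nfc_normalize]))  # aF Fa
```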

@MENGZHEGENG added the bug (Something isn't working) label Mar 15, 2024
@MENGZHEGENG added this to the beta milestone Mar 15, 2024
@roedoejet
Member

@marctessier recently found that this error caused training to fail entirely (@marctessier could you paste the error that you get here?). It would be good if we could catch these types of errors and return a helpful error message.
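
One way to catch this kind of problem with a helpful message is to check the cleaned text against the configured symbols before training. This rests on an assumption that the thread does not confirm: that the failure comes from cleaned text containing characters outside the symbol set. check_symbols and its arguments are hypothetical names, not an EveryVoice API.

```python
def check_symbols(text: str, symbols: set[str], cleaners: list[str]) -> None:
    """Raise a descriptive error if cleaned text has undeclared characters.

    Hypothetical helper: `symbols` is the set of allowed characters and
    `cleaners` is the list of dotted cleaner paths that produced `text`.
    """
    unknown = sorted({ch for ch in text if ch not in symbols})
    if not unknown:
        return
    hint = ""
    # Point at the likely culprit when a lowercasing cleaner is configured.
    if any(path.endswith(".lower") for path in cleaners):
        hint = (
            " A lowercasing cleaner is enabled; if your symbol set is"
            " case-sensitive, remove it from 'cleaners'."
        )
    raise ValueError(
        f"Characters {unknown} appear in the cleaned text but are not in"
        f" the symbol set.{hint}"
    )


symbols = {"a", "F", " "}
check_symbols("aF Fa", symbols, [])  # passes silently
```

Failing early with the offending characters and a hint is intended to make the cause obvious, instead of surfacing later as a crash during training.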

@marctessier
Collaborator

marctessier commented Apr 19, 2024

This causes training on STR to fail with the attached message when running stock EveryVoice (EV). A core dump file is also created.

But, as mentioned, if we remove everyvoice.utils.lower from the list of cleaners before running preprocessing, training works :-)

For example, in everyvoice-shared-text.yaml:
cleaners: [everyvoice.utils.collapse_whitespace, everyvoice.utils.nfc_normalize]

STR_STOCK.e2205168.txt
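
The cleaners are listed as dotted import paths, so editing the list presumably changes which functions run during preprocessing. The sketch below shows how such a list could be resolved and applied in order with importlib. resolve_cleaner and run_cleaners are hypothetical, and this is not EveryVoice's actual loader. Standard-library functions stand in for the everyvoice.utils cleaners so the example runs on its own.

```python
import importlib
from typing import Callable


def resolve_cleaner(path: str) -> Callable[[str], str]:
    """Turn a dotted path such as "package.module.func" into the callable."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"Cleaner {path!r} is not a dotted import path")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def run_cleaners(text: str, paths: list[str]) -> str:
    # Apply each configured cleaner in list order; dropping a path from
    # the list means that function simply never runs.
    for path in paths:
        text = resolve_cleaner(path)(text)
    return text


# Standard-library stand-ins, since EveryVoice may not be installed.
print(run_cleaners("a  <b>", ["string.capwords", "html.escape"]))  # A &lt;b&gt;
```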
