wip: advanced preprocessing [english] #3927
base: main
@longcw One thing: I'm not sure whether we need to enable some of these transforms by default.
```python
# Preserve years
if 1900 <= num <= 2099:
    return str(num_str)
```
Maybe we need to split it: 2020 -> "20 20".
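A minimal sketch of the split-year reading suggested here, assuming it only applies to the 1900..2099 range that the diff preserves. `TEENS`, `TENS`, `two_digits_to_words`, and `year_to_words` are hypothetical names, not taken from the PR, and reading 2000..2009 as "two thousand five" (rather than "twenty oh five") is a design choice the comment does not specify.

```python
# Hypothetical helpers; only ONES mirrors a name from the diff.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits_to_words(n: int) -> str:
    """Spell out 1 <= n <= 99."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (f" {ONES[ones]}" if ones else "")


def year_to_words(year: int) -> str:
    """Read a year in 1900..2099 as two pairs: 2020 -> "twenty twenty"."""
    if not 1900 <= year <= 2099:
        raise ValueError(f"year out of range: {year}")
    high, low = divmod(year, 100)
    if year == 2000:
        return "two thousand"
    if 2001 <= year <= 2009:
        # Design choice: "two thousand five" rather than "twenty oh five".
        return f"two thousand {ONES[low]}"
    if low == 0:
        return f"{two_digits_to_words(high)} hundred"  # 1900 -> "nineteen hundred"
    if low < 10:
        return f"{two_digits_to_words(high)} oh {ONES[low]}"  # 1905 -> "nineteen oh five"
    return f"{two_digits_to_words(high)} {two_digits_to_words(low)}"
```

With this, 1999 becomes "nineteen ninety nine" and 2010 becomes "twenty ten", which is how most speakers read those years aloud.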
```python
Format dollar amounts for TTS:
- $5 -> "five dollars"
- $12.50 -> "twelve dollars and fifty cents"
- $0.023 -> "zero point zero two three dollars" (speaks out each decimal digit)
```
Thinking: maybe we should go deeper and convert it to "zero point zero twenty three dollars"?
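One way to get the reading proposed here: speak the fraction's leading zeros one by one, then read the remaining digits as a number when there are only one or two of them, falling back to digit-by-digit for longer tails. This is a sketch under those assumptions, not the PR's implementation. It only handles amounts with three or more decimal places, so `$5` and `$12.50` keep the dollars-and-cents wording from the docstring. All helper names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical word tables; only ONES mirrors a name from the diff.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out 0 <= n <= 999."""
    if n == 0:
        return "zero"
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f" {ONES[ones]}" if ones else ""))
    elif rest >= 10:
        parts.append(TEENS[rest - 10])
    elif rest:
        parts.append(ONES[rest])
    return " ".join(parts)


def fraction_to_words(digits: str) -> str:
    """Read a fraction such as "023" as "zero twenty three"."""
    significant = digits.lstrip("0")
    words = ["zero"] * (len(digits) - len(significant))  # each leading zero
    if len(significant) <= 2:
        if significant:
            words.append(int_to_words(int(significant)))  # "23" -> "twenty three"
    else:
        words.extend(int_to_words(int(d)) for d in significant)  # digit by digit
    return " ".join(words)


def sub_cent_dollars_to_words(amount: str) -> Optional[str]:
    """Speak amounts with 3+ decimal places, e.g. "$0.023"; else return None."""
    match = re.fullmatch(r"\$(\d{1,3})\.(\d{3,})", amount)
    if match is None:
        return None
    whole, frac = match.groups()
    return f"{int_to_words(int(whole))} point {fraction_to_words(frac)} dollars"
```

With this rule `$0.015` becomes "zero point zero fifteen dollars", while `$1.125` stays digit-by-digit as "one point one two five dollars".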
```python
async def format_phone_numbers(text: AsyncIterable[str]) -> AsyncIterable[str]:
    """
    Format phone numbers for TTS:
    - 555-123-4567 -> "5 5 5 1 2 3 4 5 6 7"
```
This only handles US phone numbers. Should we also cover numbers with a +1 prefix?
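A synchronous sketch of a pattern that also accepts a `+1` (or bare `1`) country code and a parenthesized area code, keeping the spaced-digit output from the docstring. It works on a complete string rather than the `AsyncIterable[str]` stream that `format_phone_numbers` consumes, so a streaming version would still need to buffer enough text to see a whole number. `US_PHONE_RE`, `format_phone_numbers_sync`, and speaking the prefix as "plus 1" are assumptions, not part of the PR.

```python
import re

# Hypothetical pattern: optional country code, then a 3-3-4 digit number whose
# groups may be separated by spaces, dots or dashes.
US_PHONE_RE = re.compile(
    r"(?<![\d+])"            # don't start inside a longer number or after "+"
    r"(?:(\+?1)[\s.-]?)?"    # optional "+1" / "1" country code, captured
    r"\(?(\d{3})\)?[\s.-]?"  # area code, optionally in parentheses
    r"(\d{3})[\s.-]?"        # exchange
    r"(\d{4})(?!\d)"         # line number, not followed by more digits
)


def _speak_phone(match: re.Match) -> str:
    country, area, exchange, line = match.groups()
    spoken = " ".join(area + exchange + line)  # "5 5 5 1 2 3 4 5 6 7"
    if country == "+1":
        return f"plus 1 {spoken}"
    if country == "1":
        return f"1 {spoken}"
    return spoken


def format_phone_numbers_sync(text: str) -> str:
    """Replace every matched US number in a complete string."""
    return US_PHONE_RE.sub(_speak_phone, text)
```

The lookbehind and lookahead keep the pattern from matching a 10-digit slice of a longer digit run such as an order number.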
```python
from typing import Literal, Optional, Union


# Number to word mappings for TTS preprocessing
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
```
I think we should move these to a separate file. The markdown and emoji filters could also move to their own files, since we are going to have more transforms.
Additional TTS preprocessing for: