Provide a WordSeparator which uses rust_icu #334

Open
mgeisler opened this issue May 2, 2021 · 5 comments

Comments

@mgeisler
Owner

mgeisler commented May 2, 2021

With #332 merged, it is now possible to customize the way text is split into words via the WordSeparator trait. From the discussion in #220, I understand that the rust_icu crate is the gold standard for breaking text into words. We should add a WordSeparator implementation which uses that library.
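As a rough illustration of the shape such an implementation could take, here is a minimal, self-contained sketch. The `WordSeparator` trait below is a local stand-in written for this example, not the actual textwrap definition, and `BreakOffsets` and `breaks_after_hyphen_or_space` are hypothetical names. The assumption is that an ICU break iterator can be adapted to report break opportunities as byte offsets; no rust_icu calls are shown here.

```rust
/// Local stand-in for the trait discussed above (an assumption for this
/// sketch, not textwrap's real signature).
pub trait WordSeparator {
    /// Split `line` into words; each word keeps its trailing whitespace.
    fn find_words<'a>(&self, line: &'a str) -> Vec<&'a str>;
}

/// Baseline separator: a new word starts after each run of ASCII spaces.
pub struct AsciiSpace;

impl WordSeparator for AsciiSpace {
    fn find_words<'a>(&self, line: &'a str) -> Vec<&'a str> {
        let mut words = Vec::new();
        let mut start = 0;
        let mut in_space = false;
        for (idx, ch) in line.char_indices() {
            // A non-space character following spaces begins the next word.
            if in_space && ch != ' ' {
                words.push(&line[start..idx]);
                start = idx;
            }
            in_space = ch == ' ';
        }
        if start < line.len() {
            words.push(&line[start..]);
        }
        words
    }
}

/// Hypothetical adapter: wraps any function that reports break
/// opportunities as byte offsets into the line. An ICU-backed break
/// iterator could be plugged in behind such a function (assumption).
pub struct BreakOffsets<F>(pub F);

impl<F> WordSeparator for BreakOffsets<F>
where
    F: Fn(&str) -> Vec<usize>,
{
    fn find_words<'a>(&self, line: &'a str) -> Vec<&'a str> {
        let mut words = Vec::new();
        let mut start = 0;
        for end in (self.0)(line) {
            // Ignore offsets that are out of order, out of range, or that
            // do not fall on a UTF-8 character boundary.
            if end > start && end <= line.len() && line.is_char_boundary(end) {
                words.push(&line[start..end]);
                start = end;
            }
        }
        if start < line.len() {
            words.push(&line[start..]);
        }
        words
    }
}

/// Toy break finder standing in for a real ICU line breaker: allows a
/// break after every hyphen and every space.
pub fn breaks_after_hyphen_or_space(line: &str) -> Vec<usize> {
    line.char_indices()
        .filter(|&(_, ch)| ch == '-' || ch == ' ')
        .map(|(idx, ch)| idx + ch.len_utf8())
        .collect()
}
```

With this sketch, `BreakOffsets(breaks_after_hyphen_or_space).find_words("anti-virus software")` yields `"anti-"`, `"virus "` and `"software"`. Keeping the break finder behind a plain function means the ICU dependency could stay optional.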

@mgeisler
Owner Author

mgeisler commented May 8, 2021

Hi @tavianator and @sirwindfield, you were both active in #220, where we talked about rust_icu. Would either of you be interested in adding support for it?

@mainrs

mainrs commented May 8, 2021

Sadly, I don't have time to contribute to open source right now because of university :( I'd be happy to review a PR, though. That shouldn't take nearly as much time as implementing it :)

@tavianator

I can give it a shot soon

@mgeisler
Owner Author

mgeisler commented May 9, 2021

@sirwindfield I understand completely! I only asked in case you and @tavianator hadn't seen the issue yet :-)

@tavianator, that would be awesome, thanks!

There's no stress — I think I'll make a new release in the next 1-2 weeks, but we can make another any time after that with rust_icu support (the releases are almost completely automated, so it's easy to make new ones).

My goal for the next release is to make the wrap algorithm pluggable via a trait (#325). I looked at it a little today and I think I'll be able to do that this week.

@mgeisler
Owner Author

After #438, this would either be another variant in the WordSeparator enum, or perhaps it could use the new WordSeparator::Custom variant directly.
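To make the second option concrete, here is a minimal sketch of how a function could be plugged in through a `Custom` variant. The enum below is a simplified local stand-in written for this example, not the real textwrap type: the variant payload, the `Space` variant, and the `split_after_hyphen` helper are all assumptions. An ICU-backed splitter would take the place of the toy helper.

```rust
/// Simplified stand-in for the enum described above (an assumption for
/// this sketch, not textwrap's actual definition).
#[derive(Clone, Copy)]
pub enum WordSeparator {
    /// Built-in variant: split after every single space.
    Space,
    /// User-supplied function that splits a line into words.
    Custom(fn(&str) -> Vec<&str>),
}

impl WordSeparator {
    /// Dispatch to the built-in logic or to the plugged-in function.
    pub fn find_words<'a>(&self, line: &'a str) -> Vec<&'a str> {
        match self {
            WordSeparator::Space => line.split_inclusive(' ').collect(),
            WordSeparator::Custom(func) => func(line),
        }
    }
}

/// Toy custom splitter standing in for an ICU-backed one: allows a
/// break after each hyphen only.
pub fn split_after_hyphen(line: &str) -> Vec<&str> {
    line.split_inclusive('-').collect()
}
```

Because the payload is a plain function pointer, `WordSeparator::Custom(split_after_hyphen)` needs no new enum variant and no extra dependency in the library itself. The trade-off is that the function cannot capture state, so any ICU configuration would have to be set up inside the function.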

3 participants