[R] Bindings for stringr::str_to_sentence #29256

Closed
Tracked by #30393
asfimport opened this issue Aug 12, 2021 · 0 comments

asfimport commented Aug 12, 2021

There is more to this issue than meets the eye. stringr::str_to_sentence() does two things:

  • capitalise the first word

  • if a single string contains multiple sentences, attempt to find the sentence breaks and capitalise the first word of each sentence.

    The stringr implementation wraps stringi::stri_trans_totitle(), which in turn uses ICU’s BreakIterator to locate text boundaries. As a consequence, stringr::str_to_sentence() does not identify a full stop / period (".") as a sentence end and does not capitalise the word following it. Thus, there is a discrepancy between the behaviour of the utf8_capitalize kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of stringr::str_to_sentence().

    For more extensive discussions around the stringi / stringr implementation see stringr issues 202 and 231.

    Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
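To illustrate why whole-string capitalisation and per-sentence capitalisation diverge, here is a minimal Python sketch. It is not the Arrow kernel, stringr, or ICU: both function names are hypothetical, the sentence rule (a ".", "!" or "?" followed by whitespace) is a deliberately naive assumption, and lower-casing the rest of the string is also an assumption made for the illustration.

```python
import re


def capitalize_whole_string(text: str) -> str:
    """Upper-case the first character only; no attempt to find sentences.

    Hypothetical stand-in for the whole-string behaviour described above.
    Lower-casing the remaining characters is an assumption.
    """
    return text[:1].upper() + text[1:].lower()


def capitalize_each_sentence(text: str) -> str:
    """Upper-case the first letter after each naive sentence break.

    Assumed rule: a sentence starts at the beginning of the string or
    after '.', '!' or '?' followed by whitespace. ICU's BreakIterator
    uses far more elaborate rules than this.
    """
    lowered = text.lower()
    return re.sub(
        r"(^|[.!?]\s+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


sample = "hello world. how are you?"
whole = capitalize_whole_string(sample)
per_sentence = capitalize_each_sentence(sample)
print(whole)
print(per_sentence)
```

The two results differ only in the word after the full stop, which is exactly where the choice of sentence-boundary rule matters.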

Reporter: Nicola Crane / @thisisnic
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-13615. Please see the migration documentation for further details.
