There is more to this issue than meets the eye. stringr::str_to_sentence() does two things:
capitalises the first word of the string
if multiple sentences are provided as a single string, attempts to find sentence breaks and capitalises the first word of each sentence.
The stringr implementation wraps stringi::stri_trans_totitle(), which in turn uses ICU’s BreakIterator to locate text boundaries. As a consequence, stringr::str_to_sentence() is not able to identify a full stop / period (".") as a sentence end and does not capitalise words following it. Thus, there is a discrepancy between the behaviour of the utf8_capitalize kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of stringr::str_to_sentence().
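To make the two behaviours concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not the Arrow, stringi or ICU implementation: the function names are hypothetical, lower-casing the rest of the string is an assumption about what "capitalise" means here, and the regex-based sentence splitter is a naive stand-in for a real boundary detector such as ICU's BreakIterator.

```python
import re


def capitalize_first_only(text: str) -> str:
    """Upper-case the first character and lower-case the rest.

    No sentence detection is attempted, which is the behaviour the
    issue attributes to the utf8_capitalize kernel (lower-casing the
    remainder is an assumption made for this sketch).
    """
    return text[:1].upper() + text[1:].lower()


def capitalize_each_sentence(text: str) -> str:
    """Naive sentence-aware variant (hypothetical helper).

    Treats '.', '!' or '?' followed by whitespace as a sentence break.
    Runs of whitespace between sentences collapse to a single space.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(capitalize_first_only(s) for s in sentences)


sample = "first sentence. second sentence"
print(capitalize_first_only(sample))     # First sentence. second sentence
print(capitalize_each_sentence(sample))  # First sentence. Second sentence
```

The two functions agree on single-sentence input and diverge only when a string contains more than one sentence, which is the case the discussion above is concerned with.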
For more extensive discussions around the stringi / stringr implementation see stringr issues 202 and 231.
Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
Reporter: Nicola Crane / @thisisnic
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-13615. Please see the migration documentation for further details.