
[C++] String title case kernel #28457

Closed
asfimport opened this issue May 10, 2021 · 4 comments


@asfimport
Collaborator

asfimport commented May 10, 2021

Capitalizes the first character of each word in the string, like SQL initcap or Python str.title().
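The intended per-element behavior can be sketched in plain Python. This is a minimal sketch, assuming the rule Python documents for str.title() (a cased character is titlecased when the preceding character is uncased, and lowercased otherwise) and modeling a string array as a Python list in which None stands in for a null slot. The names title_case and title_kernel are hypothetical and are not the Arrow C++ API.

```python
from typing import List, Optional


def _is_cased(ch: str) -> bool:
    # Approximates Unicode's "cased" property (upper-, lower- or titlecase letter).
    return ch.isupper() or ch.islower() or ch.istitle()


def title_case(s: str) -> str:
    """Title-case one string with the str.title() rule: titlecase a character
    when the previous character is uncased (or it is the first), else lowercase it."""
    out = []
    prev_cased = False
    for ch in s:
        # Digits and punctuation are unchanged by title()/lower() but reset the word start.
        out.append(ch.lower() if prev_cased else ch.title())
        prev_cased = _is_cased(ch)
    return "".join(out)


def title_kernel(values: List[Optional[str]]) -> List[Optional[str]]:
    """Apply title_case element-wise; None (a hypothetical null slot) passes through."""
    return [None if v is None else title_case(v) for v in values]


print(title_kernel(["hello world", None, "they're bill's", "1Foo1"]))
# ['Hello World', None, "They'Re Bill'S", '1Foo1']
```

Because any uncased character (apostrophe, digit, hyphen) starts a new word under this rule, "they're" becomes "They'Re", which is the same result str.title() gives.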

Reporter: Ian Cook / @ianmcook
Assignee: Eduardo Ponce / @edponce
Watchers: Rok Mihevc / @rok

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12714. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
In Python, in addition to title() there is also capitalize(), which only capitalizes the first character of the string (and not every word in the string).
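For comparison, the two built-in str methods behave as follows on the same input. Note that both also lowercase the characters they do not capitalize.

```python
s = "hello wORLD from arrow"

# title(): every word starts with an uppercase letter, the rest is lowercased.
print(s.title())       # Hello World From Arrow

# capitalize(): only the first character of the whole string is uppercased.
print(s.capitalize())  # Hello world from arrow
```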

@asfimport
Collaborator Author

Ian Cook / @ianmcook:
Thanks Joris; I created ARROW-12714 for capitalize()

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request #10869

@asfimport
Collaborator Author

Eduardo Ponce / @edponce:
Adding the following notes for reference purposes.

Converting words to title case (capitalization) is not trivial because of the complexities of natural language: acronyms, words that are always capitalized, special cases, non-alphabetic symbols, etc. There is also no standard set of rules, so libraries behave differently for certain inputs. In this PR, we chose to match the rules of Python's title() for their simplicity. Note, however, that this behavior differs from that of R's stringr library (str_to_title) when a word begins with a digit. This was detected in ARROW-13853.

```
# R stringr
> str_to_title("1Foo1")  # "1foo1"
# Python
>>> "1Foo1".title()  # "1Foo1"
```
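To make the divergence concrete, the sketch below contrasts Python's rule with a hypothetical alternative in which a word is any maximal run of letters and digits, so a leading digit occupies the word's first position and the following letter is lowercased. That alternative matches how PostgreSQL documents initcap, and it reproduces the stringr output above for this input, but it is an assumption for illustration: stringr itself delegates to the ICU-based stringi package, and the helper title_alnum_words is hypothetical.

```python
import re

# One "word" under the alternative rule: letters and digits, excluding underscore.
_ALNUM_RUN = re.compile(r"[^\W_]+")


def _cap_word(m: "re.Match[str]") -> str:
    word = m.group(0)
    # Uppercase the first character (a no-op for digits) and lowercase the rest.
    return word[0].upper() + word[1:].lower()


def title_alnum_words(s: str) -> str:
    """Hypothetical alphanumeric-run title casing (initcap-style)."""
    return _ALNUM_RUN.sub(_cap_word, s)


for text in ["1Foo1", "2nd place", "hello world"]:
    # Python's str.title() result first, then the alternative rule.
    print(repr(text.title()), repr(title_alnum_words(text)))
# '1Foo1' '1foo1'
# '2Nd Place' '2nd Place'
# 'Hello World' 'Hello World'
```

The two rules agree on purely alphabetic words and only split when digits are adjacent to letters, which is exactly the case reported in ARROW-13853.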
