add unicode case folding for char/str #9084
For reference: "Case mapping or case conversion is a process whereby strings are converted to a particular form—uppercase, lowercase, or titlecase—possibly for display to the user. Case folding is primarily used for caseless comparison of text [...] As a result, case-folded text should be used solely for internal processing and generally should not be stored or displayed to the end user." |
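To make the distinction in the quote concrete, here is a minimal sketch in current Rust (not the pre-1.0 dialect used elsewhere in this thread). The helper name `eq_by_lowercase` is made up for illustration. It compares strings through case mapping, which is what std offers today, and it is not a substitute for Unicode case folding.

```rust
/// Naive caseless comparison: lowercase both sides and compare.
/// This is case *mapping*, not Unicode case folding.
/// `eq_by_lowercase` is a hypothetical helper, not a std API.
fn eq_by_lowercase(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

With this helper, `eq_by_lowercase("foobär", "FOOBÄR")` holds, but `eq_by_lowercase("Straße", "STRASSE")` does not, because lowercasing keeps `ß` while full case folding would map it to `ss`.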
Char upper/lower is done in #12561. For str, what would be more appropriate: an iterator, or a method that returns the converted string?
struct LowerChars<'a> {
chars: std::str::Chars<'a>
}
fn lower<'a>(s: &'a str) -> LowerChars<'a> {
LowerChars { chars: s.chars() }
}
impl<'a> Iterator<char> for LowerChars<'a> {
fn next(&mut self) -> Option<char> {
self.chars.next().map(|c| c.to_lowercase())
}
}
#[test]
fn test_to_uppercase(){
let sl = "foobär";
let su = "FOOBÄR";
let mut z = lower(sl).zip(lower(su));
assert!(z.all(|(x, y)| x == y ));
let greek = "στιγμας".chars().map(|c| c.to_uppercase()).collect::<~str>();
// fail!(greek);
assert_eq!(greek.as_slice(), "ΣΤΙΓΜΑΣ");
} |
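For readers on current Rust, the iterator idea above might look roughly like the sketch below. It assumes today's `char::to_lowercase`, which returns an iterator (`std::char::ToLowercase`) rather than a single `char`, so the adapter has to buffer the pending expansion. The names `LowerChars` and `lower` are carried over from the comment above; the `pending` field is an addition for illustration.

```rust
use std::char::ToLowercase;
use std::str::Chars;

/// Iterator over the lowercased chars of a string slice.
struct LowerChars<'a> {
    chars: Chars<'a>,
    /// Expansion of the char currently being lowercased (may hold several chars).
    pending: Option<ToLowercase>,
}

fn lower(s: &str) -> LowerChars<'_> {
    LowerChars { chars: s.chars(), pending: None }
}

impl<'a> Iterator for LowerChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            // Drain the current expansion first.
            if let Some(p) = self.pending.as_mut() {
                if let Some(c) = p.next() {
                    return Some(c);
                }
            }
            // Then move on to the next source char, or stop when exhausted.
            let c = self.chars.next()?;
            self.pending = Some(c.to_lowercase());
        }
    }
}
```

The same effect is available without a named type via `s.chars().flat_map(char::to_lowercase)`; the struct is only useful if the iterator type has to appear in a signature.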
@pzol: The normalization functions already return Iterators, so I'd choose them for case folding too. |
@Kimundi: so I'd suggest
pub trait StrSlice<'a> {
...
fn lower_chars<'a>(s: &'a str) -> LowerChars<'a>;
fn upper_chars<'a>(s: &'a str) -> UpperChars<'a>;
} |
Yeah, that would fit in nicely |
Now, for char I only implemented common and simple case folding; that is, one char always maps to one char. For str, however, multi-codepoint mappings could be supported, i.e. one code point can become two code points. For case-insensitive comparison, though, afaik only ICU gets it right.
So in other words, for str a method like case_insensitive_compare would be appropriate. |
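The one-to-one versus one-to-many distinction can be illustrated in current Rust, where `char::to_uppercase` returns an iterator precisely because a mapping may expand. The two helpers below are hypothetical. `simple_upper` only approximates Unicode's simple mapping by falling back to the input whenever the full mapping is longer than one char, so the two can disagree for some characters.

```rust
/// Full uppercase mapping of one char; the result may be several chars long.
fn full_upper(c: char) -> String {
    c.to_uppercase().collect()
}

/// One-to-one approximation (an assumption of this sketch, not the Unicode
/// simple mapping): keep the result only when it is exactly one char,
/// otherwise leave the input unchanged.
fn simple_upper(c: char) -> char {
    let mut it = c.to_uppercase();
    match (it.next(), it.next()) {
        (Some(u), None) => u,
        _ => c,
    }
}
```

For example, `full_upper('ß')` yields `"SS"`, while `simple_upper('ß')` leaves `'ß'` as it is.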
The |
What about this:
pub trait StrSlice<'a> {
...
fn equals_ignore_case(&self, needle: &str) -> bool;
fn starts_with_ignore_case(&self, needle: &str) -> bool;
fn ends_with_ignore_case(&self, needle: &str) -> bool;
// or
fn equals_ignore_case(&self, needle: &str) -> bool;
...
// or
fn compare(&self, needle: &str, ignore_case: bool) -> bool;
...
}
I like the last one best. |
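As a rough sketch of how the method variants above could share one abstraction, the extension trait below builds both checks on a single lowercased char iterator. It is written in current Rust; the trait name `IgnoreCaseExt` and the helper `folded` are invented for illustration, and lowercasing stands in for real case folding, so pairs like `ß` and `ss` are not treated as equal.

```rust
/// Hypothetical extension trait; not part of std.
trait IgnoreCaseExt {
    fn equals_ignore_case(&self, other: &str) -> bool;
    fn starts_with_ignore_case(&self, needle: &str) -> bool;
}

/// Shared building block: lowercase mapping as a stand-in for case folding.
fn folded(s: &str) -> impl Iterator<Item = char> + '_ {
    s.chars().flat_map(char::to_lowercase)
}

impl IgnoreCaseExt for str {
    fn equals_ignore_case(&self, other: &str) -> bool {
        folded(self).eq(folded(other))
    }

    fn starts_with_ignore_case(&self, needle: &str) -> bool {
        let mut hay = folded(self);
        // Every folded needle char must match the next folded haystack char.
        folded(needle).all(|n| hay.next() == Some(n))
    }
}
```

With the trait in scope, `"FooBar".starts_with_ignore_case("foo")` holds and `"Fo".starts_with_ignore_case("foo")` does not.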
Not sure... Adding a bunch of method combinations smells like a missed abstraction that could be integrated somehow. I'd add at most a |
This is also relevant: http://www.w3.org/International/wiki/Case_folding If my understanding is right, proper folding might require a language context in order to be correct. My suggestion would be to start with a normal case folding based on this table: http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt |
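A table-driven folding along these lines could be sketched as follows in current Rust. The embedded `EXCERPT` is a handful of entries in the CaseFolding.txt format (`code; status; mapping; # name`), not the full file, and `parse_table` and `fold` are hypothetical helpers. The status letters select the flavour: C (common), S (simple), F (full), and T (the Turkic special cases, which is where language context comes in).

```rust
use std::collections::HashMap;

/// A few entries in the CaseFolding.txt format; a real implementation
/// would load the complete file instead of this excerpt.
const EXCERPT: &str = "\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
";

/// Build a folding table from the entries whose status is in `statuses`.
/// Later matching lines override earlier ones for the same code point.
fn parse_table(src: &str, statuses: &[&str]) -> HashMap<char, String> {
    let mut table = HashMap::new();
    for line in src.lines() {
        // Drop the trailing comment and skip blank lines.
        let data = line.split('#').next().unwrap_or("").trim();
        if data.is_empty() {
            continue;
        }
        let fields: Vec<&str> = data.split(';').map(str::trim).collect();
        if fields.len() < 3 || !statuses.contains(&fields[1]) {
            continue;
        }
        let from = u32::from_str_radix(fields[0], 16).ok().and_then(char::from_u32);
        let to: Option<String> = fields[2]
            .split_whitespace()
            .map(|h| u32::from_str_radix(h, 16).ok().and_then(char::from_u32))
            .collect();
        if let (Some(from), Some(to)) = (from, to) {
            table.insert(from, to);
        }
    }
    table
}

/// Fold a string with the given table; unmapped chars pass through unchanged.
fn fold(s: &str, table: &HashMap<char, String>) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match table.get(&c) {
            Some(mapped) => out.push_str(mapped),
            None => out.push(c),
        }
    }
    out
}
```

Selecting `["C", "F"]` gives full folding (`fold("Aß", ..)` becomes `"ass"`), `["C", "S"]` gives the one-to-one variant, and `["C", "T"]` maps `I` to dotless `ı` as the Turkic entries require.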
cc me |
I'm pulling a massive triage effort to get us ready for 1.0. As part of this, I'm moving stuff that's wishlist-like to the RFCs repo, as that's where major new things should get discussed/prioritized. This issue has been moved to the RFCs repo: rust-lang/rfcs#791 |