Fix match_str_case_mismatch on uncased chars #7865
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @xFrednet (or someone else) soon. Please see the contribution instructions for more information.
Branch updated from 373d4a2 to f453047.
Properly consider uncased and titlecased characters. Fixes rust-lang#7863.
Branch updated from f453047 to e953dff.
👍 This resolves my problem.
Thank you for the PR! There's just one small nit where I would also like feedback from a second team member; then it should be ready to be merged. You did a fantastic job with the tests; they should hopefully cover everything 👍
CaseMethod::LowerCase => |input: &str| -> bool { input.chars().all(|c| c.to_lowercase().next() == Some(c)) },
CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_uppercase()) },
CaseMethod::UpperCase => |input: &str| -> bool { input.chars().all(|c| c.to_uppercase().next() == Some(c)) },
CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_lowercase()) },
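A minimal sketch of the four checks in the diff above, pulled out of Clippy's CaseMethod closures into free functions so they can be exercised on their own. The function names (is_lowercase_str and so on) are hypothetical; only the bodies mirror the PR.

```rust
/// True if lowercasing leaves every char unchanged (uncased chars pass).
fn is_lowercase_str(input: &str) -> bool {
    input.chars().all(|c| c.to_lowercase().next() == Some(c))
}

/// True if the string contains no ASCII uppercase letters.
fn is_ascii_lowercase_str(input: &str) -> bool {
    !input.chars().any(|c| c.is_ascii_uppercase())
}

/// True if uppercasing leaves every char unchanged (uncased chars pass).
fn is_uppercase_str(input: &str) -> bool {
    input.chars().all(|c| c.to_uppercase().next() == Some(c))
}

/// True if the string contains no ASCII lowercase letters.
fn is_ascii_uppercase_str(input: &str) -> bool {
    !input.chars().any(|c| c.is_ascii_lowercase())
}
```

For example, is_uppercase_str("ʁ") returns true because ʁ has no uppercase mapping, while a titlecase char such as ǲ (U+01F2) fails both the lowercase and the uppercase check, since both mappings change it.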
My understanding is that to_lowercase / to_uppercase may allocate a new struct in some cases. I think we can replace this by testing for the negation of what the code was previously testing.
CaseMethod::LowerCase => |input: &str| -> bool { !input.chars().any(char::is_uppercase) },
CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_uppercase()) },
CaseMethod::UpperCase => |input: &str| -> bool { !input.chars().any(char::is_lowercase) },
CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_lowercase()) },
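To see where this negation-based suggestion and the mapping-based code in the PR disagree, here is a small comparison sketch (function names are hypothetical). A titlecase char such as ǲ (U+01F2) has neither the Uppercase nor the Lowercase property, so the negation treats it as already lowercase even though lowercasing turns it into ǳ.

```rust
/// Suggested approach: "lowercase" means no char is uppercase.
fn negation_lowercase(input: &str) -> bool {
    !input.chars().any(char::is_uppercase)
}

/// PR approach: "lowercase" means lowercasing leaves every char unchanged.
fn mapping_lowercase(input: &str) -> bool {
    input.chars().all(|c| c.to_lowercase().next() == Some(c))
}
```

Both agree on plain and uncased input such as "abc 中文". In lint terms, the negation would let an arm like "ǲ" under s.to_lowercase() go unreported, even though that arm can never match.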
@Manishearth What are your thoughts on this? (Since you've worked extensively with Unicode, from what I know 🙃 )
The issue is with characters like ʁ, which have no opposite-case equivalent: 'ʁ'.is_lowercase() is true, but 'ʁ'.to_uppercase() yields 'ʁ'. With the suggestion above, the UpperCase check would then wrongly reject "ʁ", for example. The char::ToUppercase struct is 16 bytes, so this seemed better than doing input.to_uppercase() == input, which allocates a whole new string.
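A sketch of the ʁ case described above (function names are hypothetical): the negation-based UpperCase check rejects "ʁ", while both the per-char mapping check from the PR and the allocating input.to_uppercase() == input comparison accept it.

```rust
/// Negation: "uppercase" means no char is lowercase. Rejects "ʁ".
fn negation_uppercase(input: &str) -> bool {
    !input.chars().any(char::is_lowercase)
}

/// Per-char mapping: no String is built, only a ToUppercase iterator per char.
fn mapping_uppercase(input: &str) -> bool {
    input.chars().all(|c| c.to_uppercase().next() == Some(c))
}

/// Allocating variant: builds a whole new String before comparing.
fn allocating_uppercase(input: &str) -> bool {
    input.to_uppercase() == input
}
```

The last two give the same answer here; the difference the comment points at is only whether a new String is allocated.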
I'm also worried about any cases where, say, char.to_lowercase() != char.to_lowercase().to_lowercase(), but I think(?) these fns are idempotent. Also paging @Manishearth, because I'm sure he'll be of much help :-)
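The idempotence question can be spot-checked with a pair of helpers (hypothetical, not part of the PR) that apply the char mapping twice and compare the results. This only covers the sample chars it is run on, so it is evidence rather than a proof.

```rust
/// Lowercases `c`, lowercases the result again, and reports whether the
/// second pass left it unchanged. Multi-char mappings (e.g. 'İ' -> "i\u{307}")
/// are handled by re-mapping every char of the first result.
fn lowercase_is_idempotent(c: char) -> bool {
    let once: String = c.to_lowercase().collect();
    let twice: String = once.chars().flat_map(char::to_lowercase).collect();
    once == twice
}

/// Same check for the uppercase mapping (e.g. 'ß' -> "SS").
fn uppercase_is_idempotent(c: char) -> bool {
    let once: String = c.to_uppercase().collect();
    let twice: String = once.chars().flat_map(char::to_uppercase).collect();
    once == twice
}
```

Running the same helpers over every char via (0..=char::MAX as u32).filter_map(char::from_u32) would make the check exhaustive; that result is not asserted here.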
I'd rather not allocate: I think we can use the Changes_When_Uppercased/Changes_When_Lowercased properties instead.
I wonder if we can add an unstable internal function to core::unicode for this, so we can continue using Rust's Unicode data. To do this, you'd need to modify the Unicode generator tool in rustc.
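std does not expose the Changes_When_Lowercased / Changes_When_Uppercased properties, so the following is only a sketch of what such a helper could look like from the caller's side. The helper names are hypothetical, and they approximate the properties from the char case mappings; since the Unicode definition of these properties works on normalized forms, this approximation is not guaranteed to match the property tables exactly.

```rust
/// Hypothetical stand-in for Changes_When_Lowercased, approximated by
/// checking whether the full lowercase mapping differs from `c` itself.
/// Iterator::eq compares without building a String.
fn changes_when_lowercased(c: char) -> bool {
    !c.to_lowercase().eq(std::iter::once(c))
}

/// Hypothetical stand-in for Changes_When_Uppercased.
fn changes_when_uppercased(c: char) -> bool {
    !c.to_uppercase().eq(std::iter::once(c))
}

/// Lint-style check: a string counts as lowercase if no char would
/// change when lowercased (uncased chars such as CJK or digits pass).
fn is_lowercase_by_property(input: &str) -> bool {
    !input.chars().any(changes_when_lowercased)
}
```

A real core::unicode function would instead look the property up in Rust's generated tables, which is the change to the generator tool mentioned above.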
That would be nice to have 👍. A side note regarding the allocation: the documentation states:
Returns an iterator that yields the lowercase mapping of this char as one or more chars.
If this char does not have a lowercase mapping, the iterator yields the same char.
This means that it would only allocate if the lint should actually be triggered, and then only once, if we use any(|c| c.to_lowercase().next() != Some(c)).
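A sketch of the check proposed in this comment, with Some(c) on the right-hand side because next() returns an Option<char>. The function name and the counter are hypothetical, added only to show that any stops at the first char whose lowercase mapping differs.

```rust
/// Returns whether some char changes when lowercased, plus how many
/// chars were inspected before `any` short-circuited.
fn has_non_lowercase(input: &str) -> (bool, usize) {
    let mut inspected = 0;
    let found = input.chars().any(|c| {
        inspected += 1;
        c.to_lowercase().next() != Some(c)
    });
    (found, inspected)
}
```

For "abCdef", only the first three chars are inspected; uncased chars like 中 are skipped over without triggering.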
Ah, that makes it slightly better
So, how should we continue? I think we can merge this as it is and create a new issue for the possible enhancement. Does that sound like a plan? 🙃
Yes. I don't even think using changes_when_uppercased etc. matters that much for perf, so we may not even need a follow-up.
Alright 👍 Thank you for the feedback on this stuff 🙃
Thank you for the changes! I hope you also had fun working on Clippy 🙃 @bors r+
📌 Commit e953dff has been approved by
☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
False positives would result because char::is_lowercase and friends return false for non-alphabetic chars and for alphabetic chars lacking case (such as CJK scripts). Care also has to be taken when handling titlecase characters (ǲ) and lowercase chars with no uppercase equivalent (ʁ). For example, when verifying lowercase:
- !any(char::is_ascii_uppercase) instead of all(char::is_ascii_lowercase) for ASCII.
- all(|c| c.to_lowercase().next() == Some(c)) instead of all(char::is_lowercase) for non-ASCII.

Fixes #7863.
changelog: Fix false positives in [match_str_case_mismatch] on uncased characters
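The before and after checks listed in the description can be compared side by side in a short sketch. The function names are hypothetical; the bodies follow the list above, with closures where a method takes &self.

```rust
/// Before (ASCII): every char must be an ASCII lowercase letter.
fn ascii_lowercase_before(input: &str) -> bool {
    input.chars().all(|c| c.is_ascii_lowercase())
}

/// After (ASCII): no char may be an ASCII uppercase letter.
fn ascii_lowercase_after(input: &str) -> bool {
    !input.chars().any(|c| c.is_ascii_uppercase())
}

/// Before (non-ASCII): every char must have the Lowercase property.
fn lowercase_before(input: &str) -> bool {
    input.chars().all(char::is_lowercase)
}

/// After (non-ASCII): lowercasing must leave every char unchanged.
fn lowercase_after(input: &str) -> bool {
    input.chars().all(|c| c.to_lowercase().next() == Some(c))
}
```

The "before" variants reject "foo bar" and "中文" only because of the space and the uncased CJK chars, which is the kind of false positive this PR fixes; the "after" variants still reject genuinely cased mismatches such as "Foo" or the titlecase "ǲ".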