Fix `match_str_case_mismatch` on uncased chars #7865

Herschel · 2021-10-23T07:06:03Z

False positives would result because char::is_lowercase and friends will return false for non-alphabetic chars and alphabetic chars lacking case (such as CJK scripts). Care also has to be taken for handling titlecase characters (ǲ) and lowercased chars with no uppercase equivalent (ʁ).

For example, when verifying lowercase:

Check !any(char::is_ascii_uppercase) instead of all(char::is_ascii_lowercase) for ASCII.
Check that all(|c| c.to_lowercase() == c) instead of all(char::is_lowercase) for non-ASCII.
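
To make the difference concrete, here is a minimal sketch comparing the two styles of check on a few sample strings. The helper names (`all_is_lowercase`, `unchanged_by_lowercase`, `no_ascii_uppercase`) are hypothetical and only illustrate the idea; they are not the lint's actual code.

```rust
/// Pre-fix style check: every char must report itself as lowercase.
/// Spaces, digits and uncased scripts such as CJK fail this test.
fn all_is_lowercase(s: &str) -> bool {
    s.chars().all(char::is_lowercase)
}

/// Post-fix style check: no char changes when it is lowercased.
/// Uncased chars map to themselves, so they pass.
fn unchanged_by_lowercase(s: &str) -> bool {
    s.chars().all(|c| c.to_lowercase().next() == Some(c))
}

/// ASCII variant: only ASCII uppercase letters are rejected.
fn no_ascii_uppercase(s: &str) -> bool {
    !s.chars().any(|c| c.is_ascii_uppercase())
}

fn main() {
    for &s in &["hello world", "日本語", "ʁ", "ǲ", "Hello"] {
        println!(
            "{:?}: all_is_lowercase={} unchanged_by_lowercase={} no_ascii_uppercase={}",
            s,
            all_is_lowercase(s),
            unchanged_by_lowercase(s),
            no_ascii_uppercase(s)
        );
    }
}
```

With the first helper, a match arm such as "hello world" or "日本語" would be reported as not lowercase, which is the false positive described above. The titlecase letter ǲ is still rejected by the second helper, because lowercasing it produces a different character.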

Fixes #7863.

changelog: Fix false positives in [match_str_case_mismatch] on uncased characters

rust-highfive · 2021-10-23T07:06:06Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @xFrednet (or someone else) soon.

Please see the contribution instructions for more information.

Properly consider uncased and titlecased characters. Fixes rust-lang#7863.

bachue · 2021-10-24T04:26:01Z

👍 it can resolve my problem

xFrednet

Thank you for the PR! Just one small NIT where I would also like to get some feedback from a second team member, then it should be ready to be merged. You did a fantastic job with the tests, those should hopefully cover everything 👍

xFrednet · 2021-10-24T13:37:26Z

clippy_lints/src/match_str_case_mismatch.rs

+        CaseMethod::LowerCase => |input: &str| -> bool { input.chars().all(|c| c.to_lowercase().next() == Some(c)) },
+        CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_uppercase()) },
+        CaseMethod::UpperCase => |input: &str| -> bool { input.chars().all(|c| c.to_uppercase().next() == Some(c)) },
+        CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_lowercase()) },


My understanding is that to_lowercase / to_uppercase may allocate a new struct in some cases. I think we can replace this by testing for the negation of what the code was previously testing.

CaseMethod::LowerCase => |input: &str| -> bool { !input.chars().any(char::is_uppercase) },
CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_uppercase()) },
CaseMethod::UpperCase => |input: &str| -> bool { !input.chars().any(char::is_lowercase) },
CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_lowercase()) },

@Manishearth What are your thoughts on this? (Since you've worked extensively with Unicode, from what I know 🙃 )

Herschel · 2021-10-24

The issue is with characters like ʁ which have no opposite-case equivalent, i.e. 'ʁ'.is_lowercase() is true, but 'ʁ'.to_uppercase() still yields 'ʁ'. This would cause the above UpperCase case to fail, for example. The char::ToUppercase struct is 16 bytes, so this seemed better than doing input.to_uppercase() == input, which allocates a whole new string.

I'm also worried about any cases where, say, char.to_lowercase() != char.to_lowercase().to_lowercase(), but I think(?) these fns are idempotent. Also paging @Manishearth because I'm sure he'll be of much help :-)
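
A small sketch of the two points raised here, under stated assumptions: the helper names are hypothetical, and the idempotence check is only a spot check on individual chars, not a proof for all of Unicode. The size of `char::ToUppercase` is printed rather than assumed, since it may differ between Rust versions.

```rust
use std::mem::size_of;

/// The negated check suggested in review: "uppercase" means no lowercase chars.
fn no_lowercase_chars(s: &str) -> bool {
    !s.chars().any(char::is_lowercase)
}

/// The check used in the PR: "uppercase" means uppercasing changes nothing.
fn unchanged_by_uppercase(s: &str) -> bool {
    s.chars().all(|c| c.to_uppercase().next() == Some(c))
}

/// Spot check: lowercasing the lowercase mapping of `c` gives the same result.
fn lowercase_is_idempotent(c: char) -> bool {
    let once: String = c.to_lowercase().collect();
    let twice: String = once.chars().flat_map(char::to_lowercase).collect();
    once == twice
}

fn main() {
    // ʁ reports as lowercase but has no uppercase mapping.
    let s = "ʁ";
    println!("no_lowercase_chars({:?}) = {}", s, no_lowercase_chars(s));
    println!("unchanged_by_uppercase({:?}) = {}", s, unchanged_by_uppercase(s));

    // Size of the per-char iterator, for comparison with allocating a String.
    println!("size_of::<ToUppercase>() = {}", size_of::<std::char::ToUppercase>());

    for &c in &['A', 'ǲ', '\u{130}', '中'] {
        println!("{:?}: idempotent = {}", c, lowercase_is_idempotent(c));
    }
}
```

The two helpers disagree on "ʁ": the negated check treats it as not uppercase, while the mapping-based check accepts it because `"ʁ".to_uppercase()` would leave the string unchanged.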

Manishearth · 2021-10-25

I'd rather not allocate: I think we can use the Changes_When_Uppercased/Lowercased properties instead.

I wonder if we can add an unstable internal function to core::unicode for this so we can continue using Rust's unicode data. To do this you'd need to modify the unicode generator tool in rustc

https://github.com/rust-lang/rust/blob/6b0b41729939c3f7520e9ed86b36fba2524c7970/src/tools/unicode-table-generator/src/main.rs#L129

xFrednet · 2021-10-25

That would be nice to have 👍. Side note regarding the allocation. The documentation states:

Returns an iterator that yields the lowercase mapping of this char as one or more chars.

If this char does not have a lowercase mapping, the iterator yields the same char.

Meaning that this would only allocate if the lint should actually be triggered, and then only once, if we use any(|c| c.to_lowercase().next() != Some(c)).
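
The documented behaviour and the short-circuiting check can be sketched as follows. `first_non_lowercase` is a hypothetical helper that uses `find` instead of `any` so the offending char is visible; it is an illustration, not code from the PR.

```rust
/// Returns the first char whose lowercase mapping differs from itself,
/// stopping at that char without looking at the rest of the string.
fn first_non_lowercase(s: &str) -> Option<char> {
    s.chars().find(|&c| c.to_lowercase().next() != Some(c))
}

fn main() {
    // No lowercase mapping: the iterator yields the same char.
    let same: Vec<char> = '中'.to_lowercase().collect();
    println!("{:?}", same);

    // Ordinary one-to-one mapping.
    let simple: Vec<char> = 'A'.to_lowercase().collect();
    println!("{:?}", simple);

    // One-to-many mapping: U+0130 lowercases to 'i' plus a combining dot.
    let multi: Vec<char> = '\u{130}'.to_lowercase().collect();
    println!("{:?}", multi);

    println!("{:?}", first_non_lowercase("straße 中"));
    println!("{:?}", first_non_lowercase("strAße"));
}
```

Comparing only the first yielded char is enough here: if a char maps to several chars, the first one already differs from the original in the cases shown, so the string is treated as not lowercase.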

Manishearth · 2021-10-25

Ah, that makes it slightly better

xFrednet · 2021-10-25

So, how should we continue? I think we can merge this as it is and create a new issue for the possible enhancement. Does that sound like a plan? 🙃

Manishearth · 2021-10-25

Yes. I don't even think using changes_when_uppercased/etc matters that much for perf so we may not even need a followup

xFrednet · 2021-10-25

Alright 👍 Thank you for the feedback on this stuff 🙃

xFrednet · 2021-10-25T21:34:06Z

Thank you for the changes! I hope you also had fun working on Clippy 🙃

@bors r+

bors · 2021-10-25T21:34:08Z

📌 Commit e953dff has been approved by xFrednet

bors · 2021-10-25T21:34:14Z

⌛ Testing commit e953dff with merge cb0132d...

bors · 2021-10-25T21:46:18Z

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: xFrednet
Pushing cb0132d to master...

rust-highfive assigned xFrednet Oct 23, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Oct 23, 2021

Herschel force-pushed the fix-match-str-case-mismatch branch from 373d4a2 to f453047 Compare October 23, 2021 07:23

Fix match_str_case_mismatch false positives

e953dff

Properly consider uncased and titlecased characters. Fixes rust-lang#7863.

Herschel force-pushed the fix-match-str-case-mismatch branch from f453047 to e953dff Compare October 23, 2021 09:05

xFrednet reviewed Oct 24, 2021

xFrednet approved these changes Oct 25, 2021

bors merged commit cb0132d into rust-lang:master Oct 25, 2021

xFrednet mentioned this pull request Oct 26, 2021

False Positive from match_str_case_mismatch #7882

Closed
