Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix match_str_case_mismatch on uncased chars #7865

Merged
merged 1 commit into from
Oct 25, 2021

Conversation

Herschel
Copy link
Contributor

@Herschel Herschel commented Oct 23, 2021

False positives would result because char::is_lowercase and friends will return false for non-alphabetic chars and alphabetic chars lacking case (such as CJK scripts). Care also has to be taken for handling titlecase characters (Dz) and lowercased chars with no uppercase equivalent (ʁ).

For example, when verifying lowercase:

  • Check !any(char::is_ascii_uppercase) instead of all(char::is_ascii_lowercase) for ASCII.
  • Check that all(|c| c.to_lowercase() == c) instead of all(char::is_lowercase) for non-ASCII

Fixes #7863.

changelog: Fix false positives in [match_str_case_mismatch] on uncased characters

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @xFrednet (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Oct 23, 2021
Properly consider uncased and titlecased characters.
Fixes rust-lang#7863.
@bachue
Copy link

bachue commented Oct 24, 2021

👍 it can resolve my problem

Copy link
Member

@xFrednet xFrednet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Just one small NIT where I would also like to get some feedback from a second team member, then it should be ready to be merged. You did a fantastic job with the tests, those should hopefully cover everything 👍

Comment on lines +130 to +133
CaseMethod::LowerCase => |input: &str| -> bool { input.chars().all(|c| c.to_lowercase().next() == Some(c)) },
CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_uppercase()) },
CaseMethod::UpperCase => |input: &str| -> bool { input.chars().all(|c| c.to_uppercase().next() == Some(c)) },
CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(|c| c.is_ascii_lowercase()) },
Copy link
Member

@xFrednet xFrednet Oct 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that to_lowercase / to_uppercase may allocate a new struct in some cases. I think we can replace this by testing for the negation of what the code was previously testing.

        CaseMethod::LowerCase => |input: &str| -> bool { !input.chars().any(char::is_uppercase) },
        CaseMethod::AsciiLowerCase => |input: &str| -> bool { !input.chars().any(car::is_ascii_uppercase) },
        CaseMethod::UpperCase => |input: &str| -> bool { !input.chars().any(char::is_lowercase) },
        CaseMethod::AsciiUppercase => |input: &str| -> bool { !input.chars().any(char::is_ascii_uppercase) },

@Manishearth What are your thoughts on this? (Since you've worked extensively with Unicode, from what I know 🙃 )

Copy link
Contributor Author

@Herschel Herschel Oct 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is with characters like ʁ which have no opposite-case-equivalent, i.e. 'ʁ'.is_lowercase() is true, but 'ʁ'.to_uppercase() == 'ʁ'. This would cause the above UpperCase case to fail, for example. char::ToUppercase struct is 16 bytes, so this seemed better than doing input.to_uppercase() == input to allocate a whole new whole string.

I'm also worried about any cases where, say, char.to_lowercase() != char.to_lowercase().to_lowercase(), but I think(?) these fns are idempotent. Also paging @Manishearth because I'm sure he'll be of much help :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not allocate: I think we can use the Changes_When_Uppercased/Lowercased properties instead.

I wonder if we can add an unstable internal function to core::unicode for this so we can continue using Rust's unicode data. To do this you'd need to modify the unicode generator tool in rustc

https://github.com/rust-lang/rust/blob/6b0b41729939c3f7520e9ed86b36fba2524c7970/src/tools/unicode-table-generator/src/main.rs#L129

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be nice to have 👍. Side node regarding the allocation. The documentation states:

Returns an iterator that yields the lowercase mapping of this char as one or more chars.

If this char does not have a lowercase mapping, the iterator yields the same char.

Meaning that this would only allocate if the lint should actually be triggered and then only once, if we use any(|c| c.to_lowercase().next() != c)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes it slightly better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how should we continue? I think we can merge this as it is and create a new issue for the possible enhancement. Does that sound line a plan? 🙃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I don't even think using changes_when_uppercased/etc matters that much for perf so we may not even need a followup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright 👍 Thank you for the feedback on this stuff 🙃

@xFrednet
Copy link
Member

Thank you for the changes! I hope you also had fun working on Clippy 🙃

@bors r+

@bors
Copy link
Contributor

bors commented Oct 25, 2021

📌 Commit e953dff has been approved by xFrednet

@bors
Copy link
Contributor

bors commented Oct 25, 2021

⌛ Testing commit e953dff with merge cb0132d...

@bors
Copy link
Contributor

bors commented Oct 25, 2021

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: xFrednet
Pushing cb0132d to master...

@bors bors merged commit cb0132d into rust-lang:master Oct 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

False positive with numerals in match_str_case_mismatch
6 participants