Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint: disallowed_script_idents #7400

Merged
merged 1 commit into from
Jun 30, 2021
Merged

Conversation

popzxc
Copy link
Contributor

@popzxc popzxc commented Jun 25, 2021

This PR implements a new lint to restrict locales that can be used in the code,
as proposed in #7376.

Current concerns / unresolved questions:

  • Mixed usage of script (as a Unicode term) and locale (as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful. script is now used everywhere.
  • Having to mostly copy-paste AllowedScript. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to the unicode_script::Script, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC. unicode::Script is used together with a filtering deserialize function.
  • Should we stick to the list of "recommended scripts" from UAX #31 in the configuration?

Please write a short comment explaining your change (or "none" for internal only changes)

changelog: [`disallowed_script_idents`]

r? @Manishearth

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Jun 25, 2021
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to get a second set of eyes on this, cc @camsteffen

clippy_lints/src/restricted_locales.rs Outdated Show resolved Hide resolved
clippy_lints/src/restricted_locales.rs Outdated Show resolved Hide resolved
clippy_lints/src/restricted_locales.rs Outdated Show resolved Hide resolved
impl_lint_pass!(RestrictedLocales => [RESTRICTED_LOCALES]);

impl<'tcx> LateLintPass<'tcx> for RestrictedLocales {
fn check_pat(&mut self, cx: &LateContext<'tcx>, pat: &'tcx Pat<'_>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that lint passes are always run, can you check if there's a measurable perf difference when linting a decently sized crate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that I cannot just build the release version of clippy and run it. clippy-driver fails with the following error: dyld: Library not loaded: @rpath/librustc_driver-22f067a38d2152ee.dylib.

If I understood things correctly from looking through this repo issues, I need the whole rust toolchain for that, which is not desirable. Is there a simpler way to run clippy on a custom crate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have the rustc-dev component installed. You're probably using a different toolchain when you build clippy vs when you run it.

Check out https://github.com/rust-lang/rust-clippy/blob/master/rust-toolchain#L3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno. The compiler is (and was) overridden to be the same as for clippy itself. Components are also seem to be installed.

$ rustup +nightly-2021-06-17 component add llvm-tools-preview 
info: component 'llvm-tools-preview' for target 'aarch64-apple-darwin' is up to date
$ rustup +nightly-2021-06-17 component add rust-src          
info: component 'rust-src' is up to date
$ rustup +nightly-2021-06-17 component add rustc-dev         
info: component 'rustc-dev' for target 'aarch64-apple-darwin' is up to date
$ rustup show
# ...
nightly-2021-06-17-aarch64-apple-darwin (directory override for '.../actix-web')
# ...

Yet it still fails:

$ ../rust-clippy/target/release/clippy-driver --deny clippy::needless_borrow
dyld: Library not loaded: @rpath/librustc_driver-22f067a38d2152ee.dylib
  Referenced from: /<...>/actix-web/../rust-clippy/target/release/clippy-driver
  Reason: image not found
[1]    32340 abort      ../rust-clippy/target/release/clippy-driver --deny clippy::needless_borrow

If that's important, I build clippy via simple cargo build --release (maybe I need to do it some other way?).

clippy_lints/src/utils/conf.rs Outdated Show resolved Hide resolved
clippy_lints/src/restricted_locales.rs Outdated Show resolved Hide resolved
clippy_lints/src/restricted_locales.rs Outdated Show resolved Hide resolved
@popzxc popzxc changed the title New lint: restricted_locales New lint: restricted_scripts Jun 25, 2021
@camsteffen
Copy link
Contributor

This lint is analogous to non_ascii_idents. I think we should add "See also: non_ascii_idents" to the docs. And maybe the lint name should have some similarity, like disallowed_script_idents.

@Manishearth
Copy link
Member

Yeah, I like that!

/// Parses the `unicode_script::Script` object, filtering
/// bad values and undesired scripts (such as `Common` and `Unknown`).
#[allow(clippy::too_many_lines)]
fn parse_script(script_name: &str) -> Option<Script> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be Script::from_alias provided by the unicode-script crate. @Manishearth?

In any case, this should be clearly documented as expecting an alias value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if someone is willing to make the PR I'd merge it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing function Script::full_name which seems to be to what the Unicode docs refer to as "alias". So would you name the new function Script::from_full_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add it now (so the new version will be released and used here), or for now it's better to implement conversion logic in clippy and replace it in one of the future releases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add it to unicode-script now. Would you be interested in doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no problem, will take care of that tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that code is published now

///
/// For example, `Syloti Nagri` must be represented as `"Syloti_Nagri`".
///
/// [supported_scripts]: http://www.unicode.org/standard/supported.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should reference this page instead and say to use the value in the "Alias" column. Then you don't need to say "replace spaces with underscores".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it looks like it'll be more complicated to implement. I can see, for example, the alias Katakana_Or_Hiragana, which represents two scripts rather than one.
In theory it could've implemented through ScriptExtensions, but I don't see a way to construct it manually, which means that we'll have to return Vec<Script>, which is less efficient.

Do you find it significant to stick to the aliases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it looks like it'll be more complicated to implement. I can see, for example, the alias Katakana_Or_Hiragana, which represents two scripts rather than one.
In theory it could've implemented through ScriptExtensions, but I don't see a way to construct it manually, which means that we'll have to return Vec<Script>, which is less efficient.

Okay, this had me puzzled for a bit but I finally figured it out - it is in fact one script but it is no longer used in the latest version of Unicode. I think we can just ignore special cases like this. I looked at the source of the unicode-script crate and it is definitely using the "Alias" values.

Do you find it significant to stick to the aliases?

Yes, it is better to use a pre-existing naming convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I learned about Katakana_Or_Hiragana being a special case on this page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if you're looking for handling that special case you should use the AugmentedScriptSet property from unicode-security instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note with a link to the page about it being a special case instead. I think that it's better than adding one more dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of scripts that are not used in current Unicode. Katakana_Or_Hiragana is just one of them. I don't think it's anything we need to document or worry about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then, removed 😅

clippy_lints/src/restricted_scripts.rs Outdated Show resolved Hide resolved
clippy_lints/src/restricted_scripts.rs Outdated Show resolved Hide resolved
@popzxc popzxc force-pushed the restrict-locales branch from 3b46f95 to 58bb8f2 Compare June 26, 2021 08:54
@popzxc popzxc changed the title New lint: restricted_scripts New lint: disallowed_script_idents Jun 26, 2021
@popzxc popzxc force-pushed the restrict-locales branch from d317389 to 94dd5bb Compare June 28, 2021 16:18
@popzxc
Copy link
Contributor Author

popzxc commented Jun 30, 2021

So, apart from the perfing this lint (which I have some troubles with, unfortunately, so I'll be grateful if someone else have tried it), is there anything else I should improve?

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good, and with the global bailing-out there's no need to test for perf

(Going to wait for a second review from @camsteffen)

Copy link
Contributor

@camsteffen camsteffen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Manishearth
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jun 30, 2021

📌 Commit e2eb453 has been approved by Manishearth

bors added a commit that referenced this pull request Jun 30, 2021
New lint: `disallowed_script_idents`

This PR implements a new lint to restrict locales that can be used in the code,
as proposed in #7376.

Current concerns / unresolved questions:

- ~~Mixed usage of `script` (as a Unicode term) and `locale` (as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful.~~ `script` is now used everywhere.
- ~~Having to mostly copy-paste `AllowedScript`. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to the `unicode_script::Script`, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC.~~ `unicode::Script` is used together with a filtering deserialize function.
- Should we stick to the list of "recommended scripts" from [UAX #31](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts) in the configuration?

*Please write a short comment explaining your change (or "none" for internal only changes)*

changelog: ``[`disallowed_script_idents`]``

r? `@Manishearth`
@bors
Copy link
Contributor

bors commented Jun 30, 2021

⌛ Testing commit e2eb453 with merge e3dffc0...

@popzxc popzxc force-pushed the restrict-locales branch from e2eb453 to 018be41 Compare June 30, 2021 16:06
@popzxc
Copy link
Contributor Author

popzxc commented Jun 30, 2021

Uhm. It seems that at the same time I squashed commits and bors was r+ed, which caused it to be stuck.

I think bors retry is required now @Manishearth 😅

@camsteffen
Copy link
Contributor

@bors r=Manishearth

@bors
Copy link
Contributor

bors commented Jun 30, 2021

📌 Commit 018be41 has been approved by Manishearth

@bors
Copy link
Contributor

bors commented Jun 30, 2021

⌛ Testing commit 018be41 with merge cadb93b...

@bors
Copy link
Contributor

bors commented Jun 30, 2021

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: Manishearth
Pushing cadb93b to master...

@bors bors merged commit cadb93b into rust-lang:master Jun 30, 2021
@popzxc popzxc deleted the restrict-locales branch June 30, 2021 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants