-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New lint: disallowed_script_idents
#7400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to get a second set of eyes on this, cc @camsteffen
impl_lint_pass!(RestrictedLocales => [RESTRICTED_LOCALES]); | ||
|
||
impl<'tcx> LateLintPass<'tcx> for RestrictedLocales { | ||
fn check_pat(&mut self, cx: &LateContext<'tcx>, pat: &'tcx Pat<'_>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that lint passes are always run, can you check if there's a measurable perf difference when linting a decently sized crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that I cannot just build the release version of clippy and run it. clippy-driver
fails with the following error: dyld: Library not loaded: @rpath/librustc_driver-22f067a38d2152ee.dylib
.
If I understood things correctly from looking through this repo issues, I need the whole rust toolchain for that, which is not desirable. Is there a simpler way to run clippy on a custom crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to have the rustc-dev component installed. You're probably using a different toolchain when you build clippy vs when you run it.
Check out https://github.com/rust-lang/rust-clippy/blob/master/rust-toolchain#L3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno. The compiler is (and was) overridden to be the same as for clippy itself. Components are also seem to be installed.
$ rustup +nightly-2021-06-17 component add llvm-tools-preview
info: component 'llvm-tools-preview' for target 'aarch64-apple-darwin' is up to date
$ rustup +nightly-2021-06-17 component add rust-src
info: component 'rust-src' is up to date
$ rustup +nightly-2021-06-17 component add rustc-dev
info: component 'rustc-dev' for target 'aarch64-apple-darwin' is up to date
$ rustup show
# ...
nightly-2021-06-17-aarch64-apple-darwin (directory override for '.../actix-web')
# ...
Yet it still fails:
$ ../rust-clippy/target/release/clippy-driver --deny clippy::needless_borrow
dyld: Library not loaded: @rpath/librustc_driver-22f067a38d2152ee.dylib
Referenced from: /<...>/actix-web/../rust-clippy/target/release/clippy-driver
Reason: image not found
[1] 32340 abort ../rust-clippy/target/release/clippy-driver --deny clippy::needless_borrow
If that's important, I build clippy via simple cargo build --release
(maybe I need to do it some other way?).
This lint is analogous to |
Yeah, I like that! |
/// Parses the `unicode_script::Script` object, filtering | ||
/// bad values and undesired scripts (such as `Common` and `Unknown`). | ||
#[allow(clippy::too_many_lines)] | ||
fn parse_script(script_name: &str) -> Option<Script> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there should be Script::from_alias
provided by the unicode-script
crate. @Manishearth?
In any case, this should be clearly documented as expecting an alias value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if someone is willing to make the PR I'd merge it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an existing function Script::full_name
which seems to be to what the Unicode docs refer to as "alias". So would you name the new function Script::from_full_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add it now (so the new version will be released and used here), or for now it's better to implement conversion logic in clippy
and replace it in one of the future releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add it to unicode-script
now. Would you be interested in doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, no problem, will take care of that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that code is published now
/// | ||
/// For example, `Syloti Nagri` must be represented as `"Syloti_Nagri`". | ||
/// | ||
/// [supported_scripts]: http://www.unicode.org/standard/supported.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should reference this page instead and say to use the value in the "Alias" column. Then you don't need to say "replace spaces with underscores".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it looks like it'll be more complicated to implement. I can see, for example, the alias Katakana_Or_Hiragana
, which represents two scripts rather than one.
In theory it could've implemented through ScriptExtension
s, but I don't see a way to construct it manually, which means that we'll have to return Vec<Script>
, which is less efficient.
Do you find it significant to stick to the aliases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it looks like it'll be more complicated to implement. I can see, for example, the alias
Katakana_Or_Hiragana
, which represents two scripts rather than one.
In theory it could've implemented throughScriptExtension
s, but I don't see a way to construct it manually, which means that we'll have to returnVec<Script>
, which is less efficient.
Okay, this had me puzzled for a bit but I finally figured it out - it is in fact one script but it is no longer used in the latest version of Unicode. I think we can just ignore special cases like this. I looked at the source of the unicode-script
crate and it is definitely using the "Alias" values.
Do you find it significant to stick to the aliases?
Yes, it is better to use a pre-existing naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I learned about Katakana_Or_Hiragana being a special case on this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if you're looking for handling that special case you should use the AugmentedScriptSet property from unicode-security instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note with a link to the page about it being a special case instead. I think that it's better than adding one more dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are lots of scripts that are not used in current Unicode. Katakana_Or_Hiragana is just one of them. I don't think it's anything we need to document or worry about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK then, removed 😅
restricted_scripts
disallowed_script_idents
So, apart from the perfing this lint (which I have some troubles with, unfortunately, so I'll be grateful if someone else have tried it), is there anything else I should improve? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good, and with the global bailing-out there's no need to test for perf
(Going to wait for a second review from @camsteffen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
@bors r+ |
📌 Commit e2eb453 has been approved by |
New lint: `disallowed_script_idents` This PR implements a new lint to restrict locales that can be used in the code, as proposed in #7376. Current concerns / unresolved questions: - ~~Mixed usage of `script` (as a Unicode term) and `locale` (as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful.~~ `script` is now used everywhere. - ~~Having to mostly copy-paste `AllowedScript`. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to the `unicode_script::Script`, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC.~~ `unicode::Script` is used together with a filtering deserialize function. - Should we stick to the list of "recommended scripts" from [UAX #31](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts) in the configuration? *Please write a short comment explaining your change (or "none" for internal only changes)* changelog: ``[`disallowed_script_idents`]`` r? `@Manishearth`
Uhm. It seems that at the same time I squashed commits and bors was r+ed, which caused it to be stuck. I think |
@bors r=Manishearth |
📌 Commit 018be41 has been approved by |
☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test |
This PR implements a new lint to restrict locales that can be used in the code,
as proposed in #7376.
Current concerns / unresolved questions:
Mixed usage ofscript
(as a Unicode term) andlocale
(as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful.script
is now used everywhere.Having to mostly copy-pasteAllowedScript
. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to theunicode_script::Script
, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC.unicode::Script
is used together with a filtering deserialize function.Please write a short comment explaining your change (or "none" for internal only changes)
changelog:
[`disallowed_script_idents`]
r? @Manishearth