NonASCIICharacterChecker should inspect Token.rawText, not Token.text #274
Codecov Report
@@ Coverage Diff @@
## master #274 +/- ##
=====================================
Coverage 0% 0%
=====================================
Files 59 59
Lines 1451 1451
Branches 142 139 -3
=====================================
Misses 1451 1451
Continue to review full report at Codecov.
|// non-ascii in string via unicode escape - ok
|class OK {
|  val s = "%s"
|}""".stripMargin.format("\\ud83c\\udf4e")
If somebody knows how to correctly produce the desired string in a cleaner way, I'd be happy to update.
I found that using \ud83c\udf4e in a triple-quote string resulted in 🍎 (i.e. the escape was applied), and using \\ud83c\\udf4e resulted in \\ud83c\\udf4e (i.e. the double backslashes are left in literal form). Unsure how to cleanly get the desired outcome of a literal \ud83c\udf4e within a triple-quote string.
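One possible way to sidestep the escaping question, sketched here as an assumption and not something proposed in the thread: keep the backslash in its own value so that no backslash-u sequence ever appears in the test file, then substitute the result with format as before. The names EscapeSketch, bs, and appleEscape are hypothetical.

```scala
object EscapeSketch {
  // A single backslash, kept apart from the "u" so the compiler never sees an escape.
  private val bs = "\\"

  // The twelve ASCII characters that spell the surrogate-pair escape for the apple emoji.
  val appleEscape: String = bs + "ud83c" + bs + "udf4e"

  // Same shape as the test source in this PR, with the escape substituted via format.
  val source: String =
    """class OK {
      |  val s = "%s"
      |}""".stripMargin.format(appleEscape)
}
```

This is not obviously cleaner than format("\\ud83c\\udf4e"); it only avoids depending on how a given compiler version treats escapes inside triple-quote strings.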
@latkin Do you think it would be possible to add an option to allow international characters in string literals? While I agree that both val s = "🍎" and case "value" ⇒ println("matched") are bad, when writing in a language other than English, non-ASCII characters in string literals are needed: val greeting = "olá". A regex like
Please note that I'm suggesting it only for string literals, not for identifiers.
Hi, thanks for this. If you squash and rebase onto master, I'll merge this.
@marconilanna that is certainly a reasonable request, but this PR does not aim to change the rule's functionality. It only aims to correct a bug in how the rule (strict ASCII or otherwise) is applied. I will leave it to project owners whether to adjust the restrictions, add an alternative rule, etc.
@matthewfarwell will take care of that shortly, thanks
It is reasonable to enforce a rule that prevents non-ASCII text from appearing directly in source code. However, the current implementation also flags unicode escape sequences, which consist only of ASCII characters (e.g. \u1f34e). NonASCIICharacterChecker should inspect Token.rawText, which represents the literal source before unicode escapes are applied. Token.text, which is currently used, already has unicode escapes applied and thus doesn't represent the actual content of the source code.
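The difference can be illustrated with a minimal sketch. It does not use the real lexer or checker: Tok is a hypothetical stand-in that models only the two fields discussed here, and the token values are built by hand on the assumption that text holds the escape-applied form while rawText holds the characters as written in the source.

```scala
// Hypothetical stand-in for a lexer token; only rawText and text are modelled.
final case class Tok(rawText: String, text: String)

object NonAsciiSketch {
  // True if any UTF-16 code unit falls outside 7-bit ASCII.
  def hasNonAscii(s: String): Boolean = s.exists(_ > 127)

  // Behaviour described as current: inspects the escape-applied text.
  def flaggedByText(t: Tok): Boolean = hasNonAscii(t.text)

  // Behaviour proposed by this PR: inspects the source as written.
  def flaggedByRawText(t: Tok): Boolean = hasNonAscii(t.rawText)

  // Built from code points and a separate backslash so this file itself stays pure ASCII.
  private val bs = "\\"
  private val quote = "\""
  private val apple = new String(Character.toChars(0x1F34E))

  // A string literal written in source with two backslash-u escapes (ASCII only).
  val escaped: Tok =
    Tok(rawText = quote + bs + "ud83c" + bs + "udf4e" + quote, text = quote + apple + quote)

  // A string literal written in source with the emoji itself.
  val literal: Tok =
    Tok(rawText = quote + apple + quote, text = quote + apple + quote)
}
```

Under these assumptions, flaggedByText reports both tokens, while flaggedByRawText reports only literal, which is the outcome the description asks for.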
57413d2 to 405fecc
@matthewfarwell done - squashed to 1 commit and rebased to latest master
Cool. Thanks!
Fixes #273
Per Token.scala:
NonASCIICharacterChecker currently looks at text (source code with escape sequences applied); it should look at rawText (source code in raw form).