Skip to content

Conversation

latkin
Copy link
Contributor

@latkin latkin commented Aug 22, 2017

Fixes #273

Per Token.scala:

 * @param text -- the text associated with the token after unicode escaping
 * @param rawText -- the text associated with the token before unicode escaping

NonASCIICharacterChecker currently looks at text (source code with escapes sequences applied), it should look at rawText (source code in raw form).

@codecov-io
Copy link

codecov-io commented Aug 22, 2017

Codecov Report

Merging #274 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #274   +/-   ##
=====================================
  Coverage       0%     0%           
=====================================
  Files          59     59           
  Lines        1451   1451           
  Branches      142    139    -3     
=====================================
  Misses       1451   1451
Impacted Files Coverage Δ
...lastyle/scalariform/NonASCIICharacterChecker.scala 0% <0%> (ø) ⬆️
...scalastyle/scalariform/AbstractMethodChecker.scala 0% <0%> (ø) ⬆️
src/main/scala/org/scalastyle/Checker.scala 0% <0%> (ø) ⬆️
src/main/scala/org/scalastyle/Output.scala 0% <0%> (ø) ⬆️
...calastyle/scalariform/CovariantEqualsChecker.scala 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36bb802...405fecc. Read the comment docs.

|// non-ascii in string via unicode escape - ok
|class OK {
| val s = "%s"
|}""".stripMargin.format("\\ud83c\\udf4e")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If somebody knows how to correctly produce the desired string in a cleaner way I'd be happy to update.

I found that using \ud83c\udf4e in triple-quote string resulted in 🍎 (i.e. escape was applied), and using \\ud83c\\udf4e resulted in \\ud83c\\udf4e (i.e. double back slashes are left in literal form). Unsure how to cleanly get desired outcome of literal \ud83c\udf4e within a triple-quote string.

@marconilanna
Copy link
Contributor

marconilanna commented Oct 10, 2017

@latkin Do you think it would be possible do add an option to allow international characters in string literals?

While I agree that both

val s = "🍎"

case "value"  println("matched")

are bad, when writing in a language other than English non-ASCII characters in string literals are needed:

val greeting = "olá"

A regex like [\p{Alnum}\p{Punct}] should probably be sufficient for most cases.

Please not that I'm suggesting it only for string literals, not for identifiers.

@matthewfarwell
Copy link
Member

Hi,

Thanks for this. If you do a squash, and rebase onto master, I'll merge this.

@latkin
Copy link
Contributor Author

latkin commented Oct 11, 2017

@marconilanna that is certainly a reasonable request, but this PR does not aim to change the rule's functionality. It only aims to correct a bug in how the rule (strict ASCII or otherwise) is applied. I will leave it to project owners whether to adjust the restrictions, add an alternative rule, etc.

@matthewfarwell will take care of that shortly, thanks

It is reasonable to enforce a rule that prevents non-ASCII text from appearing
directly in source code. However current implementation also flags use of
unicode escape sequences, which consist of only ASCII chars (e.g. \u1f34e).

NonASCIICharacterChecker should inspect Token.rawText, which represents the
literal source prior to applying unicode escapes. Token.text, which is currently
being used, already has unicode escapes applied, and thus doesn't represent the
actual content of the source code.
@latkin latkin force-pushed the latkin-fix-escaped-nonascii branch from 57413d2 to 405fecc Compare October 11, 2017 20:34
@latkin
Copy link
Contributor Author

latkin commented Oct 11, 2017

@matthewfarwell done - squashed to 1 commit and rebased to latest master

@matthewfarwell matthewfarwell merged commit 93acfba into scalastyle:master Oct 12, 2017
@matthewfarwell
Copy link
Member

Cool. Thanks!

@latkin latkin deleted the latkin-fix-escaped-nonascii branch October 12, 2017 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants