Warn about non-printable characters in string literals #63682

Open
matklad opened this issue Aug 18, 2019 · 9 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. A-parser Area: The parsing of Rust source code to an AST C-enhancement Category: An issue proposing an enhancement or a PR with one. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@matklad
Member

matklad commented Aug 18, 2019

Currently a string literal with control characters like \0 or \v is accepted without any warnings. The only exception is \r, which gives a hard error.

It makes more sense to warn about every non-printable ASCII character other than \t and \n.

Steps to fix:

  1. Add NonPrintableAscii to EscapeError
  2. Produce this error somewhere around here
  3. Add lexer-level tests
  4. Handle this "error" in unescape_error_reporting. Note that, unlike other real errors, this one should be just a warning.
  5. Adjust the affected ui tests

I am not sure how to make this warning work with #[allow] lint infrastructure: we definitely can't do this in the lexer.
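
For illustration only, the check behind step 2 might look like the sketch below. The names `is_non_printable_ascii` and `find_non_printable` are hypothetical and not part of rustc_lexer; the sketch scans an already-unescaped string and ignores how escapes are handled.

```rust
/// Returns true for ASCII control characters other than tab and newline.
/// (Hypothetical helper name, not an existing compiler function.)
fn is_non_printable_ascii(c: char) -> bool {
    c.is_ascii_control() && c != '\t' && c != '\n'
}

/// Byte offsets of the characters in `src` that the proposed warning would flag.
fn find_non_printable(src: &str) -> Vec<usize> {
    src.char_indices()
        .filter(|&(_, c)| is_non_printable_ascii(c))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // The control characters are written as escapes here so that this
    // sketch's own source stays printable.
    let body = "a\u{0}b\tc\u{b}d\n";
    println!("{:?}", find_non_printable(body)); // prints [1, 5]
}
```

`char::is_ascii_control` covers U+0000 through U+001F plus U+007F, so \r is flagged by this predicate as well, even though it is already a hard error today.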

@matklad
Member Author

matklad commented Aug 18, 2019

cc @petrochenkov

@rustbot modify labels: +A-parser, +A-diagnostics, +E-medium

@rustbot rustbot added A-diagnostics Area: Messages for errors, warnings, and lints A-parser Area: The parsing of Rust source code to an AST E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. labels Aug 18, 2019
@Centril Centril added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue. A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. labels Aug 18, 2019
@petrochenkov
Contributor

petrochenkov commented Aug 18, 2019

> I am not sure how to make this warning work with #[allow] lint infrastructure: we definitely can't do this in the lexer.

Attach the lint to CRATE_NODE_ID, then it'll be able to be enabled/disabled at crate level, but not with higher granularity (some existing early lints behave this way).

@basil-cow
Contributor

@rustbot claim

@rustbot rustbot self-assigned this Nov 16, 2019
@basil-cow
Contributor

basil-cow commented Nov 19, 2019

I'm not sure how to handle this: callers assume that the errors unescape_str produces are hard errors, so if we don't span_err on one, the inconsistency causes an ICE later in type deduction (maybe an inconsistent array length?). At least that's what I think is happening; I'm not entirely sure. Should I change the output type to something like Ok(c) | Warn(c, warn) | Err(err)?
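
A minimal sketch of what such a three-state output could look like, assuming nothing about the real unescape_str signature. Every name here (`Unescaped`, `Warning`, `Error`, `classify`, `build`) is hypothetical, and the sketch classifies raw characters only, without processing escapes.

```rust
// Hypothetical names throughout; this is not the rustc_lexer API.
#[derive(Debug, PartialEq)]
enum Warning {
    NonPrintableAscii,
}

#[derive(Debug, PartialEq)]
enum Error {
    BareCarriageReturn,
}

#[derive(Debug, PartialEq)]
enum Unescaped {
    Ok(char),
    Warn(char, Warning),
    Err(Error),
}

/// Classifies one raw character of a literal body.
fn classify(c: char) -> Unescaped {
    match c {
        '\r' => Unescaped::Err(Error::BareCarriageReturn),
        '\t' | '\n' => Unescaped::Ok(c),
        c if c.is_ascii_control() => Unescaped::Warn(c, Warning::NonPrintableAscii),
        c => Unescaped::Ok(c),
    }
}

/// Builds the literal's value. A warning still contributes its character,
/// so the value keeps the length it would have had without the warning.
fn build(src: &str) -> (String, Vec<Warning>, Vec<Error>) {
    let mut out = String::new();
    let mut warnings = Vec::new();
    let mut errors = Vec::new();
    for c in src.chars() {
        match classify(c) {
            Unescaped::Ok(c) => out.push(c),
            Unescaped::Warn(c, w) => {
                out.push(c);
                warnings.push(w);
            }
            Unescaped::Err(e) => errors.push(e),
        }
    }
    (out, warnings, errors)
}

fn main() {
    let (value, warnings, errors) = build("a\u{0}b");
    println!("{} chars, {:?}, {:?}", value.len(), warnings, errors);
}
```

Because the `Warn` arm pushes the character, the resulting string has the same length as it would without the warning, which is the kind of consistency the callers appear to rely on.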

@matklad
Member Author

matklad commented Nov 19, 2019

Hm, indeed, the interface of unescape_str and friends is not fully prepared for emitting warnings.

One approach here is to extend the interface. I think something like this will be cleanest:

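struct EscapeError {
    // Char to insert into the string even when an error or warning is reported.
    fallback: char,
    kind: EscapeErrorKind,
}

enum EscapeErrorKind {
    NonPrintableAscii,
    // ...the existing error kinds would go here
}

impl EscapeErrorKind {
    fn is_warning(&self) -> bool {
        matches!(self, EscapeErrorKind::NonPrintableAscii)
    }
}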

That is, even if we emit an error, we still produce a fallback char that should be inserted into the string.

However, this particular warning can be handled with less machinery. At the call site, we check whether the char is a non-printable character and the range is of length one. Range length seems like a hacky but correct way to figure out whether the char was escaped.

The benefit of the first approach is that it centralizes the knowledge about this warning to the rustc_lexer crate. The drawback is that it makes an already highly awkward interface even more cumbersome.

The benefit of the second approach is simplicity. Moreover, it can be argued that warnings do not necessarily belong to rustc_lexer, as they don't affect the language definition.

I personally would prefer to start with the second approach: it seems significantly easier to implement, and is very isolated, so changing it to the first one in the future would be easy. OTOH, starting with the first and simplifying to the second would be much harder to do.
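
A rough sketch of the second approach, for illustration. `toy_unescape` is a hypothetical, heavily simplified stand-in for an unescape_str-style function (it understands only `\n`, `\t`, `\0` and `\\`); the only thing it is assumed to share with the real interface is that the callback receives the source range together with the resulting char.

```rust
use std::ops::Range;

/// Toy stand-in for an `unescape_str`-style function (hypothetical): reports
/// each character together with the byte range it occupied in the source.
fn toy_unescape(src: &str, callback: &mut dyn FnMut(Range<usize>, Result<char, ()>)) {
    let mut chars = src.char_indices();
    while let Some((start, c)) = chars.next() {
        if c != '\\' {
            callback(start..start + c.len_utf8(), Ok(c));
            continue;
        }
        // Backslash: the range covers the backslash and the char after it.
        let (end, res) = match chars.next() {
            Some((i, e)) => {
                let res = match e {
                    'n' => Ok('\n'),
                    't' => Ok('\t'),
                    '0' => Ok('\0'),
                    '\\' => Ok('\\'),
                    _ => Err(()),
                };
                (i + e.len_utf8(), res)
            }
            None => (src.len(), Err(())),
        };
        callback(start..end, res);
    }
}

/// Call-site check: a control character whose source range is one byte long
/// must have been written literally rather than as an escape.
fn literal_control_chars(src: &str) -> Vec<Range<usize>> {
    let mut flagged = Vec::new();
    toy_unescape(src, &mut |range: Range<usize>, res: Result<char, ()>| {
        if let Ok(c) = res {
            let non_printable = c.is_ascii_control() && c != '\t' && c != '\n';
            if non_printable && range.len() == 1 {
                flagged.push(range);
            }
        }
    });
    flagged
}

fn main() {
    // Source text containing a literal NUL, then the two-byte escape `\0`.
    let src = "a\u{0}b\\0c";
    println!("{:?}", literal_control_chars(src)); // prints [1..2]
}
```

The literal NUL occupies a one-byte range and is flagged, while the escaped `\0` yields the same char from a two-byte range and is left alone, which is why the range length works as a proxy for "was this escaped".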

bors added a commit that referenced this issue Nov 20, 2019
Added `non-printable-ascii` lint

Closes #63682.
@crlf0710 crlf0710 added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jun 11, 2020
@Alexendoo
Member

Triage: Hi, are you still working on this issue @Areredify?

@basil-cow
Contributor

@Alexendoo nope, you can grab it, check out my pr earlier for reference if you want

@basil-cow
Contributor

oh, sorry, it's a triage, I thought you wanted to grab it 😅

@Alexendoo
Member

@rustbot release-assignment

@rustbot rustbot removed their assignment Aug 19, 2020
7 participants