Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raw Byte strings do not handle \r #60604

Closed
matklad opened this issue May 7, 2019 · 8 comments
Closed

Raw Byte strings do not handle \r #60604

matklad opened this issue May 7, 2019 · 8 comments
Assignees
Labels
A-parser Area: The parsing of Rust source code to an AST

Comments

@matklad
Copy link
Member

matklad commented May 7, 2019

Everywhere in the language we treat \r\n as \n, and forbid bare \r. In particular, in raw strings we do translate \r\n to \n, which makes sense, given that line ending depends on git config.

However, for raw byte string literals we don't do this, and this looks like a bug?

16:22:13|~/tmp
λ echo -e "fn main() {\n    let (s1, s2) = (br\"\r\", br\"\r\n\");\n    println!(\"{:?} {:?}\", s1, s2);\n}" > main.rs

16:25:49|~/tmp
λ rustc main.rs

16:25:50|~/tmp
λ ./main
[13] [13, 10]
@matklad
Copy link
Member Author

matklad commented May 7, 2019

cc @petrochenkov

@Centril Centril added the A-parser Area: The parsing of Rust source code to an AST label May 7, 2019
@matklad
Copy link
Member Author

matklad commented May 7, 2019

I guess the best way to solve this is to move \r\n -> \n logic to the unescape module.

@petrochenkov
Copy link
Contributor

I wanted to create an issue about this as well!

I think translation of \r\n into \n and prohibition of bare \r (*) should be consistently done for all literals.
Perhaps even before tokenization (so that proc macros cannot observe \r\n) (not sure about this, it can potentially mess up byte position in files and in code map or something).

(*) The prohibition of bare \r is mostly a conservative and maybe unnecessary measure. Since \r was used as a newline on some legacy (but not truly ancient) systems, it could be reasonable to treat it as one and translate it to \n, and it is also reasonable to pass \r as is, so the error is mostly a refusal to choose.
(It should be covered by a generic lint for confusing invisible characters if the error is removed.)

As a variation of raw byte string literals, include_bytes!() also doesn't translate \r\n, but it can also contain non UTF-8, so I thinks it's ok to keep it as an escape hatch for entirely untranslated bytes.

@matklad
Copy link
Member Author

matklad commented May 7, 2019

Perhaps even before tokenization

+1 here. Having both \n and \r\n in the source is especially painful in the IDE, where you need to insert line breaks during refactorings. Figuring this out on a case-by-case basis is pretty horrible. We have an open issue (rust-lang/rust-analyzer#775) about python-style translation of \r\n <-> \n at the IO boundary (in IDE, the translation needs to be bidirectional)

@petrochenkov
Copy link
Contributor

python-style translation of \r\n <-> \n at the IO boundary

Not only Python, C++ also assumes that the newline translation happens at IO boundary during source file loading.

My only concern is that it would prevent mmaping source files directly, but I don't know how important that may be.

@matklad
Copy link
Member Author

matklad commented May 7, 2019

Not an expert in mmap, but my understanding is that, in the context of compilers, it won't bring a lot of benefits (files are small and read sequentially and fully), but would cost a lot of complexity (errors during mmap are delivered via signals). Anyway, that's waaaay beyond the scope of this issue :)

I'll look into fixing this this week.

@Xanewok
Copy link
Member

Xanewok commented May 13, 2019

I'm working on it.

@matklad
Copy link
Member Author

matklad commented Jul 22, 2019

closed by #60793

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST
Projects
None yet
Development

No branches or pull requests

4 participants