Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsyntax JSON output bytes counts ignore CRLF normalization #65029

Closed
Rantanen opened this issue Oct 2, 2019 · 1 comment · Fixed by #65074
Closed

libsyntax JSON output bytes counts ignore CRLF normalization #65029

Rantanen opened this issue Oct 2, 2019 · 1 comment · Fixed by #65074
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Rantanen
Copy link
Contributor

Rantanen commented Oct 2, 2019

Recently there was a change in rustc to normalize newlines (CRLF -> LF) early during processing (#62948). This caused a change in the JSON output offsets. While the line/column values were unaffected (since they do not include the newlines in the counts), the byte_start/byte_end values were affected as they are calculated from the spans that use post-normalization positions.

As a result, the byte_start/byte_end values cannot be used to index the actual bytes of the original file, if the original file content was affected by the CRLF normalization. This is most evident in rustfix on Windows.

The original discussion is in the rustfix issue at rust-lang/rustfix#176

@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 2, 2019
@petrochenkov
Copy link
Contributor

cc @matklad

Centril added a commit to Centril/rust that referenced this issue Oct 8, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
Centril added a commit to Centril/rust that referenced this issue Oct 8, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
@bors bors closed this as completed in 1f93be1 Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants