HTML output breaks UTF-8 #275
Labels
C-upstream-bug
Category: This is a bug of compiler or dependencies (the fix may require action in the upstream)
Comments
taiki-e added the C-upstream-bug label on May 5, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue on Jan 8, 2024:

coverage: `llvm-cov` expects column numbers to be bytes, not code points

Normally the compiler emits column numbers as a 1-based number of Unicode code points.

But when we embed coverage mappings for `-Cinstrument-coverage`, those mappings will ultimately be read by the `llvm-cov` tool. That tool assumes that column numbers are 1-based numbers of *bytes*, and relies on that assumption when slicing up source code to apply highlighting (in HTML reports, and in text-based reports with colour).

For the very common case of all-ASCII source code, bytes and code points are the same, so the difference isn't noticeable. But for code that contains non-ASCII characters, emitting column numbers as code points will result in `llvm-cov` slicing strings in the wrong places, producing mangled output or fatal errors.

(See taiki-e/cargo-llvm-cov#275 as an example of what can go wrong.)
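
To make the mismatch described in this commit message concrete, here is a minimal sketch of the two column conventions. The function names and the sample source line are hypothetical and are not taken from rustc or `llvm-cov`; the sketch only assumes UTF-8 source text.

```rust
/// 1-based column of `byte_offset` within `line`, counted in Unicode code points.
/// (Hypothetical helper; `byte_offset` must fall on a character boundary.)
fn column_in_code_points(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().count() + 1
}

/// 1-based column of `byte_offset` within `line`, counted in bytes.
/// (Hypothetical helper.)
fn column_in_bytes(byte_offset: usize) -> usize {
    byte_offset + 1
}

fn main() {
    // Hypothetical source line: "Я" takes one code point but two bytes in UTF-8.
    let line = "let s = \"Я\"; foo();";
    let offset = line.find("foo").expect("sample line contains `foo`");

    // The two conventions disagree for everything after the non-ASCII character.
    println!("column in code points: {}", column_in_code_points(line, offset));
    println!("column in bytes:       {}", column_in_bytes(offset));
}
```

For an all-ASCII line both functions return the same value, which is why the problem only shows up once non-ASCII characters appear earlier on the line.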
rust-timer added a commit to rust-lang-ci/rust that referenced this issue on Jan 9, 2024:

Rollup merge of rust-lang#119033 - Zalathar:unicode, r=davidtwco

coverage: `llvm-cov` expects column numbers to be bytes, not code points
This should now be fixed upstream in nightly as of rust-lang/rust#119033.
@Zalathar Great! Thanks for fixing this!
I have lines like this in my codebase:

When I generate HTML with the command `cargo llvm-cov --html test`, the resulting HTML for the letter 'Я' looks like this:

So the HTML generator breaks the valid UTF-8 sequence for 'Я' (`\320\257`) into two parts and inserts HTML markup between them, making the output invalid.
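
As a small illustration of why that split is harmful, the sketch below checks that 'Я' encodes to the two bytes written above in octal (`\320\257`, that is, `0xD0 0xAF`) and that neither half is valid UTF-8 on its own. The helper name `is_valid_utf8` is hypothetical; only standard-library calls are used.

```rust
/// Returns true if `bytes` is a well-formed UTF-8 sequence.
/// (Hypothetical helper wrapping `std::str::from_utf8`.)
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    let bytes = "Я".as_bytes();

    // U+042F encodes as octal 320 257 (hex D0 AF) in UTF-8.
    assert_eq!(bytes, &[0o320u8, 0o257u8][..]);

    // Cutting between the two bytes, as happens when markup is inserted
    // at an offset that is one byte short, leaves two fragments.
    let (head, tail) = bytes.split_at(1);

    println!("whole sequence valid: {}", is_valid_utf8(bytes));
    println!("first byte valid:     {}", is_valid_utf8(head));
    println!("second byte valid:    {}", is_valid_utf8(tail));
}
```

The first byte is a lead byte that needs a continuation byte, and the second is a continuation byte with no lead, so any markup placed between them leaves both sides malformed.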