Skip to content

Understanding SourceAnnotation.range #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lpil opened this issue Jan 14, 2020 · 16 comments
Closed

Understanding SourceAnnotation.range #24

lpil opened this issue Jan 14, 2020 · 16 comments
Labels
C-bug Category: bug

Comments

@lpil
Copy link

lpil commented Jan 14, 2020

Hello!

I'm testing this library as an alternative to codespan_reporting and I'm having trouble understanding the SourceAnnotation.range value. I thought it was a byte offset (as in codespan_reporting, that I can get from my lalrpop parser) but it seems to have behaviour that suggests it is not.

When the annotation is on the first line of the source the annotation is positioned as I expect:

line 1:

error: Type mismatch
 --> /Users/louis/src/gleam/gleam/test/hello_world/src/other.gleam:1:13
  |
1 | fn main() { "" + 1 }
  |             ^^
  |

However when the annotation is not on the first line of the source it seems to move back by one character per line.

line 8:

error: Type mismatch
 --> /Users/louis/src/gleam/gleam/test/hello_world/src/other.gleam:1:6
  |
...
8 | fn main() { "" + 1 }
  |      ^^
  |

SourceAnnotation.range doesn't seem to be documented- what does it accept?

Thank you

@zbraniecki
Copy link
Contributor

Is there a chance that it's due to line endings? Like CLRF? Can you provide an example test.rs?

@lpil
Copy link
Author

lpil commented Jan 15, 2020

Sure thing, this is the file I was using: https://github.com/gleam-lang/gleam/blob/1cfcbdfac2fccd4107c3502cd4878bd4db1bc2ff/test/hello_world/src/other.gleam

Line endings should be the linux style \n

@zbraniecki
Copy link
Contributor

Can you also provide an .rs file you're using that produces the wrong output?

@lpil
Copy link
Author

lpil commented Jan 26, 2020

I'm afraid I don't have time to produce a minimal example, but a patch to my program can be found here https://github.com/gleam-lang/gleam/pull/376/files

@Nadrieril
Copy link
Member

I've got a similar problem: on a file with Unix line endings, the annotation gets shifted by one for each line. If I convert the file to dos line endings, then the annotation is correctly positioned. It seems like annotate-snippets counts the end of line char as two chars maybe ? Unicode in the source file also messes up annotations, so it could be a difference between char-indexing and byte-indexing.

@Nadrieril
Copy link
Member

Experiment confirms that indexing is indeed by char and not by byte index. Furthermore somehow \n is counted as two chars on its own, but counts as one char as part of \r\n (so that line endings always count as two chars). This is the function I use to convert from byte idxs to a char index, that works with both \n and \r\n line endings:

fn char_idx_from_byte_idx(input: &str, idx: usize) -> usize {
    let char_idx = input
        .char_indices()
        .enumerate()
        .find(|(_, (i, _))| *i == idx)
        .unwrap()
        .0;
    // Unix-style newlines are counted as two chars (see
    // https://github.com/rust-lang/annotate-snippets-rs/issues/24).
    let nbr_newlines =
        input[..idx].chars().filter(|c| *c == '\n').count();
    let nbr_carriage_returns =
        input[..idx].chars().filter(|c| *c == '\r').count();
    char_idx + nbr_newlines - nbr_carriage_returns
}

@brendanzab
Copy link
Member

Yeah, I'm actually running into this as well when testing the library out on Pikelet! Weirdly though it doesn't seem to show up in the tests. It appears like this:

error: invalid numeric literal
  |
0 | record {
1 |     A = U32,
2 |     a = -23,
  |       ^^^ failed to parse literal
3 | } : Record {
4 |     A : Type,
5 |     a : A,
6 | }
  |

The bug seems like it might occur here: https://github.com/rust-lang/annotate-snippets-rs/blob/master/src/display_list/from_snippet.rs#L178-L199 - at least it seems like ugly bugs would occur here wrt. multibyte characters and tab characters. It'd probably be better to try to try to decouple byte position from character index in this library.

@Nadrieril
Copy link
Member

Ah yeah, .lines() doesn't distinguish \r\n and \n endings, which is good, but .chars() does, which messes up character counts. I'm afraid there might be no way of using .lines() there.
Specifically, this line

let line_length = line.chars().count() + 1;
assumes that a line ending is one char, and this length is later added to current_index (with an additional increment, I don't know why) there
current_index += line_length + 1;

Together, those lines effectively count each end of line as two chars, as I had observed

@zbraniecki
Copy link
Contributor

thank you for the report! I'm a bit swamped with work lately, but will try to find time to write a PR unless someone beats me to it!

@zbraniecki
Copy link
Contributor

@lpil - can you test against master? I think the #27 fixed it and I can't reproduce your problem, but would like to verify before I release a new version.

@zbraniecki zbraniecki added the C-bug Category: bug label Mar 27, 2020
@zbraniecki
Copy link
Contributor

@lpil I'm actually going to release a new version anyway, but when you have a chance, please check if this bug is fixed for you

@lpil
Copy link
Author

lpil commented Mar 30, 2020

Thank you, though I have not used this library recent and thus do not have any code to test with.

@gipsond
Copy link

gipsond commented Mar 31, 2020

I was getting this bug with v0.6.1, but not anymore. Unfortunately I got a similar problem when I tried to update to v0.7.0.

Here's with v0.6.1:

error: invalid immediate, couldn't parse number (was: 30_2)
 --> .\add.asm:1:6
  |
...
5 | .ORIG x30_2
  |       ^^^^^ invalid immediate here
  |

and with v0.7.0, removing the DisplayListFormatter and adding default FormatOptions to the snippet:

error: invalid immediate, couldn't parse number (was: 30_2)
 --> .\add.asm:1:11
  |
...
5 |   .ORIG x30_2
  |  ___________^
6 | | AND R1, R1, R1
  | |____^ invalid immediate here
  |

add.asm has 4 lines before that error, each ending in CRLF. It seems like the offset is in the other direction now.

@zbraniecki
Copy link
Contributor

Can you file it as a new issue? Thank you!

@Nadrieril
Copy link
Member

I can confirm that 0.7.0 fixed this for me !

@zbraniecki
Copy link
Contributor

Thank you! Marking as fixed, and let's reopen if needed!

epage added a commit to epage/annotate-snippets-rs that referenced this issue Sep 27, 2024
chore: Ensure pre-commit gets non-system Python
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug
Projects
None yet
Development

No branches or pull requests

5 participants