Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obtaining source positions #309

Open
mwaldrich opened this issue Dec 3, 2020 · 6 comments
Open

Obtaining source positions #309

mwaldrich opened this issue Dec 3, 2020 · 6 comments

Comments

@mwaldrich
Copy link

How do you obtain source positions in combine?

I found the position combinator, combine::combinator::position, but that yields a position of type <I as combine::StreamOnce>::Position. For example:

parser! {
    fn function['a, 'b, I](lang: &'b Lang<'a,I>)(I) -> (Id, Function)
    where [I: Stream<Item = char>]
    {
        lang.reserved("function")
        .with(position()) // <--------- Here
        .and(id(lang))
...

It seems like this is the best way to get position information in the case of a successful parse (e.g. not an error case). How do you use this type? How do you extract a line number from it? A column number?

@Marwes
Copy link
Owner

Marwes commented Dec 4, 2020

I::Position can be whatever you need it to be. By default for &str and &[u8] it is just an offset into the slice (as that is the cheapest), but if you want line + column information for a &str you can wrap it via the position stream https://docs.rs/combine/4.4.0/combine/stream/position/index.html.

let input = position::Stream::new(input_str);
parser.parse(input)

@abonander
Copy link

@Marwes I found the PointerOffset positioner for str to be difficult to use. I wanted to be able to immediately get the byte position in the source string when constructing the object I'm parsing (it needs to remember the byte range in the string because the next step after parsing will be to replace those placeholders with generated bind parameters for the given database): https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R176

However, because you need to call .translate_position() with the source string to get a meaningful value, that would necessitate passing the string down through parser functions to where it's needed, which is annoying and also, I feel, kind of an anti-pattern.

IndexPositioner is almost what I want given that its RangePositioner impl ends up calling str::len(), however it doesn't track byte positions correctly for individual chars since it'll invariably count each char, even multibyte ones, as just 1. The docs should probably discourage using it for streams based on str because that could be a significant footgun for users that forget to test non-ASCII strings.

What I ended up doing was just writing my own Positioner impl that's specialized for str that uses char::len_utf8(): https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R108-R138

This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.

@Gallagator
Copy link

@Marwes
I'm also having some problems with the source positions. I'm making a parser which returns the line &column number but Rust is complaining about mismatched types. It says: expected associated type <Input as StreamOnce>::Position found struct SourcePosition .
Here is the code in question:
fn jumpable<Input>() -> impl Parser<Input, Output = (u32, u32)> where Input: Stream<Token = char>, Input::Error: ParseError<Input::Token, Input::Range, Input::Position>, { position() .map(| SourcePosition { line: l, column: c } | { (c,l) }) }
How can I fix this? I'm very confused.
Sorry I couldn't get it to format nicely so here's a screenshot.
image

@Marwes
Copy link
Owner

Marwes commented May 12, 2021

This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.

Yes, that seems like a good addition. Never thought about IndexPositioner not being very useful for strings as I have always done the translation at top/end of the parse (no need to translate things until you know the parser as a whole has failed).

@abonander

@Marwes
Copy link
Owner

Marwes commented May 12, 2021

@Gallagator If you want to use a specific type in the parser you need to specific it in the type signature (same thing that you did with char.

Input: Stream<Token = char, Position = SourcePosition>

@Gallagator
Copy link

@Marwes Thanks a lot! I was very confused!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants