Spans from custom tokens. #512
Unanswered
aDotInTheVoid
asked this question in
Q&A
Replies: 2 comments 2 replies
-
Yes, |
-
btw starting a higher level conversation on improving the lexing+parsing story over at #591
-
I've been trying to write a winnow parser that uses a custom lexer.
I've been having some trouble getting the parser to correctly pick up the spans that the lexer
assigns to tokens.
NB: A full, compiling version of this can be found here
Part 0: The lexer.
The lexer exposes a relatively simple interface.
Each token carries its own span. This is needed to preserve position information once
whitespace is stripped, as I don't want to have to skip whitespace everywhere in
the parser. (This is one of the motivations for using a custom lexer).
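For readers without the linked code, here is a minimal sketch of what such a lexer could look like. It is not the lexer from the linked repository: `TokenKind`, `Token` and `lex` are hypothetical names, and the token set is assumed to be just parentheses and atoms.

```rust
use std::ops::Range;

/// Token kinds for a tiny s-expression language (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    LParen,
    RParen,
    Atom(String),
}

/// A token together with the byte range it occupied in the source.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

/// Lex `src` into tokens, dropping whitespace but keeping byte spans.
pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace produces no token; it only survives as gaps between spans.
            chars.next();
        } else if c == '(' || c == ')' {
            chars.next();
            let kind = if c == '(' { TokenKind::LParen } else { TokenKind::RParen };
            tokens.push(Token { kind, span: start..start + 1 });
        } else {
            // An atom runs until whitespace, a paren, or end of input.
            let mut end = start;
            while let Some(&(i, c)) = chars.peek() {
                if c.is_whitespace() || c == '(' || c == ')' {
                    break;
                }
                end = i + c.len_utf8();
                chars.next();
            }
            let kind = TokenKind::Atom(src[start..end].to_string());
            tokens.push(Token { kind, span: start..end });
        }
    }
    tokens
}
```

With this sketch, lexing `(a  bb)` gives four tokens, and the `bb` atom keeps the byte span `4..6` even though the two spaces before it are gone from the token list.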
In order to use this with the primitives, I've used the following implementations.
To demonstrate this, I'll be using a simple sexpr parser. It's generic over the
stream that it parses, as long as the stream implements the standard traits.
I'll be using the same input in all examples. Ideally it should parse to the tree below.
Approach 1: Using `Located`.
My first guess was to just use the `Located` type.
Unfortunately this doesn't give the results I want.
The spans here are token offsets, not byte offsets (as provided from the lexer).
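To make the mismatch concrete, here is a small std-only sketch that does not use winnow. It compares a span counted in consumed tokens with the byte span recorded by the lexer. `Token`, `token_index_span` and `byte_span` are hypothetical names used only for this illustration.

```rust
use std::ops::Range;

/// A lexed token with the byte range it covered in the source (hypothetical type).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: &'static str,
    pub span: Range<usize>,
}

/// Span as a position counter over a token slice sees it:
/// measured in how many tokens had been consumed before and after the parse.
pub fn token_index_span(consumed_before: usize, consumed_after: usize) -> Range<usize> {
    consumed_before..consumed_after
}

/// Span as the lexer recorded it: from the first consumed token's start byte
/// to the last consumed token's end byte. `None` if nothing was consumed.
pub fn byte_span(
    tokens: &[Token],
    consumed_before: usize,
    consumed_after: usize,
) -> Option<Range<usize>> {
    if consumed_after <= consumed_before {
        return None;
    }
    let first = tokens.get(consumed_before)?;
    let last = tokens.get(consumed_after - 1)?;
    Some(first.span.start..last.span.end)
}
```

For the source `(a  bb)`, parsing the atom `bb` consumes the token at index 2, so the token-offset span is `2..3` while the byte span is `4..6`.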
Approach 2: Manually implementing `Location`.
Given that `Located` only advances the `.location()` by 1 each time a token is consumed, it's clear I need a different `Location` impl (unless I want to implement `.offset`, but that seems like a bad idea, given that finding the offset between any 2 tokens is harder than finding the location of a single token).
This also requires a load of boilerplate to implement `Offset`, `Stream` and `StreamIsPartial` (not shown here).
Unfortunately it doesn't work, as it can't find the location for the end of input, as there's no token left.
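The end-of-input problem can be shown without any winnow traits. The sketch below assumes the location is read from the start of the next unconsumed token. `Token` and `location` are hypothetical names, not the code from the linked repository.

```rust
use std::ops::Range;

/// A lexed token with the byte range it covered in the source (hypothetical type).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: &'static str,
    pub span: Range<usize>,
}

/// Location of the stream, taken from the next unconsumed token.
/// Once every token has been consumed there is nothing left to ask,
/// so the function has no answer for end of input.
pub fn location(remaining: &[Token]) -> Option<usize> {
    remaining.first().map(|tok| tok.span.start)
}
```

For the tokens of `(a)`, `location` returns `Some(1)` when `a` is next, but `None` once the closing paren has been consumed, which is exactly when a parser needs the end of the outermost span.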
Approach 3: State!
But given that we've got our own struct, we can just store the location there, and access it later.
This also needs an `Offset`, `Stream` and `StreamIsPartial` implementation.
This works a lot better.
Unfortunately it's got the spans wrong. Because we only reference the `.start` of spans, it can't see that the lexer has decided that the whitespace isn't a part of the token.
More pictorially, it parses into
instead of
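The effect can be reproduced in a few lines of plain Rust. This is a sketch under assumptions: it stores the byte offset of the end of input in the stream struct (the original may store something different), and `Token`, `TokenStream` and `spanned_next` are hypothetical names.

```rust
use std::ops::Range;

/// A lexed token with the byte range it covered in the source (hypothetical type).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: &'static str,
    pub span: Range<usize>,
}

/// A token stream that carries extra state: the byte offset of end of input.
pub struct TokenStream<'t> {
    tokens: &'t [Token],
    pos: usize,
    eof: usize,
}

impl<'t> TokenStream<'t> {
    pub fn new(tokens: &'t [Token], eof: usize) -> Self {
        TokenStream { tokens, pos: 0, eof }
    }

    /// Start of the next token, or the stored end-of-input offset.
    pub fn location(&self) -> usize {
        self.tokens.get(self.pos).map(|t| t.span.start).unwrap_or(self.eof)
    }

    pub fn next_token(&mut self) -> Option<&'t Token> {
        let tok = self.tokens.get(self.pos)?;
        self.pos += 1;
        Some(tok)
    }

    /// Consume one token and report its span using the same `location()`
    /// call before and after, as a single-location span helper would.
    pub fn spanned_next(&mut self) -> Option<(&'t Token, Range<usize>)> {
        let start = self.location();
        let tok = self.next_token()?;
        let end = self.location(); // start of the NEXT token, not the end of this one
        Some((tok, start..end))
    }
}
```

For the source `(a  bb)`, the end of input is now handled, but the atom `a` is reported as `1..4` rather than the lexer's `1..2`, because the end is read from the start of `bb` and so swallows the whitespace in between.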
Conclusion.
At this point, I investigated what was going on with `with_span`, to see if it could be cajoled into doing what I wanted.
Its impl:
The problem here is that it uses the same call for the `start` and `end`.
If the parser is in this situation:
we want it to use the lower location for the `.end` of `bb` and the upper for the `.start` of `cccc`.
Doing this would require changing the `Location` trait, and the implementation of `.span` and `.with_span`. While in theory I could do this myself, I thought it'd be worth asking your thoughts on this, and how much of this makes sense to live upstream.
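As a rough illustration of the change being asked about, here is a std-only sketch of a stream that exposes two locations, one on each side of the gap between tokens. Everything in it is hypothetical: `prev_end`, `next_start` and this `with_span` are made-up names for the sketch, not winnow's `Location` trait or its `with_span` as discussed above.

```rust
use std::ops::Range;

/// A lexed token with the byte range it covered in the source (hypothetical type).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: &'static str,
    pub span: Range<usize>,
}

pub struct TokenStream<'t> {
    tokens: &'t [Token],
    pos: usize,
    eof: usize,
}

impl<'t> TokenStream<'t> {
    pub fn new(tokens: &'t [Token], eof: usize) -> Self {
        TokenStream { tokens, pos: 0, eof }
    }

    /// Upper location: start of the next unconsumed token (or end of input).
    pub fn next_start(&self) -> usize {
        self.tokens.get(self.pos).map(|t| t.span.start).unwrap_or(self.eof)
    }

    /// Lower location: end of the last consumed token (or 0 at the very start).
    pub fn prev_end(&self) -> usize {
        self.pos
            .checked_sub(1)
            .and_then(|i| self.tokens.get(i))
            .map(|t| t.span.end)
            .unwrap_or(0)
    }

    pub fn next_token(&mut self) -> Option<&'t Token> {
        let tok = self.tokens.get(self.pos)?;
        self.pos += 1;
        Some(tok)
    }

    /// Run `parse` and pair its output with a span that starts at the upper
    /// location and ends at the lower one, so skipped whitespace is excluded.
    pub fn with_span<T>(
        &mut self,
        parse: impl FnOnce(&mut Self) -> Option<T>,
    ) -> Option<(T, Range<usize>)> {
        let start = self.next_start();
        let out = parse(self)?;
        // If nothing was consumed, fall back to an empty span at `start`.
        let end = self.prev_end().max(start);
        Some((out, start..end))
    }
}
```

For the source `bb  cccc`, calling `with_span(|s| s.next_token())` twice yields `0..2` for `bb` and `4..8` for `cccc`, which matches the lexer's spans. Whether a split like this belongs upstream is the open question of this post.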