-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replaced the String
in Token
with &'a str
#559
Conversation
Unfortunately due to borrow checker limitations, this required moving `input` fields out of both `Parser` and `Tokenizer`, as with the immutable borrow in place, there is no way to tell Rust that a mutable borrow won't touch the input So, the input is now an (immutable ref) argument to all the methods that used to refer to `&self.input` And there were a lot... But now `Token` doesn't carry an owned `String` Also shrunk `Tokenizer` by doing all math in terms of byte offsets into its input (using existing `SourceCodePosition` fields) instead of storing a separate `Vec<char>` with char indices
… used on a newline and would've had no effect, it is semantically correct to keep the original `+= 1`)
…self.context.lock()` as the lock must be dropped to allow `self` to be borrowed in the interm)
Really nice! As you saw in the I think this will also enable a lot of downstream optimizations if we push this even further (with things referring to the original source instead of cloning stuff).
Cool! I read somewhere that production-grade parsers typically only keep byte offsets around, instead of using something like
Yeah, it's not great. It's maybe not too bad either, so I will just merge your PR as is. Thank you very much for this valuable contribution! |
Right, sounds like you could store just the byte offset in a Span, and separately store the cumsum of line lengths. Then to find where a byte offset goes, binary search to find the largest line length cumsum less than the byte offset — that's your line number. Position in the line is byte offset minus that cumsum. This doesn't sound that bad, just annoying. Worth me giving it a shot? I suppose the questions is where to store the cumsum of line lengths. Probably in the same place the lines themselves are stored? (I don't actually know where that is, but obviously it exists because it's used to print error messages.) |
Hey, it looks like only the removal of |
Your branch is properly integrated (check out the commit history of this repo). I rebased your branch on top of master instead of creating a merge commit. The rebase creates new commits which are not identical (and have a different hash) compared to your local commits. This is probably why you see "N commits ahead of..". |
Unfortunately due to borrow checker limitations, this required moving
input
fields out of bothParser
andTokenizer
, as with the immutable borrow in place, there is no way to tell Rust that a mutable borrow won't touch the input. The underlying issue is that returning aToken<'_>
that borrows from&self
really trips up the borrow checker in a way that a non-borrowingToken
doesn't.So, the input is now provided as a shared ref argument to all the methods that used to refer to
&self.input
(now either a&str
or a&[Token]
)And there were a lot... a-lot-a-lot...
But now
Token
doesn't carry an ownedString
Also shrunk
Tokenizer
by doing all math in terms of byte offsets into its input (using existingSourceCodePosition
fields) instead of storing a separateVec<char>
with char indicesAlso made
ForeignFunction.name
a&'static str
instead of aString
.No new tests, but all existing tests pass