add position to text pos #30

faassen · 2024-11-14T14:36:44Z

xmlparser errors return a TextPos, which Stream::gen_text_pos mentions is expensive to calculate. But I'd like to use a library like Ariadne or Miette to report parser errors. Both of these are offset based, i.e. a position.

I figured I'd just get the stream from the tokenizer when there's an error, using stream().pos(). I have to obtain the stream when there's an error, because stream() generates a copy, and I don't want a stale copy. But since Tokenizer is an iterator and is moved when I start iteration, I can't access it:

for token in tokenizer {
     match token {
        ...
       Err(e) => {
             let position = tokenizer.stream().pos(); // not allowed
        }
    }
}

The easiest way to fix this would be to simply retain the position in TextPos when it is calculated. Would you accept a PR that implements this?

For my use case, I'd actually prefer it if the TextPos calculation doesn't happen at all, as I don't need this to happen, but I don't see a way to do this without breaking backwards compatibilty.

Without this I think the only way forward is for me to write some really ugly code to calculate the position back from the text pos and the input as I don't see another way to obtain it.

The text was updated successfully, but these errors were encountered:

RazrFalcon · 2024-11-14T17:56:43Z

You can simply use:

while let Some(token) = tokenizer.next() {
}

This way tokenizer will not be consumed.

faassen · 2024-11-15T13:41:04Z

Thanks, I should have realized that and I feel dumb now!

But after I changed that I realized that doing this in the error handler:

      let position = tokenizer.stream().pos();

is too late, as the stream already has moved on. I now have this:

let mut position;
loop {
   position = tokenizer.stream().pos();
   if let Some(token) = tokenizer.next() {
       match token {
            Ok(token) => ...,
            Err(e) => {
                 // use position here
             }
   }
}

Obtaining a copy of the stream for each token so I can retain the position information just for the purposes of error handling means the error handling requirements have a performance impact on parsing overall, which isn't ideal but I can live with it.

I'm going to be bold and propose a design where Xot wouldn't incur the TextPos calculations but roxmltree can still have them:

xmlparser::Error only maintains position, and does no expensive TextPos calculation
in roxmltree, whenever an error with a TextPos needs to be produced, ctx is available so that ctx.doc.text could be used to construct TextPos.

Oh, after investigating all this I realized that xmlparser is an internal fork of roxmltree. Have they diverged or is the intent to keep the code identical? What are your plans for 'xmlparser' as a separate crate in general?

RazrFalcon · 2024-11-15T15:37:01Z

Why do you need tokenizer.stream().pos() to begin with, if each token has its position already?

As for any changes, I'm no longer maintaining this crate, as well as roxmltree. So no changes from me. It's up to the new maintainers to decide.

As for roxmltree, it has a modified fork of xmlparser which is a bit simpler and event/push based and not iterator/pull based like xmlparser. Which makes code simpler and faster.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add position to text pos #30

add position to text pos #30

faassen commented Nov 14, 2024

RazrFalcon commented Nov 14, 2024

faassen commented Nov 15, 2024

RazrFalcon commented Nov 15, 2024

add position to text pos #30

add position to text pos #30

Comments

faassen commented Nov 14, 2024

RazrFalcon commented Nov 14, 2024

faassen commented Nov 15, 2024

RazrFalcon commented Nov 15, 2024