Help dealing with some error cases #135

Alxandr · 2020-04-25T22:19:12Z

I had no idea how to title this issue, so feel free to rename it if you can come up with anything better :P. Anyway, I just started using logos, and so far I'm really liking it. I'm converting a lexer for a fairly small language from Go to Rust, and while I could port their lexer directly, I would really like to use the generated code from logos if I can. However, I just ran into an issue that I'm not entirely sure how to resolve (or whether it is even resolvable).

I have the following definition for a number (same as in JSON without the leading -):

  #[regex(r"(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")]
  Number(&'a str),

This works fine for conforming numbers, but when I try to convert the tests that deal with error cases, I'm not sure I can achieve what I want. For instance, here is a test from the original Go code that I would like to convert:

  SingleTest(t, "1.+3", "snippet:1:3 Couldn't lex number, junk after decimal point: '+'", Tokens{})

This tests that the string "1.+3" results in a lexer error for the whole number string.

I guess the way to solve this is to modify the regex to allow for some garbage that I want to error on? Or are there any better ways to solve this?

[Edit]
In case it's of interest; here's the original lexer loop for numbers in Go: https://github.com/google/go-jsonnet/blob/master/internal/parser/lexer.go#L473

maciejhirsz · 2020-04-26T07:59:23Z

Error handling has been discussed to some extent in #104. It's pretty high on the priority list right now. I'd like to figure out a good API for this so we don't have to change or break it later if the initial implementation turns out to be too limiting.

BTW: you might want to switch that \d to [0-9]. \d in Logos is Unicode-aware, IIRC, and might match things like Eastern Arabic numerals in addition to [0-9]. This is also something I might want to change later.
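
The difference between an ASCII-only digit class and a Unicode-aware one can be seen with the standard library alone. This is only a sketch: the helper names are made up for illustration, and `char::is_numeric` is somewhat broader than a Unicode `\d` (it also accepts letter-like and other numeric characters), so it stands in for the pitfall rather than reproducing Logos' exact behavior.

```rust
/// True only for the ten ASCII digits, like the class `[0-9]`.
/// (Hypothetical helper, named for illustration.)
fn is_ascii_only_digit(c: char) -> bool {
    c.is_ascii_digit()
}

/// True for any Unicode numeric character. Broader than a Unicode `\d`,
/// but it accepts the same non-ASCII decimal digits that `\d` would.
/// (Hypothetical helper, named for illustration.)
fn is_unicode_numeric(c: char) -> bool {
    c.is_numeric()
}
```

For example, U+0663 (ARABIC-INDIC DIGIT THREE) passes `is_unicode_numeric` but not `is_ascii_only_digit`, which is the kind of input a number token probably should not accept.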

Alxandr · 2020-04-26T09:20:41Z

Thanks. But for now, is my best option to go through the Go code and convert it (including the errors) into regexes?

maciejhirsz · 2020-04-26T10:21:18Z

If you want to be verbose about it, that is one way of doing this. Note that with the regex you have here, input 1.+3 will first produce Number("1"), and then Error on . (unless you have a token definition that begins with .).

You could force an error with something like:

  #[regex(r"(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")]
  Number(&'a str),

  #[regex(r"(0|[1-9][0-9]*)(\.[^0-9])")]
  #[error]
  Error,

I replaced all \d with [0-9]. Also, you don't need to mark groups as non-capturing: there is no capture mechanism in Logos, so (?:foo) and (foo) are equivalent.

Alxandr · 2020-04-26T10:41:39Z

I went with the more verbose version:

  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")]
  Number(&'a str),

  #[regex(r"(?:0|[1-9][0-9]*)\.[^0-9]")]
  ErrorNumJunkAfterDecimalPoint(&'a str),

  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?[eE][^0-9+-]")]
  ErrorNumJunkAfterExponent(&'a str),

  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?[eE][+-][^0-9]")]
  ErrorNumJunkAfterExponentSign(&'a str),

I would like to know which class of error I'm dealing with, and this seems to work.
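
For cross-checking patterns like these, a hand-written scanner with the same three error classes can serve as a reference. The following is a minimal sketch using only the standard library, not Logos-generated code; `NumberError`, `digits`, and `scan_number` are hypothetical names, and the sketch reports the offending character rather than the whole slice.

```rust
#[derive(Debug, PartialEq)]
enum NumberError {
    JunkAfterDecimalPoint(char),
    JunkAfterExponent(char),
    JunkAfterExponentSign(char),
}

/// Counts the leading ASCII digits of `s`.
fn digits(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// Scans one number from the start of `input`.
/// Returns the matched lexeme, or the class of junk that interrupted it.
/// Returns Ok("") when `input` does not start with a number at all.
fn scan_number(input: &str) -> Result<&str, NumberError> {
    // Integer part: a lone `0`, or a nonzero digit followed by more digits.
    let mut end = match input.bytes().next() {
        Some(b'0') => 1,
        Some(b'1'..=b'9') => digits(input),
        _ => return Ok(""),
    };
    // Optional fraction: a dot must be followed by at least one digit.
    if input[end..].starts_with('.') {
        let n = digits(&input[end + 1..]);
        if n == 0 {
            return match input[end + 1..].chars().next() {
                Some(c) => Err(NumberError::JunkAfterDecimalPoint(c)),
                // Dot at end of input: stop before it and leave it to the caller.
                None => Ok(&input[..end]),
            };
        }
        end += 1 + n;
    }
    // Optional exponent: `e` or `E`, an optional sign, then at least one digit.
    if input[end..].starts_with(|c: char| c == 'e' || c == 'E') {
        let mut i = end + 1;
        let signed = input[i..].starts_with(|c: char| c == '+' || c == '-');
        if signed {
            i += 1;
        }
        let n = digits(&input[i..]);
        if n == 0 {
            return match (input[i..].chars().next(), signed) {
                (Some(c), true) => Err(NumberError::JunkAfterExponentSign(c)),
                (Some(c), false) => Err(NumberError::JunkAfterExponent(c)),
                // Exponent cut off by end of input: stop before the `e`.
                (None, _) => Ok(&input[..end]),
            };
        }
        end = i + n;
    }
    Ok(&input[..end])
}
```

With this sketch, `scan_number("1.+3")` yields `Err(NumberError::JunkAfterDecimalPoint('+'))`, matching the intent of the Go test quoted earlier in the thread.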

Alxandr · 2020-04-26T10:45:50Z

Btw, another good case where you might want something like this (if you're collecting use cases for error handling) is detecting unterminated strings.

[Edit]
Not to mention (what I'm looking at now) invalid escape sequences, for example when trying to lex "\"hi\\\n\"".
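
Both string cases can be pictured as error classes in the same way as the number errors. Below is a rough std-only sketch, not tied to Logos; `StringError` and `scan_string` are hypothetical names, and the accepted escape set (`\"`, `\\`, `\n`, `\t`) is an assumption chosen for illustration, not the language's actual set.

```rust
#[derive(Debug, PartialEq)]
enum StringError {
    Unterminated,
    InvalidEscape(char),
}

/// Scans a double-quoted string at the start of `input` and returns the
/// lexeme including both quotes. Assumes `input` begins with `"`.
/// The escape set accepted here is illustrative only.
fn scan_string(input: &str) -> Result<&str, StringError> {
    // Skip the opening quote, keeping byte offsets for slicing.
    let mut chars = input.char_indices().skip(1);
    while let Some((i, c)) = chars.next() {
        match c {
            // Closing quote: return everything up to and including it.
            '"' => return Ok(&input[..=i]),
            // Backslash: the next character must be a known escape.
            '\\' => match chars.next() {
                Some((_, '"')) | Some((_, '\\')) | Some((_, 'n')) | Some((_, 't')) => {}
                Some((_, other)) => return Err(StringError::InvalidEscape(other)),
                None => return Err(StringError::Unterminated),
            },
            _ => {}
        }
    }
    // Ran out of input before seeing a closing quote.
    Err(StringError::Unterminated)
}
```

Under these assumptions, the input from the comment above (a backslash followed by a literal newline) is reported as `InvalidEscape('\n')`, and a string with no closing quote is reported as `Unterminated`.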

maciejhirsz · 2020-04-26T10:59:49Z

Once you've played with this for a while, could you look at the discussion in #104? Don't have to read the whole thing, the last few comments are probably most relevant.

It would be super useful in nailing the final API design if you could verify whether any of the ideas surfaced there could help you do error handling with less code.

Alxandr closed this as completed Apr 26, 2020

maciejhirsz mentioned this issue Apr 26, 2020

Allow lexers to provide more user-friendly errors #104

Closed
