Help dealing with some error cases #135

Closed
Alxandr opened this issue Apr 25, 2020 · 6 comments

Comments

@Alxandr

Alxandr commented Apr 25, 2020

I had no idea how to title this issue, so feel free to rename it if you can come up with anything better :P. Anyways, I just started using logos, and so far I'm really liking it. I'm converting a lexer for a fairly small language from Go to Rust, and while I could port their hand-written lexer, I would really like to use the generated code from logos if I can. However, I just ran into an issue that I'm not entirely sure how to resolve (or whether it is even resolvable).

I have the following definition for a number (same as in JSON without the leading -):

  #[regex(r"(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")]
  Number(&'a str),

This works fine for conforming numbers, but when converting the tests that deal with error cases, I'm not sure I can achieve what I want. For instance, here is a test from the original Go code that I would like to convert:

SingleTest(t, "1.+3", "snippet:1:3 Couldn't lex number, junk after decimal point: '+'", Tokens{})

This tests that the string "1.+3" results in a lexer error for the whole number string.

I guess the way to solve this is to modify the regex to allow for some garbage that I want to error on? Or are there any better ways to solve this?

[Edit]
In case it's of interest; here's the original lexer loop for numbers in Go: https://github.com/google/go-jsonnet/blob/master/internal/parser/lexer.go#L473

@maciejhirsz
Owner

Error handling has been discussed to some extent in #104. It's pretty high on the priority list right now. I'd like to figure out a good API for this so we don't have to change or break it later if the initial implementation turns out to be too limiting.

BTW: you might want to switch that \d to [0-9]. \d in Logos is Unicode-aware IIRC and might match things like Eastern Arabic numerals on top of [0-9]. This is also something I might want to change later.
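
To make the \d versus [0-9] point concrete, here is a minimal std-only sketch. It does not use Logos or a regex engine; the two helper names are hypothetical, and `char::is_numeric` is only a rough stand-in for a Unicode-aware digit class (it is somewhat broader than the decimal-digit category that \d usually denotes).

```rust
/// What `[0-9]` accepts: only the ten ASCII digits.
/// (Hypothetical helper name, for illustration only.)
fn is_ascii_digit_only(c: char) -> bool {
    c.is_ascii_digit()
}

/// Rough stand-in for a Unicode-aware digit class.
/// Assumption: `char::is_numeric` covers more than decimal digits,
/// so this over-approximates what `\d` would match.
fn is_unicode_numeric(c: char) -> bool {
    c.is_numeric()
}
```

With these, `is_ascii_digit_only('\u{0663}')` (ARABIC-INDIC DIGIT THREE) is false while `is_unicode_numeric('\u{0663}')` is true, which is the kind of input a Unicode-aware \d may accept and [0-9] will not.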

@Alxandr
Author

Alxandr commented Apr 26, 2020

Thanks. But for now, my best solution is to go through the Go code and convert it (including the errors) into regexes?

@maciejhirsz
Owner

If you want to be verbose about it, that is one way of doing this. Note that with the regex you have here, input 1.+3 will first produce Number("1"), and then Error on . (unless you have a token definition that begins with .).

You could force an error with something like:

  #[regex(r"(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")]
  Number(&'a str),

  #[regex(r"(0|[1-9][0-9]*)(\.[^0-9])")]
  #[error]
  Error,

I replaced all \d with [0-9]. Also, you don't need to mark groups as non-capturing: there is no capture mechanism in Logos, so (?:foo) and (foo) are equivalent.
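
The remark above that 1.+3 first yields Number("1") follows from longest-match behavior: the optional fraction group needs at least one digit after the dot, so the match stops before it. The sketch below reproduces that by hand, without Logos or a regex engine; `number_prefix_len` is a hypothetical helper, not part of any library.

```rust
/// Length in bytes of the longest prefix of `input` matching
/// (0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?
/// Hand-written for illustration; returns 0 if no number starts here.
fn number_prefix_len(input: &str) -> usize {
    let b = input.as_bytes();
    // Advance `i` past any run of ASCII digits and return the new index.
    let skip_digits = |mut i: usize| {
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        i
    };

    // Integer part: a lone 0, or a nonzero digit followed by more digits.
    let mut end = match b.first().copied() {
        Some(b'0') => 1,
        Some(c) if c.is_ascii_digit() => skip_digits(1),
        _ => return 0,
    };

    // Optional fraction: taken only if at least one digit follows the dot.
    if b.get(end).copied() == Some(b'.') {
        let after = skip_digits(end + 1);
        if after > end + 1 {
            end = after;
        }
    }

    // Optional exponent: taken only if digits follow the optional sign.
    if matches!(b.get(end).copied(), Some(b'e') | Some(b'E')) {
        let mut i = end + 1;
        if matches!(b.get(i).copied(), Some(b'+') | Some(b'-')) {
            i += 1;
        }
        let after = skip_digits(i);
        if after > i {
            end = after;
        }
    }
    end
}
```

For "1.+3" this returns 1, so only "1" is consumed as a number and the lexer resumes at the dot. A dedicated error pattern such as the one above wins because it matches a longer prefix ("1.+").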

@Alxandr
Author

Alxandr commented Apr 26, 2020

I went with the more verbose

  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")]
  Number(&'a str),

  #[regex(r"(?:0|[1-9][0-9]*)\.[^0-9]")]
  ErrorNumJunkAfterDecimalPoint(&'a str),

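  // '-' is escaped so that "+-0" is not read as the range '+'..'0',
  // which would treat the digits 1-8 as junk.
  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?[eE][^+\-0-9]")]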
  ErrorNumJunkAfterExponent(&'a str),

  #[regex(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?[eE][+-][^0-9]")]
  ErrorNumJunkAfterExponentSign(&'a str),

I would like to know which class of errors I'm dealing with, and this works it seems.
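
For comparison, the same three error classes can be produced by a small hand-written scanner, closer in spirit to the original Go loop. This is only a sketch under assumptions: `NumError` and `lex_number` are hypothetical names, the caller is assumed to invoke it only where a number may start, and end of input right after `.`, `e` or a sign is reported as an error with `None` (the regex variants above need an actual junk character to match).

```rust
#[derive(Debug, PartialEq)]
enum NumError {
    // Each variant carries the offending character, or None at end of input.
    JunkAfterDecimalPoint(Option<char>),
    JunkAfterExponent(Option<char>),
    JunkAfterExponentSign(Option<char>),
}

/// Scans a number at the start of `input` and returns the matched slice.
/// Returns an empty slice if `input` does not start with a digit.
fn lex_number(input: &str) -> Result<&str, NumError> {
    let b = input.as_bytes();
    // Advance `i` past any run of ASCII digits and return the new index.
    let skip_digits = |mut i: usize| {
        while i < b.len() && b[i].is_ascii_digit() {
            i += 1;
        }
        i
    };
    // `i` always follows ASCII bytes here, so it is a valid char boundary.
    let char_at = |i: usize| input[i..].chars().next();

    // Integer part: a lone 0, or a nonzero digit followed by more digits.
    let mut end = match b.first().copied() {
        Some(b'0') => 1,
        Some(c) if c.is_ascii_digit() => skip_digits(1),
        _ => return Ok(&input[..0]),
    };

    // Fraction: a dot must be followed by at least one digit.
    if b.get(end).copied() == Some(b'.') {
        let after = skip_digits(end + 1);
        if after == end + 1 {
            return Err(NumError::JunkAfterDecimalPoint(char_at(after)));
        }
        end = after;
    }

    // Exponent: e/E, optional sign, then at least one digit.
    if matches!(b.get(end).copied(), Some(b'e') | Some(b'E')) {
        let mut i = end + 1;
        let signed = matches!(b.get(i).copied(), Some(b'+') | Some(b'-'));
        if signed {
            i += 1;
        }
        let after = skip_digits(i);
        if after == i {
            let junk = char_at(i);
            return Err(if signed {
                NumError::JunkAfterExponentSign(junk)
            } else {
                NumError::JunkAfterExponent(junk)
            });
        }
        end = after;
    }
    Ok(&input[..end])
}
```

Here `lex_number("1.+3")` gives `Err(NumError::JunkAfterDecimalPoint(Some('+')))`, which carries the same information as the Go test's "junk after decimal point: '+'" message.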

@Alxandr Alxandr closed this as completed Apr 26, 2020
@Alxandr
Author

Alxandr commented Apr 26, 2020

Btw, another good case where you might want something like this (if you're collecting cases for dealing with errors) is detecting unterminated strings.

[Edit]
Not to mention (what I'm looking at now); invalid escape sequences. Trying to lex "\"hi\\\n\"".
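
To illustrate the two string cases just mentioned, here is a hand-written sketch that distinguishes an unterminated string from an invalid escape. It is not Logos API: `StrError` and `scan_string` are hypothetical names, the caller is assumed to have already seen the opening quote, and the accepted escape set (`\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`) is an assumption that leaves out `\u` escapes for brevity.

```rust
#[derive(Debug, PartialEq)]
enum StrError {
    /// Input ended before a closing quote was found.
    Unterminated,
    /// A backslash was followed by a character outside the accepted set.
    InvalidEscape(char),
}

/// Scans a double-quoted string at the start of `input` and returns its
/// length in bytes, including both quotes.
/// Assumption: `input` starts with the opening quote, which is skipped unchecked.
fn scan_string(input: &str) -> Result<usize, StrError> {
    let mut chars = input.char_indices().skip(1);
    while let Some((i, c)) = chars.next() {
        match c {
            // Closing quote: the literal ends just after this byte.
            '"' => return Ok(i + 1),
            // Escape: look at the next character to validate it.
            '\\' => match chars.next() {
                Some((_, e)) if "\"\\/bfnrt".contains(e) => {}
                Some((_, other)) => return Err(StrError::InvalidEscape(other)),
                None => return Err(StrError::Unterminated),
            },
            _ => {}
        }
    }
    Err(StrError::Unterminated)
}
```

With the input from the comment above, `scan_string("\"hi\\\n\"")` returns `Err(StrError::InvalidEscape('\n'))`, because the backslash is followed by a literal newline, while `scan_string("\"hi")` returns `Err(StrError::Unterminated)`.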

@maciejhirsz
Owner

maciejhirsz commented Apr 26, 2020

Once you've played with this for a while, could you look at the discussion in #104? You don't have to read the whole thing; the last few comments are probably the most relevant.

It would be super useful for nailing down the final API design if you could verify whether any of the ideas surfaced there would help you do error handling with less code.
