Defining custom tokens in parser options #312
-
I was wondering whether it's possible, given a list of tokens (e.g. operators), to create a rule that tries to match from that list. For example:

```
{
  const operators = [ '**', '==', '+', '-', '*', '/' ]
}

Operator = token:$SourceCharacter+ &{ return operators.includes(token) } { return token }

SourceCharacter = .
```

This version doesn't work, though, since the parser just keeps matching characters and doesn't know when to stop. I've tried several approaches, but I'm not sure how to match a list of tokens like that when there isn't necessarily a restriction on the characters those tokens can contain (except perhaps a space, but that still leads to issues when you don't want spaces between every morphological unit of the language). The difficulty I'm having really makes me wish for metaprogramming functionality in Peggy, so I could just generate rules from a list like this. Perhaps there's something I can do with the Plugin API? But I'm not familiar enough with it to tell.
-
If you don't know which characters are not allowed in your tokens, the only possible approach is to check the match after parsing each character. It would probably be better to generate a concrete grammar instead of trying to write a generic one. |
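The check-after-each-character idea amounts to maximal munch: extend the candidate one character at a time and remember the last prefix that was itself a complete token. A plain-JavaScript sketch (the `longestToken` helper is hypothetical, not part of Peggy):

```javascript
// Maximal-munch match against a token list: extend the candidate one
// character at a time, stop once no token can start with the current
// prefix, and return the longest prefix that was a complete token.
function longestToken(input, pos, tokens) {
  let candidate = "";
  let best = null;
  for (let i = pos; i < input.length; i++) {
    candidate += input[i];
    // No token continues this prefix, so extending further is pointless.
    if (!tokens.some((t) => t.startsWith(candidate))) break;
    if (tokens.includes(candidate)) best = candidate;
  }
  return best; // null when nothing matched
}

longestToken("**=", 0, ["**", "==", "*", "="]); // → "**"
```

This is what a character-by-character semantic-predicate version of the grammar would effectively have to compute, which is why generating a concrete rule per token tends to be simpler and faster.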
-
Hi,

```
Scope
  = "public"
  / "protected"
  / "private"
```

Is there a way to write something like the following, with low performance impact?

```
Scope
  = option.kw.Public
  / option.kw.Protected
  / option.kw.Private
```

This is doable at the moment:

```
Scope = type:$SourceCharacter+ &{ return [option.kw.Public, option.kw.Protected, option.kw.Private].includes(type) }
```

By supplying the keywords through options, we can eliminate a duplicate source of truth, which means fewer places to change and thus a safer parser. |
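For reference, Peggy exposes the object passed to `parser.parse(input, options)` inside actions and predicates as the `options` variable (note the plural), so a version of this idea might look like the following sketch. The `kw` shape is hypothetical, and the character class is narrowed to `[a-z]+` so the match terminates instead of consuming every remaining character:

```
// Hypothetical sketch; call the parser as
//   parser.parse(input, { kw: { Public: "public", Protected: "protected", Private: "private" } })
Scope
  = type:$[a-z]+ &{ return [options.kw.Public, options.kw.Protected, options.kw.Private].includes(type) } { return type }
```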