Reusing same pattern in a rule #194

Closed
jcubic opened this issue Oct 15, 2021 · 16 comments
Labels
enhancement New feature or request

Comments

@jcubic

jcubic commented Oct 15, 2021

Here is an example of grammar that I would like to have:

heredoc = "<<<" marker:[\w]+ "\n" text:[\s\S]+ "\n" marker {
    return text.join('');
}

Right now this doesn't work, but it would be nice to be able to use marker later in the rule, so that I can properly parse:

<<<HELLO
1
2
3
HELLO

@Mingun
Member

Mingun commented Oct 15, 2021

This can be trivially implemented with this grammar:

heredoc = "<<<" begin:marker "\n" @$_+ "\n" end:marker (
    &{ return begin === end; }
  / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
);
_ = (!"\n" .);
marker 'Marker' = $_+;

I do not think we need a special case for handling such situations unless someone can give reasonable arguments for it. For example, it could be hard to customize the error message. Of course, you can try to make a PR implementing this feature.

Also, remember that Peggy does not use RegExps in its grammar to match symbols, so constructions like [\s\S] will not work.

@jcubic
Author

jcubic commented Oct 15, 2021

It seems this only works with a single line; you can't write multiline heredocs. I tried to modify the rule, but if I use $.+ the matching is greedy and it can't find the end marker, because .+ also matches the marker.
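
The greedy behaviour described here can be illustrated without Peggy at all. The following is a minimal sketch in plain JavaScript, not Peggy's generated code: `dotPlus`, `untilPlus`, `literal`, and `sequence` are hypothetical stand-ins for the PEG expressions `.+`, `(!stop .)+`, a string literal, and a two-element sequence. It assumes the end marker is the fixed text HELLO, purely for illustration.

```javascript
// Each matcher returns the new position on success or -1 on failure.

// `.+` : `.` accepts any character, newlines included, so it runs to end of input.
function dotPlus(input, pos) {
  return pos < input.length ? input.length : -1;
}

// "literal" : succeeds only if the input continues with the given text.
function literal(input, pos, text) {
  return input.startsWith(text, pos) ? pos + text.length : -1;
}

// (!stop .)+ : consume one character at a time, but never step onto `stop`.
function untilPlus(input, pos, stop) {
  let end = pos;
  while (end < input.length && !input.startsWith(stop, end)) end++;
  return end > pos ? end : -1;
}

// Sequence `first "\nHELLO"`: a PEG never retries `first` with a shorter match.
function sequence(input, first) {
  const afterBody = first(input, 0);
  if (afterBody < 0) return -1;
  return literal(input, afterBody, "\nHELLO");
}

const body = "1\n2\n3\nHELLO";
const greedy = sequence(body, (s, p) => dotPlus(s, p));
const guarded = sequence(body, (s, p) => untilPlus(s, p, "\nHELLO"));
console.log(greedy, guarded);
```

The greedy variant fails because nothing is left for the marker to match, while the guarded variant stops just before the marker line. That is the reason the content rule has to exclude the end marker explicitly.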

@jcubic
Author

jcubic commented Oct 15, 2021

This seems to do the job:

heredoc = "<<<" begin:marker "\n" text:$($any_char+ "\n")+ end:marker (
    &{ return begin === end; }
  / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
) {
    return {
        type: 'literal',
        value: text
    };
}
any_char = (!"\n" .);
marker 'Marker' = $any_char+;

@Mingun
Member

Mingun commented Oct 15, 2021

You just need to exclude the end marker sequence from a content rule:

heredoc = "<<<" begin:marker "\n" @content "\n" end:marker EOF (
    &{ return begin === end; }
  / '' { error(`Expected matched marker "${begin}", but marker "${end}" was found`); }
);
_ = (!"\n" .);
marker 'Marker' = $_+;
content = $(!("\n" marker EOF) .)*;
EOF = !.;

This will work only if the heredoc is not embedded in other source; otherwise you need to find a replacement for the EOF rule. Just remember that the heredoc is actually terminated by the "\n" marker EOF sequence, so that sequence must not appear in the content.

@jcubic
Author

jcubic commented Oct 15, 2021

Neither my implementation nor yours works if there is code after the heredoc. I'm not able to parse:

foo = <<<END
xxxx
xxxx
END

echo "Welcome stranger, can you tell me what is your name?"

It only works if there is only a heredoc and nothing else. The parsing doesn't stop when it finds the END marker.

@jcubic
Author

jcubic commented Oct 15, 2021

Here is my playground on CodePen; it should generate JavaScript code. It saves the grammar and code in localStorage, but I've updated it with my latest grammar and example so you can try it with real code.
The heredoc is on line 361. The only difference is that I don't use _, because I already use it for whitespace.

@jcubic
Author

jcubic commented Oct 15, 2021

This is becoming really complex and not so trivial to create. Maybe it's worth adding something like start:marker ... end:start (or some other syntax) that will allow back-references.

@Mingun
Member

Mingun commented Oct 15, 2021

Creating a back-reference is not a problem. The real problem is how to distinguish between heredoc content and the marker while still giving the user a suggestion if they mistype the end marker:

{ let begin = null; }
heredoc = "<<<" beginMarker "\n" @content endMarker;
_ = (!"\n" .);
marker 'Marker' = $_+;

beginMarker = m:marker { begin = m; };
endMarker = "\n" end:marker &{ return begin === end; };

content = $(!endMarker .)*;

Giving the right suggestion in such situations is not an easy task (if it is even possible) if you try to implement a helper for this in the generator.
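
For readers who want to see the back-reference logic outside a grammar, here is a minimal sketch in plain JavaScript. It is not Peggy output: `parseHeredoc` and the `{ value, next }` result shape are hypothetical names chosen for illustration. It mirrors the grammar above by remembering the opening marker and ending the content at the first line that equals it.

```javascript
// Returns { value, next } on success, or null when no matching end marker exists.
function parseHeredoc(input, pos = 0) {
  if (!input.startsWith("<<<", pos)) return null;

  // beginMarker: everything after "<<<" up to the end of the opening line.
  const lineEnd = input.indexOf("\n", pos);
  if (lineEnd < 0) return null;
  const begin = input.slice(pos + 3, lineEnd);
  if (begin.length === 0) return null;

  // content = $(!endMarker .)* : scan forward, testing endMarker at every "\n".
  const bodyStart = lineEnd + 1;
  for (let i = bodyStart; i < input.length; i++) {
    if (input[i] !== "\n") continue;
    // endMarker = "\n" end:marker &{ return begin === end; }
    let nextEol = input.indexOf("\n", i + 1);
    if (nextEol < 0) nextEol = input.length;
    const end = input.slice(i + 1, nextEol);
    if (end === begin) {
      return { value: input.slice(bodyStart, i), next: nextEol };
    }
  }
  return null;
}

const source = 'foo = <<<END\nxxxx\nxxxx\nEND\n\necho "Welcome"';
const heredoc = parseHeredoc(source, source.indexOf("<<<"));
console.log(heredoc);
```

Because the result carries the position just after the end marker, the code following the heredoc stays available to the caller, which is the case that failed with the EOF-based rule.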

@hildjj
Contributor

hildjj commented Oct 15, 2021

After playing around with Mingun's answer, I got it to work just fine. I don't mind the added state variable begin, since you can't nest heredocs in this case. If you did need to nest something like this, you could make begin an array that you treated like a stack.
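
The stack idea can be sketched in plain JavaScript as follows. This is an illustration under assumptions, not part of the grammar below: it invents a hypothetical nested format in which a line "<<<NAME" opens a block and a line equal to the innermost open NAME closes it, and `checkNesting` is a made-up helper name.

```javascript
// Tracks open markers on a stack, innermost last.
function checkNesting(lines) {
  const open = [];
  let maxDepth = 0;
  for (const line of lines) {
    if (line.startsWith("<<<")) {
      // Equivalent of `begin = m`, but pushing instead of overwriting.
      open.push(line.slice(3));
      maxDepth = Math.max(maxDepth, open.length);
    } else if (open.length > 0 && line === open[open.length - 1]) {
      // Equivalent of `&{ return begin === end; }` against the top of the stack.
      open.pop();
    }
    // Any other line is plain content of the innermost block.
  }
  return { balanced: open.length === 0, maxDepth };
}

const nested = checkNesting(["<<<A", "text", "<<<B", "inner", "B", "A"]);
const crossed = checkNesting(["<<<A", "<<<B", "A"]);
console.log(nested, crossed);
```

In a grammar, the push would live in the action of the opening-marker rule and the pop in the rule that matches the closing marker.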

Attaching my full grammar for later reference:

{ let begin = null; }

commands = (@line eol)*

line
  = assignment
  / command
  / _

assignment = name:var _ '=' _ rhs:rhs { return { 
  type: 'assign', value: { name, rhs } 
}}

rhs
  = var
  / string
  / heredoc

command
 = echo
 
echo = 'echo' _ value:string { return { type: 'echo', value } }

string = '"' value:$DoubleStringCharacter* '"' { return { 
  type: 'string',value
}}

heredoc = "<<<" beginMarker eol value:content endMarker { return {
  type: 'string',
  value
}}

char = (!eol .)
marker = $char+

beginMarker = m:marker { begin = m; }
endMarker = eol end:marker &{ return begin === end; }

content = $(!endMarker .)*

DoubleStringCharacter = !('"' / "\\" / eol) . { return text() }
var = $([a-z]i[a-z0-9]i+)
eol = '\n'
_ = [ \t]*

@hildjj
Contributor

hildjj commented Oct 15, 2021

Reference full feature request: pegjs/pegjs#670

@hildjj hildjj added the enhancement New feature or request label Oct 15, 2021
@jcubic
Author

jcubic commented Oct 15, 2021

That one is different, because it was about different rules using raw array data. It can be solved with a validator/option.

Here, though, you reference the pattern within the same rule at runtime: the value comes from what the user wrote, not from something I've hardcoded in the grammar.

@hildjj
Contributor

hildjj commented Oct 16, 2021

See #196 for a fully-worked example of a stack when parsing XML.

@jcubic
Author

jcubic commented Oct 16, 2021

I'm fine with this approach. If you don't think it's a good idea to add this feature, you can close the issue.

@hildjj
Contributor

hildjj commented Oct 16, 2021

Let's keep it open a few more days in case someone else thinks it's a great idea and doesn't like the work-around.

@hildjj
Contributor

hildjj commented Oct 25, 2021

Having heard from nobody else, let's close this. Anyone who feels strongly about the feature, please still comment here and we'll reopen it.

@hildjj hildjj closed this as completed Oct 25, 2021
@jcubic
Author

jcubic commented Dec 17, 2021

@Mingun I needed to add validation with an error again (now I fully understand how this works), but I think your example is not documented. The README doesn't show that you can use {} anywhere in the code, or that you can use this pattern:

name:(rule / '' { error('xxx') })

Are there any other patterns like this? If so, they should be documented separately (maybe a Wiki page listing useful single-rule examples); no one has time to read a full grammar to search for patterns they can use.
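
One way to read the pattern in question: the ordered choice tries rule first, and only if that fails does the empty string '' match (it always does, consuming nothing), at which point its action calls error() and parsing aborts. The sketch below models that control flow in plain JavaScript. `orError`, `digit`, and `requiredDigit` are hypothetical names for illustration and are not part of Peggy's API.

```javascript
// Mimics `rule / '' { error(message) }` for matchers that return
// { value, next } on success and null on failure.
function orError(rule, message) {
  return (input, pos) => {
    const result = rule(input, pos); // first alternative: the real rule
    if (result !== null) return result;
    // Second alternative: '' matches without consuming input,
    // so its action always runs and raises the custom error.
    throw new SyntaxError(`${message} at offset ${pos}`);
  };
}

// A tiny example rule: exactly one decimal digit.
const digit = (input, pos) =>
  /^[0-9]$/.test(input[pos] ?? "")
    ? { value: input[pos], next: pos + 1 }
    : null;

const requiredDigit = orError(digit, "Expected a digit");
console.log(requiredDigit("7", 0));
```

The design point is that the fallback turns an ordinary match failure, which would otherwise let the parser backtrack and try other alternatives, into a hard error with a message of your choosing.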
