Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the ranges (with delimiters) #291

Merged
merged 9 commits into from
Feb 21, 2023
Merged

Conversation

Mingun
Copy link
Member

@Mingun Mingun commented Jun 11, 2022

Reincarnation of the pegjs/pegjs#209.
Previous attempt, closed by technical reasons: #208

This is an implementation of the ranges proposal with batteries, i.e.:

  • ability to use numeric constants to specify minimum, maximum or exact repetition count:
    more2 = "a"|2..|;
    upTo3 = "a"|..3|;
  • ability to use preceding label as a range boundary:
    list = count:n5 @n5|count|;
    n5 = n:[0-9]|5| { return parseInt(n); };
  • ability to use function as a boundary:
    list = "a"|{ return options.listSize; }|;

The syntax chosen saves the < and > characters for the template definitions where their are more natural.
[ and ] already used for character class definitions and
( and ) already used for grouping.

@reverofevil
Copy link

What if there was a way to define parser-functions, and it would be just repeat(x, 2..) and repeat(x, ..3)?

I think by creating a new syntax for every feature it will resemble Perl and its regular expressions quite soon (arguably, it already resembles it way too much).

The thing not mentioned in PR description, but probably the most awaited by me, is delimited repetition (a|.., b|).

  1. Is there a CI version of this PR to play with it?
  2. What AST does it generate?
  3. I remember there was a discussion in PEG.js issues to add it as a % b operator. Does this PR subsume that feature request?
  4. I didn't find tests attempting to break parsing of || brackets. Can we be sure that some .. | .. | | .. , .. | .. , .. | | will get parsed properly?

@Mingun
Copy link
Member Author

Mingun commented Jun 11, 2022

  1. You can checkout this branch, build minified version of a parser
    npm run build
    # or only
    npm run rollup
    npm run terser
    npm run deploy
    and go to docs/online.html
  2. You can use a recently added peggy --ast option to look at it
  3. I personally against any syntax which does not clearly indicate where the repeat expression ends. Using a % b falls into this category. Last time I tried to summarize possible options here
  4. Yes, but you should take into account, that you cannot put two suffix operators one after another, you must wrap the first in parentheses:
    // This is all forbidden
    start1 = .. | .. | | .. , .. | .. , .. | |;
    start2 = .**;
    start3 = .++;
    start4 = .??;
    
    // This will work
    start5 = .(. | .. |) | .. , .. | .. , .. | |;
    start6 = (.*)*;
    start7 = (.+)+;
    start8 = (.?)?;

@hildjj
Copy link
Contributor

hildjj commented Jun 11, 2022

What if there was a way to define parser-functions

I think we will soon need to determine what our long-term syntax extensibility approach is, before we run out of adequate syntax to act as the extensibility point. I don't think repeat(n) will work, since that looks like a rule named repeat followed by a group. <repeat min="a" max="b">...</repeat> would work, but that's pretty ugly, and even I don't like XML that much.

Copy link
Contributor

@hildjj hildjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry this took so long to get to.

The only substantive thing here is to figure out what happens with names being passed to nested repeats.

Once we get that nailed, and get this rebased, I'm ready to land it and start working toward a release.

docs/documentation.html Outdated Show resolved Hide resolved
lib/compiler/opcodes.js Show resolved Hide resolved
lib/peg.d.ts Outdated Show resolved Hide resolved
lib/peg.d.ts Outdated Show resolved Hide resolved
docs/js/examples.peggy Show resolved Hide resolved
src/parser.pegjs Outdated Show resolved Hide resolved
@hildjj hildjj mentioned this pull request Jul 20, 2022
23 tasks
@hildjj hildjj added the blocks-release Blocks an imminent release. High Priorty. label Jul 20, 2022
@hildjj
Copy link
Contributor

hildjj commented Feb 15, 2023

@Mingun do you have time to work on this? If not, I'll try to rebase it at least.

@Mingun
Copy link
Member Author

Mingun commented Feb 15, 2023

I'll look at weekend

```
expression|  exact |
expression|   ..   |
expression|min..   |
expression|   ..max|
expression|min..max|
```

Introduce two new opcodes:
* IF_LT <min>, <then part length>, <else part length>
* IF_GE <max>, <then part length>, <else part length>

Introduce a new AST node -- `repeated`, that contains expression and the minimum and maximum number of it repetition.
If `node.min.value` is `null` or isn't positive -- check of the minimum length isn't made.
If `node.max.value` is `null`, check of the maximum length isn't made.
If `node.min` is `null` then it is equals to the `node.max` (exact repetitions case)
Added two new opcodes:
- IF_LT_DYNAMIC: same as IF_LT, but the argument is a reference to the stack variable instead of constant
- IF_GE_DYNAMIC: same as IF_GE, but the argument is a reference to the stack variable instead of constant
@Mingun
Copy link
Member Author

Mingun commented Feb 19, 2023

Ok, the bug seems to be fixed, I'll check it once again on this week and add a changelog entry for plugins' authors.

@Mingun
Copy link
Member Author

Mingun commented Feb 21, 2023

The diff between the 7-month old version and the current:

  • added forgotten tests for function-based boundaries
  • fixes some minor errors in tests for variable-based boundaries
  • added a notice for plugins' authors
  • fix some minor misprints in tests

All done, ready for merge.

@hildjj
Copy link
Contributor

hildjj commented Feb 21, 2023

This is fantastic work. Thank you very much.

I'm going to merge as-is, then when I do a full review of the documentation page, there may be a few nits I clean up while fixing other things.

@hildjj hildjj merged commit 2904d18 into peggyjs:main Feb 21, 2023
@Mingun Mingun deleted the ranges branch February 21, 2023 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocks-release Blocks an imminent release. High Priorty.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants