Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peg: add support for "true" and "false" primitives to always/never match #1187

Merged
merged 2 commits into from
Jun 18, 2023

Conversation

CosmicToast
Copy link
Contributor

The use cases involve user-expandable grammars.
For example, consider the IRC nickname specification.

They SHOULD NOT contain any dot character ('.', 0x2E).
Servers MAY have additional implementation-specific nickname restrictions.

To implement this, we can do something along these lines:

(def nickname @{:main '(some :allowed)
                :allowed (! (+ :forbidden/dot :forbidden/user))
	        # for lax mode, (put nickname :forbidden/dot false)
	        :forbidden/dot "."
		# to add your own requirements
		# (put nickname :forbidden/user 'something)
		:forbidden/user false})

Additionally, it's common in parsing theory to allow matches of the empty string (epsilon). true essentially allows for this.

Note that this does not strictly add new functionality, you could emulate this previously using 0 and (! 0) respectively, but this should be faster and more intuitive.
The speed improvement primarily comes from (! 0) which is now a single step.

The use cases involve user-expandable grammars.
For example, consider the IRC nickname specification.
> They SHOULD NOT contain any dot character ('.', 0x2E).
> Servers MAY have additional implementation-specific nickname restrictions.

To implement this, we can do something along these lines:
```janet
(def nickname @{:main '(some :allowed)
                :allowed (! (+ :forbidden/dot :forbidden/user))
	        # for lax mode, (put nickname :forbidden/dot false)
	        :forbidden/dot "."
		# to add your own requirements
		# (put nickname :forbidden/user 'something)
		:forbidden/user false})
```

Additionally, it's common in parsing theory to allow matches of the
empty string (epsilon). `true` essentially allows for this.

Note that this does not strictly add new functionality, you could
emulate this previously using `0` and `(! 0)` respectively, but this
should be faster and more intuitive.
The speed improvement primarily comes from `(! 0)` which is now a single
step.
@sogaiu
Copy link
Contributor

sogaiu commented Jun 11, 2023

As a comparison, to and thru were also possible to emulate before but they turned out to be very handy and easier to understand at a glance.

I suppose to and thru reduce cognitive load in at least two ways -- what you're looking at is shorter and it's less confusing / easier to understand at a glance.

Here, it's perhaps not as puzzling (though perhaps it's less puzzling once you've learned what it means compared to (if-not ...) for the case of to / thru), but it isn't shorter. Faster is nicer though :)


As a minor point, please consider using astyle to format via make format :)

@sogaiu
Copy link
Contributor

sogaiu commented Jun 11, 2023

TYVM!

@sogaiu
Copy link
Contributor

sogaiu commented Jun 12, 2023

Somewhat related...I had failed to notice that one of the website doc examples here uses 0:

(def my-grammar
 '{:a (* "a" :b "a")
   :b (* "b" (+ :a 0) "b")
   :main (* "(" :b ")")})

@sogaiu
Copy link
Contributor

sogaiu commented Jun 12, 2023

Would another possible approach be to add a couple of key-value pairs to default-peg-grammar? So the keys would be :true and :false.

I guess that:

  • wouldn't be as fast
  • not be as elegant
  • might conflict with existing grammars

@bakpakin
Copy link
Member

For pegs there is actually no perf penalty for indirection like that - but perhaps names like :always and :never might make more sense. I don't mind the short notation but I do see that many people prefer more verbose and descriptive style with pegs

@CosmicToast
Copy link
Contributor Author

I do think true/false makes sense to use here, because it is fairly intuitive, and it's nicer to have built-in types have meaning rather than be a compiler error (i.e PEG grammars are a bit closer to being total, which is a nice-to-have).
Additionally, it means that this approach is 100% compatible - no prior grammar uses these because it used to be a compiler error to use them.
Additionally, I can't really think of anything else that a true or a false could mean in the context of a PEG literal, except maybe exactly 1 and -1 respectively.

On the other hand, :always and :never are also not a huge compatibility concern, since any grammar that defines them would be overriding them anyway.
Perhaps both options could be done?
i.e in addition to this, define {:always true :never false}, so both sets of intuitive preference are accounted for.

@sogaiu
Copy link
Contributor

sogaiu commented Dec 15, 2023

To follow up a bit on this, was working on wrapping my head around related details and found that possibly the original example grammar could use a bit of tweaking.

This is what I ended up with (including some example invocations):

(def nickname
  ~@{:main (capture (sequence (some :allowed) -1))
     :allowed (if-not (choice :forbidden/dot :forbidden/user) 1)
     # for lax mode, (put nickname :forbidden/dot false)
     :forbidden/dot "."
     # to add your own requirements
     # (put nickname :forbidden/user 'something)
     :forbidden/user false})

(peg/match nickname "alice")
# =>
@["alice"]

(peg/match nickname "good.alice")
# =>
nil

# operating on a clone of the table might be better sometimes?
(put nickname :forbidden/dot false)

(peg/match nickname "alice")
# =>
@["alice"]

(peg/match nickname "good.alice")
# =>
@["good.alice"]

@iacore
Copy link
Contributor

iacore commented Dec 16, 2023

it's nicer to have built-in types have meaning rather than be a compiler error (i.e PEG grammars are a bit closer to being total, which is a nice-to-have)

i thought having compiler error is a good thing!

overall, I don't think this feature is needed.

@sogaiu
Copy link
Contributor

sogaiu commented Dec 16, 2023

I think true / false as peg specials bring something to the table -- a bit like having to and thru (see here for a bit more on that).

On a side note, I attempted an implementaion of them in margaret.

@CosmicToast
Copy link
Contributor Author

i thought having compiler error is a good thing!

This lacks nuance.

First, let's establish that, as a blanket statement, it's simply incorrect.
If more compiler errors are always better, then the best compiler would be a shell script that returns an error.
Consequently, compiler errors are good because (and therefore when) they serve a purpose.

This purpose is typically to prevent what would have been a runtime error, but without requiring that the user actually hit that code path.
In other words, compiler errors are only useful when the code would otherwise trigger a runtime error.
This raises the question of when runtime errors are good.

Runtime errors are good when the behavior is either blatantly incorrect, or virtually guaranteed not to be the intended one.
In this case, the usage is fairly intuitive, so the latter doesn't fit, and the patch fixed the former.
Which brings us to whether the behavior becoming correct is good.

Generally speaking, within reason, it's good when language built-ins are total.
Totality in mathematics essentially means that all inputs are valid.
In this case, true and false are reasonable inputs, as they are associated with intuitive usage, and therefore it's good to render the function more total.

To drive the point home, consider: what if existing functions were less total?
Say you can do (+ 1 2), and (+f 1.1 2.2), but neither (+ 1.1 2) nor (+f 1 2.2)?
What if we made those compiler errors? Would that be a positive?

overall, I don't think this feature is needed.

As for whether the feature is needed, a better time to discuss that would have been before it was merged, half a year ago.
If you want it removed now, you might want to ask for it being removed, which would in turn be a breaking change.
Regardless, maybe let's not continue necroposting this now-ancient PR?
(I would vote for closing the comments at this stage, if someone wants to add something further, they can reference this PR separately, and if someone aims to remove the feature, it should probably be a separate issue/PR as well).

@sogaiu
Copy link
Contributor

sogaiu commented Dec 17, 2023

I appreciate the detailed explanation wrt compiler errors and overall I agree with the most recent comment -- including that a better time to discuss necessity would have been way earlier.

However, I don't agree about closing further comments outright.

This is the most natural place to follow-up on the example content and in this case, the original grammar was not functional as-is:

(def nickname @{:main '(some :allowed)
                :allowed (! (+ :forbidden/dot :forbidden/user))
	        # for lax mode, (put nickname :forbidden/dot false)
	        :forbidden/dot "."
		# to add your own requirements
		# (put nickname :forbidden/user 'something)
		:forbidden/user false})

I've definitely found it helpful to be able to add comments to PRs and issues later when further details have surfaced. It has sometimes helped with later investigation as things tend to be more consolidated. Searching on this forge is not very good to put it mildly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants