Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A unification of access paths #574

Open
masak opened this issue Jul 15, 2022 · 2 comments
Open

A unification of access paths #574

masak opened this issue Jul 15, 2022 · 2 comments

Comments

@masak
Copy link
Owner

masak commented Jul 15, 2022

Re-reading this old comment made something click for me. In short, access paths need to be a protocol.

I don't think "protocol" has ever been defined clearly in the Alma dogma, so let's make an attempt here: protocols exist because we see we are doing roughly the same thing with various different language features, and we want to (openly and extensibly) provide the ability for future language extensions to do the same. The meaning of "protocol" is not too different from the meaning of "interface" or "extension point", but there's more of an emphasis on both ends of a server/client relationship (or provider/consumer) in protocols.

Separate example: loops naturally provide a protocol, enough to allow a consumer to (say) restart an iteration, go to the next iteration, or abort the loop. The statements redo, next, and last (respectively) are consumers of these capabilities (respectively) in the protocol. Extra bonus points for making labels do the right thing. If a macro both uses a loop and is a loop, there should be a simple way to "forward" the loop protocol from the inner loop to the outer (if you're thinking in terms of capabilities/provider) or from the outer loop to the inner (if you're thinking in terms of usage/consumer).

Access paths are a protocol. The roadmap makes a vague attempt to collect the various protocols we know about, but doesn't mention access paths, so I'm fairly sure this is something that didn't occur to me before. Let's see exactly how by going through what the protocol provides, and who are the consumers.

An access path is a type of postfix expression: school.getAll({ type: "student" })[42].name. This expression exhibits all our three postfix types: property (twice), call, and indexing. Anything with at least one of these postfixes is an access path.

Conceptually, the access path protocol means that the semantics of doing access lookup can be overridden. All other things being equal, however, it's not, and the normal thing happens, namely that those prefix ops are evaluated according to their normal semantics. A compiler is able to know this statically, meaning that the abstraction is "zero-cost", and the access path protocol doesn't leave any checks or extra instructions in generated code.

Access paths have a base expression, a trunk, and a tip. In the example, these are respectively school, .getAll({ type: "student" })[42], and .name. The trunk can be empty.

exists

The prefix:<exists> operator (#292) is a consumer of the access path protocol. It re-uses the standard semantics for the base expression and the trunk, but it provides its own instructions for the tip.

The way it does this is that it selectively overrides an interpreter for a set of events that indicate base expression, trunk postfixes, and tip postfixes. Again, exists only cares about and overrides tip postfixes.

I will provide a more detailed design of the events and the interpreter later. The main point here is that exists can selectively override what happens at the tip.

Incidentally, exists also declares itself a provider of the access path protocol. This is so that later, delete exists will work.

delete

Similarly to exists, prefix:<delete> is a consumer of the access path protocol, and provides its own instructions for the tip, namely to do deletion.

Unlike exists, delete does not transparently re-provide the access path protocol. Maybe you can think of it as too side-effect-y to do so; the semantics of delete is deletion, not an access path.

(The return value of the delete, however, is whatever the original lookup was. Or early abortion/error if the original lookup failed abruptly either at the trunk or at the tip.)

delete exists

The delete macro is aware of the exists macro (but not the other way around). The delete macro can reliably check whether the access path it has is actually an exists macro invocation.

In such a case, instead of just changing the tip semantics to carry out deletion, it

  • Instantiates an interpreter for the exists semantics
  • Uses that interpreter to get a boolean b from the tip
  • Does its usual deletion
  • Returns b as a result

In other words, delete knows how to work with exists in order to both do its own deletion and return upwards the existence boolean.

(Edit: Coming back and reading this later, it strikes me that the above description is an after advice; the delete macro adds the deletion side effect on top of the original behavior/return value. This is true both in the just-delete case, and in the delete exists case.) (Edit: So, maybe delete doesn't have to be explicitly aware of exists after all?)

exists delete

Does not work because exists expects an access path and delete does not provide the access path protocol.

(Edit: Additionally, even if it did work, is not all that useful, is it. Reading from the inside out, we delete something, and then ask whether it exists? Of course it doesn't, you silly cat.)

Assignment

Note that assignment to lexical variables is not included here. Only access paths. (Edit: Still, shoutout to #214 because this covers the "lvalues" use case for everything but lexical variables.)

This takes care of an awkward transformation somewhere in the compiler; we're essentially outsourcing the assignment of access paths to the access paths protocol.

The result of an assignment is the evaluated value of the rhs. Assignments do not interact well with either exists or delete; assignments do not provide the access path protocol.

Null coalescing

The null coalescing operator (#229) is presented only in terms of property access, but surely it's equally applicable to all three postfix operators:

  • school?.getAll — property access — maybe school is none
  • school.getAll?.({ type: "student" }) — call — maybe school.getAll is none
  • school.getAll({ type: "student" })?.[42] — indexing — maybe the result of the call is none

The added dots here makes the syntax consistently ?. for this operator, which is kind of nice. Raku has a different operator spelled .? called the safe method call operator which we are not considering here.

Each of the three null coalescing operators provides two new event types for the access protocol: one for the trunk and one for the tip. Because these are new event types which are not likely to be known by other consumers, a "standard interpretation" is also provided:

  • For the trunk, if the lhs is found to be none, the bail() function is triggered. The standard interpretation of bail() is to abort interpretation of events at that point and immediately return none. Otherwise, the default semantics for the trunk.
  • For the tip, if the lhs is found to be none, bail() is triggered, otherwise the default semantics for the tip is used.

exists + null coalescing

The exists macro makes sure to also provide a different bail() implementation, namely to abort and return false.

(Edit: You might think that some kind of try/catch might be enough here, listening for "access path step failed"-style exceptions being thrown. That is not specific enough, though; we only want to catch failures from this access path, not lower ones that might bubble up through the call stack via function calls in this access path.)

delete + null coalescing

The delete macro provides a no-op implementation of bail() — if we didn't get all the way to the tip by way of short-circuiting via a null coalescing operator whose lhs was none, then we trivially succeed in our quest to delete the tip. The value of the whole delete <path> expression is none.

delete exists + null coalescing

The details from delete exists are still in play, and exists still provides its own bail() implementation. So this case just works out.

This is the most advanced case. Here is an example:

delete exists school?.getAll({ type: "student" })[42];

This line of code:

  • Would delete the index-42 element from the result of the getAll call, if such an element exists.
  • Would return true if there was such an element to delete.
  • Would return false either if there wasn't such an element to delete, or if we bailed earlier because school was none.

Phew!

Assignment + null coalescing

(Edit: This section added much later, and needs to be filled out. Preferably this should not be allowed, because a null coalescing access pathchain doesn't provide lvalues (since it sometimes returns none). Just dropping the assignment on the floor in case of a none feels reckless and like a source of bugs. On the other hand, throwing an error feels counter to the whole idea of null coalescing.)

Smalltalk and Kernel

I once read that Smalltalk got its greatness because the extreme late-binding meant that the AST is essentially free to be interpreted by any interested party — if you want to provide a completely different semantics to the standard operators, that's just a matter of sending the same messages to some different objects. That image always appealed to me.

It feels like that's what we're doing here. We're identifying access paths as a place where a lot of interests converge — standard evaluation, exists, delete, assignments, and null coalescing. Providers of the access path protocol can provide new event types (but don't have to), whereas consumers can choose to interpret those events with their own desired semantics. In the end, since both providers and consumers are statically manifest in the code, the whole interpreter can be inlined as very efficient target code — there's essentially no overhead.

All this reminds me of how Kernel's operatives are suspiciously static and inline-able, like macros. I still have no good explanation for why that might be so — it's either a severe lack of imagination on the part of the people authoring operatives, or it's a sign of a deeper connection between macros and operatives.

@masak
Copy link
Owner Author

masak commented Jul 19, 2022

delete exists

[...]

In other words, delete knows how to work with exists in order to both do its own deletion and return upwards the existence boolean.

One thing that I really like which falls out from this design is the following: various incarnations of delete for data structures in various languages/libraries tend to have a useful return value, and therefore break the command-query separation. Now, we all know that's a guideline rather than a law, and I've mostly made peace with the fact that we sometimes benefit from bending it a bit. I think the "spirit of the law" is largely preserved even when delete returns something — but my complaint is this: it doesn't read well.

[user]: delete x from S
[computer]: true

true what? This could mean "it was indeed deleted, my liege", or "there are still elements in S", or "the thing we just deleted was truthy", or even "the storage taken up by the slot storing x in S was immediately reclaimed and the proceeds were given to charity". I grant that the first interpretation is likely, but (my point) not by much. Writing if (delete x.y.z) { ... } compounds this problem. What are we asking here?

(Edit: This is, of course, an instance of boolean blindness. The true above has been semantically separated from whatever question was being asked to give that answer.)

Here is a list of popular languages (and their collection APIs) in flagrante committing this CQS violation:

  • Java implements .remove on collection types. It has a boolean return value. (Ditto .removeAll)
  • C++ STL either returns a new iterator (fair enough), or a size_type (essentially an "enhanced boolean") saying how many elements were deleted.
  • JavaScript's newish Set has a .delete method which returns "a boolean asserting whether an element was successfully removed or not". The older built-in delete operator returns a boolean, but it's always true except in very unlikely circumstances. Thanks, JavaScript!
  • Python is the only language on this list which gets this right. By virtue of being a statement, del for deletion doesn't even have a return value. The documentation for object.__del__(self) talks about scary interactions with garbage collection, but it does not mention caring about the return value of this method at all.

Anyway, writing if (exists x.y.z) { ... } (query-only) is obviously meaningful, and (IMO) writing if (delete exists x.y.z) { ... } also squeaks in as successfully combining the command and the query in a way that reads well. Java, C++, and JavaScript expect us to use the command method remove/erase/delete as a combined command/query, even though there's no syntactic indication that it can even act as a query. With delete exists, we do get such a syntactic hint.

@masak
Copy link
Owner Author

masak commented Oct 24, 2023

In short, access paths need to be a protocol.

I've since also seen access paths being called "member access chains". I kind of like that. Not sure how married I am to the "member" gloss, but "chain", definitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant