Implementation update on lexical lookup via quasis/injectiles, etc #410
Comments
Adding these from #410; if nothing else, this one is an interesting regression test.
This exploration, and the questions it raised, have been superseded by the answers found in #410
Just a quick note about terminology: "unique variable" was one of those terms I went through looking for the right term. It has its heart in the right place, but it didn't stick. What seems to have stuck (in the Alma source code at least) is the term "direct lookup", as contrasted with "lexical lookup". In retrospect, I do like focusing on the lookup instead of the variable; it's the lookup that's different, after all.
Holy carp! @vendethiel, you'd better take a look at this. I'm reading Shutt's thesis. I confess I'm reading it very slowly, and the reason I'm reading it very slowly is that it can't be read quickly. At least I can't. The problem is not that it's opaque (rather the opposite, in fact); the problem is that it packs a lot of very precise information in each sentence. And then I come across this, on page 111. For context, this is talking about Scheme's macro expansion hygiene, as laid down by R5RS.
This is a description of direct lookup. Scheme and Alma/Raku differ in the way that we've discussed in #567, but for this "subtle heart" part, those distinctions seem moot. There's a 1-to-1 correspondence here between "renamed to a gensym" and "turned into a direct lookup". I mean, I think I knew that, at some level. But it still feels both shocking and validating to know that Scheme and Alma macro hygiene actually converge like this. I still need to grok Scheme macro expansion. Probably I won't be able to rest until I've written at least one working implementation of it.
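To make the "renamed to a gensym" versus "turned into a direct lookup" correspondence a bit more concrete, here is a minimal Python sketch. Everything in it (the `gensym` helper, the `Location` class, the `expand_*` functions, the dictionary standing in for an environment) is hypothetical and made up for illustration; it is neither Scheme's nor Alma's actual machinery. A macro expansion introduces a binding called `tmp`, and the user's code happens to have its own `tmp`.

```python
import itertools

_counter = itertools.count(1)

def gensym(base):
    """Return a fresh name that no user-written identifier can collide with."""
    return f"{base}%{next(_counter)}"

class Location:
    """A storage slot that is reached directly, without going through a name."""
    def __init__(self, value):
        self.value = value

def expand_naive(env):
    # Unhygienic: the macro's binding clobbers the user's variable of the same name.
    env["tmp"] = "macro"

def expand_gensym(env):
    # Scheme-style: the introduced binding is renamed to a fresh symbol.
    name = gensym("tmp")
    env[name] = "macro"
    return name

def expand_direct(env):
    # Direct-lookup style: the introduced binding bypasses the environment entirely.
    return Location("macro")

naive_env = {"tmp": "user"}
expand_naive(naive_env)

gensym_env = {"tmp": "user"}
fresh_name = expand_gensym(gensym_env)

direct_env = {"tmp": "user"}
location = expand_direct(direct_env)
```

In both hygienic variants the user's `tmp` is left alone; the only difference is whether the macro's binding is reachable through an unguessable name or through no name at all.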
That difference is declarative vs imperative. Scheme templates don't take arguments like Alma macros do; they just "match". The end result is the same; the way it's done is just different.
So, I came in here just now to make this exact same point again. The "cross-stage persistence" of #567 maybe adds an extra overtone to the macro hygiene chord, but the fundamental frequency is the same: that macro expansion can introduce new bindings, which are (need to be) subject to uniqueness/hygiene requirements. With cross-stage persistence (XSP), a new binding can come not just from within the quasi itself, but also from the quasi's surrounding lexical context. Quoting @vendethiel:
Turning this around in my mind, I'm not 100% sure whether Scheme's "macro by example" have cross-stage persistence or not. I think they might, actually. Now I need to find out. (The litmus test would be something like this: I declare a Scheme macro that uses
I used to be more worried about the damage the "imperative" part of Raku/Alma macros could do to hygiene. Nowadays I'm more relaxed, because the algorithm described in the OP mostly acts "around" the macro, so the macro is free to do pretty much whatever it wants (including explicitly unhygienic things), and it will mesh pretty well with the hygiene algorithm.
This still holds. I should get a round tuit.
This is probably something I should publish as a blog post on strangelyconsistent. For now, there's time only to write it up here as a self-closing issue.
Who knows, maybe there's some interested reader out there for whom this is useful. The main use is for me to info-dump what I feel is a first complete, workable solution to hygiene in 007 and Perl 6. (As a consequence, I won't pull any punches. If the below sounds like complete gobbledygook, assume that's me, not you. But maybe see the examples at the end.)
I should also link to this gist because it contains the germs of the thinking that's outlined here. Think of the gist as the exploration part and this issue as its final distillation.
Unique variables
In order to explain the solution, let's introduce a non-established term: unique variables.
In general, each variable in the source code is declared somewhere, and is then used a number of times:
Without any further information, we don't actually know whether the variable foo corresponds to one single allocated location in memory, or several. Here are a few reasons it might be several:
But there are also a few well-known cases where a variable declaration corresponds exactly to one single location; let's call such a variable (both its declaration and its usages) a unique variable. Here are some examples of variables that turn out to be unique:
In all of these cases, we can start to think about optimizations where we do the lookup at compile time, and replace the runtime lookup with just the unique location. (If we can infer that all we're ever doing is reading from that location, we can further optimize reading from the location down to just its resulting constant value.)
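As a rough illustration of that optimization, here is a small Python sketch contrasting a lookup that walks the scope chain by name every time with one that resolves the location once and then goes straight to it. The `Scope` and `Location` classes and both lookup functions are assumptions invented for this sketch, not 007's implementation.

```python
class Location:
    """One allocated storage slot for a variable."""
    def __init__(self, value):
        self.value = value

class Scope:
    """A lexical scope: maps names to locations and links to its OUTER scope."""
    def __init__(self, outer=None):
        self.outer = outer
        self.slots = {}

    def declare(self, name, value):
        self.slots[name] = Location(value)

    def find(self, name):
        # Walk outwards through the chain of scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.slots:
                return scope.slots[name]
            scope = scope.outer
        raise NameError(name)

def lexical_lookup(scope, name):
    # Runtime lookup: search the scope chain on every access.
    return scope.find(name).value

def make_direct_lookup(scope, name):
    # "Compile-time" lookup: resolve the location once, keep only the location.
    location = scope.find(name)
    return lambda: location.value

global_scope = Scope()
global_scope.declare("greeting", "hello")
inner_scope = Scope(outer=global_scope)

direct = make_direct_lookup(inner_scope, "greeting")
before = (lexical_lookup(inner_scope, "greeting"), direct())

# Both kinds of lookup observe later writes, since they share one location.
global_scope.find("greeting").value = "goodbye"
after = (lexical_lookup(inner_scope, "greeting"), direct())
```

This only works because the variable is unique: if `greeting` could correspond to several locations, resolving it once up front would pick the wrong one some of the time.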
Why are macro variables unique?
The last point in the preceding list is the odd one out, so it bears pausing for a bit and motivating that one. Macros are routines just like functions, so on the face of it, they should sort into the top list, not the bottom one.
While it is true that a macro can be called multiple times, and that each such call will generate a fresh location for each of its variables, code generated in the macro will only ever see a fixed variable. Whenever the macro gets called again, it's another quasi being created; one does not simply declare the same variable in the same macro call twice. It might be related to the fact that the macro runs at compile time, so what's compile time for the injectile is actually runtime for the macro, even though that's... the same time.
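A loose analogy in Python, using closures in place of quasis (so this is only a sketch of the idea, with made-up names, not of how macros actually run): every call to `macro` allocates a fresh location, but the code fragment built during a given call only ever sees the one location belonging to that call.

```python
def macro(arg):
    local = [arg]          # a fresh location for every call of the macro

    def quasi():           # stands in for the code fragment built during this call
        return local[0]

    return quasi

first = macro("a")
second = macro("b")
```

From the point of view of `first`, there is exactly one `local`; the fact that `second` has a different one never becomes visible to it.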
I can't stress enough how this "happy coincidence" feels both necessary and sufficient in some way. In the sense that it's been really tricky to see how to make macro hygiene realistic... but the fact that quasis in macros only ever see unique variables in their surrounding macros means that, by a cosmic coincidence, all the runtime lookups that would have been "detached" (in the terminology of the gist) can instead be optimized away.
I dunno, YMMV, but to me it seems like quite a wonderful generalization. Who knew global variables and macro "closure" variables had this trait in common?
Implementation details
It gets cuter. Let's assume that it's possible both to uniquify a variable (to turn all its runtime lookups into global location accesses), and to de-uniquify it (to turn all its location accesses back into runtime lookups). Enough information needs to be stored in the location object itself to be able to restore it that way.
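A minimal sketch of that round trip, assuming the only information needed for restoring is the variable's name (the real thing may well need more). The node classes, `uniquify`, `deuniquify`, and the flat dictionary standing in for the scope chain are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Location:
    name: str       # remembered so that the lookup can be restored later
    value: object

@dataclass
class LexicalLookup:
    name: str

@dataclass
class DirectLookup:
    location: Location

def uniquify(node, env):
    """Turn a runtime lookup by name into a direct location access."""
    return DirectLookup(env[node.name])

def deuniquify(node):
    """Turn a direct location access back into a runtime lookup by name."""
    return LexicalLookup(node.location.name)

def evaluate(node, env):
    if isinstance(node, DirectLookup):
        return node.location.value      # no environment needed
    return env[node.name].value         # looked up by name at runtime

env = {"x": Location("x", 42)}          # stand-in for the visible scope chain
original = LexicalLookup("x")
frozen = uniquify(original, env)
thawed = deuniquify(frozen)
```

Note that `frozen` can be evaluated even with an empty environment, which is the property that matters once the code has been moved away from the scopes it was written in.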
Two points in time are of interest during macro expansion:
I think a good metaphor here is the freeze-drying of foodstuffs. We freeze-dry the food in order to transport it long distances or store it for a long time. When the food arrives at its destination, we can rehydrate it, returning it to its original fresh state.
In the case of the quasi, there are some variables that we know won't be lexically available from the mainline code. The two "lookup sites" involved form an inverted Y shape:
All the variables in the left branch of the Y shape will be unavailable from the point of view of (2), because they are no longer part of the sequence of OUTER blocks from (2). Preemptively, the quasi interpolation uniquifies all the variables used in the quasi but declared outside of it. (This includes the "stem" of the Y shape.) Later, the quasi injection de-uniquifies those variables declared in the stem of the Y.
Why? Because while uniquification is a "necessary evil" and something we need to do to get hygiene at all, non-unique variables are still preferable and more in line with the end user's intuition/expectations.
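The two steps can be sketched roughly as follows in Python. This assumes that (1) is the point where the quasi is interpolated inside the macro and (2) is the point where the result is injected into the mainline; the `Scope`, `Location`, `interpolate`, and `inject` names are invented for the sketch, and shadowing at the injection site is deliberately not handled.

```python
class Location:
    def __init__(self, name, value, scope):
        self.name, self.value, self.scope = name, value, scope

class Scope:
    def __init__(self, label, outer=None):
        self.label = label
        self.outer = outer
        self.slots = {}                      # name -> Location

    def declare(self, name, value):
        self.slots[name] = Location(name, value, self)

    def find(self, name):
        for scope in self.chain():
            if name in scope.slots:
                return scope.slots[name]
        raise NameError(name)

    def chain(self):
        # This scope followed by its sequence of OUTER scopes.
        scope = self
        while scope is not None:
            yield scope
            scope = scope.outer

def interpolate(free_names, quasi_scope):
    """Uniquify every variable used in the quasi but declared outside of it."""
    return {name: ("direct", quasi_scope.find(name)) for name in free_names}

def inject(lookups, site):
    """De-uniquify the lookups whose declaring scope is in the site's OUTER chain."""
    visible = set(site.chain())
    result = {}
    for name, (_, location) in lookups.items():
        if location.scope in visible:
            result[name] = ("lexical", name)      # declared in the stem: restore
        else:
            result[name] = ("direct", location)   # left branch: must stay unique
    return result

stem = Scope("stem")
stem.declare("shared", 1)
macro_body = Scope("macro", outer=stem)           # left branch of the Y
macro_body.declare("private", 2)
mainline = Scope("mainline", outer=stem)          # right branch of the Y

frozen = interpolate(["shared", "private"], macro_body)
injected = inject(frozen, mainline)
```

Here `shared` ends up as an ordinary lexical lookup again, while `private` keeps its direct lookup because nothing on the mainline side can reach the macro's scope by name.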
Some examples
Hesitant addendum: with those last two examples, we might be able to get our cake and have it, too, getting "block" and "module" as outputs, respectively. A strong-enough "fixup mechanism" ought to be able to make those variables refer to their runtime locations.