Feature Suggestion: A form of proper tail calls #128
Added example of the passto statement in action.
Passto example now prints signed integers correctly on 16-bit architectures.
Linker is now aware of tail calls and knows their workspaces can overlap. Passto statement support added to BASIC architecture, as best as possible. Passto statement support added to cgen. Tests added for passto statement; they currently pass on cgen, lx386, and msdos. The previously added passto example is now called 'cowcalc'; 'passto' is now a simpler example.
… not used. Commented out debug print()s in cowlink.
Fixed hack that used -1: uint16 in passto.test instead of 0xFFFF. Only worked by mistake.
…ctures. No support for 6502 interpreter architecture, as it is no longer used.
Passto examples now added to build script.
I'm sorry this has taken so long to get back to! I've been sunk in other projects (including finishing a book!)... Thanks very much for the PR; this is a really good idea and would be extremely useful. I'm planning on picking Cowgol up again for use in a different project, and this'd be useful.
I've gone through the code; my biggest concern is the PREPARETAIL/TAILCALL split. It seems clunky: the only reason for PREPARETAIL is to remove the return address before the arguments get pushed, so that it can get put back under the arguments. If we're going to add an opcode for this, maybe it'd be better to rethink the way return addresses are handled entirely. For any procedure with input parameters, the return address will get popped anyway in the function prologue, so maybe it'd be better to just stash that somewhere other than the stack? Of course, that'd make returning from procedures with no output parameters much more expensive.
(I also notice that several implementations of PREPARETAIL put the return address in a register, e.g. a5 on the 68000. This will fail if the register gets used as the input parameter expressions get evaluated. Unfortunately the register allocator can only track registers within an expression, so that won't be trivial to fix. A potentially not-great early design decision means that opcodes can only have two operands, and TAILCALL is already using both, so just adding it there isn't possible.)
I have been very vaguely mulling the ability to do non-local jumps, for things like exception handling and returning from inner subroutines. (This would allow a better case..end case, for example, where each condition's body could be an implicit nested subroutine.) As that's notionally similar, maybe some sort of more generic continuation scheme could cover both use cases.
BTW, I was about to suggest
I completely understand; I hop back and forth between passions as well. My own delay in responding should speak to that. I'm glad to see you like the proposal. PREPARETAIL did feel ugly when I introduced it, but I felt there was no way to avoid it using the newgen syntax while also keeping the current calling conventions intact. Since you feel that more aggressive changes to that convention are needed, I think there are a few different schemes that could be used, with the choice depending on the needs of particular architectures:
1. Store return address in memory. This is already in use in the 6502 architecture, and could work for architectures of similar complexity. In cases where output parameters are used, the return address is copied to a fixed address, and an indirect jump is used to return from the function. It should be possible to only pay this cost in those subroutines where
2. Separate return address workspace. This is similar to the first one, but allows for memory savings.
3. Leave return address in place. This strategy works best for architectures that support stack-relative addressing. Input parameters are pushed onto the stack, reserving space for the output parameters if needed. The return address is left in place, to be either used in a
4. Linker changes. This is the most sweeping change, one I would not pursue in a pull request. David Wheeler, a man also interested in programming languages for small systems, proposes here that an optimizing linker like the one Cowgol has could eliminate the need to copy variable data entirely. The called function would write directly to the caller's activation record, without ever having to touch the stack.
5. The generic continuation scheme. I would need more information on what you are considering to try this. I am particularly interested in how this would work with the C implementation, which doesn't really have a way to do non-local labels.
For the moment, I will change
P.S.: What is the naming scheme for variables in Cowgol? I have not been able to tell; several different ones appear to be used.
Problem
Writing a fast interpreter, parser, or other state machine in the current version of Cowgol is hard (see here). The current codegen optimizes for size rather than speed, which makes sense for the platforms Cowgol is intended for, but it is useful to have the choice between the two. Case statements can also stretch on and on, which makes them hard to read and hard to optimize. The proper tail call addresses both the performance and the structuring issues, and I've been able to bring it into Cowgol.
Overview
The `passto` statement is structured like this:
To the user, it behaves identically to a subroutine call followed immediately by `return`. It can only be used if the subroutine named within it (the 'catcher') has the same output parameters as the subroutine which uses it (the 'passer').
Adding tail calls in this explicit form is simpler than identifying every call that is in tail position. The name suggests its purpose: while a normal subroutine call is a vertical transfer of control, `passto` is a horizontal transfer of control. Other names I considered were `goto`, which was rather cheeky, and `delegate`, which seemed more to suggest vertical transfer.
Full Semantics
The Lemon grammar is:
To use `passto`, the passer must have the same output parameters as the catcher named in the statement.
A passer must be on the same level of nesting as the catcher.
Upon execution of the statement, execution of the passer ends. Its activation record is discarded, and execution of the catcher begins in its place. A catcher can also call other passers using normal subroutine calls, allowing for nested levels of passing.
Should the catcher return normally or use the `return` statement, control flow will return to the subroutine which called the passer using a normal subroutine call. If multiple subroutines each passed to each other, control flow returns to the point where a passer was most recently called normally.
The subroutine which called the passer is not required to know that the passer uses `passto`; it looks like any other subroutine.
It should be possible for two or more subroutines to pass to each other, never returning normally. This should be possible indefinitely, without any assumptions on how deeply these subroutines can mutually recurse.
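As a rough illustration of these semantics in a language without guaranteed tail calls, the sketch below models two subroutines that pass to each other through a small driver loop (a trampoline). All names here (`Next`, `State`, `is_even`, `is_odd`, `run`) are hypothetical and chosen for the example; this is not the code that cgen emits, only a model of "the passer ends, the catcher begins in its place, and a normal return goes back to the original caller".

```c
#include <stddef.h>

/* Hypothetical model, not taken from the Cowgol sources. */
typedef struct State { unsigned long n; int result; } State;

/* Each step runs one subroutine body and returns the subroutine to pass to,
   or a NULL function pointer to return normally to the original caller. */
typedef struct Next { struct Next (*fn)(State *); } Next;

static Next is_even(State *s);
static Next is_odd(State *s);

static Next is_even(State *s) {
    if (s->n == 0) { s->result = 1; return (Next){ NULL }; } /* normal return */
    s->n--;
    return (Next){ is_odd };                                 /* "passto is_odd" */
}

static Next is_odd(State *s) {
    if (s->n == 0) { s->result = 0; return (Next){ NULL }; } /* normal return */
    s->n--;
    return (Next){ is_even };                                /* "passto is_even" */
}

/* The driver loop: its frame is the only one that stays on the C stack, so
   the two subroutines can pass to each other any number of times. */
static int run(Next (*entry)(State *), unsigned long n) {
    State s = { n, 0 };
    Next next = { entry };
    while (next.fn != NULL) {
        next = next.fn(&s);
    }
    return s.result;
}
```

Calling `run(is_even, 1000000UL)` performs a million passes without the stack growing, which is the property the last paragraph above asks for. The caller of `run` cannot tell that passing took place.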
Implementation
I added two new midcodes: PREPARETAIL and TAILCALL. PREPARETAIL in most cases removes the return address from the top of the stack and saves it in a register. This is so TAILCALL can place the arguments on the stack without burying the return address.
TAILCALL is much like CALL. After placing the arguments on the stack, it places the return address back on the stack and jumps to the new subroutine instead of calling it. As far as the catcher is concerned, the stack is the same as if it had been called normally. The C implementation uses a trampoline; I've put in some effort to keep the overhead small when `passto` is not in use.
A new field has been added to the reference record in COO files, tracking whether a reference is a tailcall or not. This increases the size of COO files slightly. Adding a new record type for tailcall references could provide better density.
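The stack handling that PREPARETAIL and TAILCALL perform can be modelled with a toy stack machine. The sketch below is only a simulation under assumed conventions (arguments pushed first, return address on top at subroutine entry); the `Machine` type and the function names are hypothetical, and the `saved_return` field stands in for whatever register a real backend would use.

```c
#include <stddef.h>

#define STACK_MAX 16

/* Hypothetical toy machine; no bounds checking, for illustration only. */
typedef struct {
    int    words[STACK_MAX];
    size_t depth;
    int    saved_return;   /* stands in for the register PREPARETAIL uses */
} Machine;

static void push(Machine *m, int w) { m->words[m->depth++] = w; }
static int  pop(Machine *m)         { return m->words[--m->depth]; }

/* A normal call: push the arguments, then the return address. */
static void normal_call(Machine *m, const int *args, size_t nargs, int ret) {
    for (size_t i = 0; i < nargs; i++) push(m, args[i]);
    push(m, ret);
}

/* PREPARETAIL: lift the passer's return address off the stack so the
   arguments pushed next do not bury it. */
static void prepare_tail(Machine *m) { m->saved_return = pop(m); }

/* TAILCALL: push the arguments, put the saved return address back on top,
   then (in a real backend) jump to the catcher instead of calling it. */
static void tail_call(Machine *m, const int *args, size_t nargs) {
    for (size_t i = 0; i < nargs; i++) push(m, args[i]);
    push(m, m->saved_return);
}
```

Starting from a stack that holds only the passer's return address, `prepare_tail` followed by `tail_call` leaves the same layout that `normal_call` would have produced from the passer's own caller, so the catcher needs no special prologue. The model also makes the review concern visible: anything that overwrites `saved_return` between the two steps loses the return address.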
The linker is now aware of tailcalls and will place their activation records so that they overlap. A small amount of cycle detection is needed because of how interface references work.
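To make the idea of overlapping activation records concrete, here is a minimal sketch of one way a static workspace placer could treat the two kinds of reference. It is not cowlink's algorithm, and the cycle check here serves a different purpose from the interface-reference handling mentioned above; the types and the function name are assumptions made for the example. A normal call forces the callee's workspace to start after the caller's, while a tail call only requires it to start no earlier than the caller's, so the two may share memory.

```c
#include <stddef.h>

/* Hypothetical model of subroutine references seen by a linker. */
typedef enum { CALL_NORMAL, CALL_TAIL } CallKind;
typedef struct { size_t caller, callee; CallKind kind; } Edge;

/* Assigns a workspace base offset to every subroutine by repeated relaxation.
   Returns 0 on success, or -1 if the offsets never settle, which happens when
   a cycle of references contains a normal call to a subroutine with a
   non-empty workspace (i.e. real recursion, which static workspaces cannot
   support). Cycles made only of tail calls settle with all members sharing
   one base. */
static int place_workspaces(const size_t *size, size_t *base, size_t nsubs,
                            const Edge *edges, size_t nedges) {
    for (size_t i = 0; i < nsubs; i++) base[i] = 0;

    for (size_t pass = 0; pass <= nsubs; pass++) {
        int changed = 0;
        for (size_t e = 0; e < nedges; e++) {
            size_t from = edges[e].caller;
            size_t to   = edges[e].callee;
            /* The caller's record stays live across a normal call only. */
            size_t min_base = base[from] +
                (edges[e].kind == CALL_NORMAL ? size[from] : 0);
            if (base[to] < min_base) {
                base[to] = min_base;
                changed = 1;
            }
        }
        if (!changed) return 0;   /* offsets are stable */
    }
    return -1;                    /* still changing: recursive normal call */
}
```

With three subroutines of sizes 4, 2, and 3, where the first calls the second normally and the second and third pass to each other, the second and third both land at offset 4, so the total is 7 words rather than 9.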
That's really the scale of the changes. The impact on the compiler's complexity is small.
Examples
I designed a Brainfuck interpreter using different forms of interpreter design as a benchmark of the feature. It can be found here.
The repo has a new example called `cowcalc.cow`, which is an RPN calculator implemented using `passto`. It gives an idea of what a state machine using the new feature looks like.
I suppose I have to include a traditional tail-recursion example, as well: