Feature Suggestion: A form of proper tail calls #128

shadowofneptune · 2022-12-28T07:42:00Z

Problem

Writing a fast interpreter, parser, or other state machine in the current version of Cowgol is hard (see here). The current codegen optimizes for size rather than speed, which makes sense for the platforms Cowgol is intended for. It is still useful to be able to choose between the two, however. Case statements can also stretch on and on, which makes them hard to read and can make them difficult to optimize. The proper tail call addresses both the performance and the structuring issues, and I've been able to bring it into Cowgol.

Overview

The passto statement is structured like this:

passto subroutine(arguments);

To the user, it behaves identically to a subroutine call followed immediately by return. It can only be used if the subroutine named within it (the 'catcher') has the same output parameters as the subroutine which uses it (the 'passer').
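As a rough illustration of that restriction, a frontend check could look something like the C sketch below. The SubSig struct, its fields, and can_passto are hypothetical names made up for this example and are not taken from the Cowgol compiler. It also assumes that "same output parameters" means the same types in the same order, and it folds in the nesting rule from the Full Semantics section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a subroutine signature; not the
 * compiler's real data structure. */
typedef struct {
    const char *name;
    int nesting_level;            /* 0 = top level */
    size_t output_count;          /* number of output parameters */
    const char *output_types[4];  /* type names of the output parameters */
} SubSig;

/* Returns true if `passer` may use passto on `catcher`:
 * same nesting level and identical output parameter types, in order. */
bool can_passto(const SubSig *passer, const SubSig *catcher)
{
    if (passer->nesting_level != catcher->nesting_level)
        return false;
    if (passer->output_count != catcher->output_count)
        return false;
    for (size_t i = 0; i < passer->output_count; i++) {
        if (strcmp(passer->output_types[i], catcher->output_types[i]) != 0)
            return false;
    }
    return true;
}
```

Under these assumptions, two top-level subroutines that both return a single uint8 would be accepted, while a pair whose output types differ, or whose nesting levels differ, would be rejected.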

Adding tail calls in this explicit form is simpler than identifying every call that is in tail position. The name suggests its purpose: while a normal subroutine call is a vertical transfer of control, passto is a horizontal transfer of control. Other names I considered were goto, which was rather cheeky, and delegate, which seemed to suggest vertical transfer instead.

Full Semantics

The Lemon grammar is:

 statement ::= PASSTO startsubcall inputargs(INA) SEMICOLON.

To use passto the passer must have the same output parameters as the catcher named in the statement.
A passer must be on the same level of nesting as the catcher.
Upon execution of the statement, execution of the passer ends. Its activation record is discarded, and execution of the catcher begins in its place. A catcher can also call other passers using normal subroutine calls, allowing for nested levels of passing.
Should the catcher return normally or use the return statement, control flow will return to the subroutine which called the passer using a normal subroutine call. If multiple subroutines passed to each other in turn, control flow returns to the point where a passer was most recently called normally.
The subroutine which called the passer is not required to know that the passer uses passto; it looks like any other subroutine.
It should be possible for two or more subroutines to pass to each other, never returning normally. This should be possible indefinitely, without any assumptions on how deeply these subroutines can mutually recurse.
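In a language without guaranteed tail calls, such as C, one common way to get this last property is a trampoline: each subroutine records which subroutine should run next and returns, and a small driver loop does the jumping. The sketch below is only an illustration of that idea under my own assumptions. The State struct, the is_even/is_odd pair, and run_is_even are hypothetical and are not the code the cgen backend emits.

```c
#include <stddef.h>

/* Hypothetical state shared between the bouncing subroutines. */
typedef struct State State;
typedef void (*Step)(State *);

struct State {
    unsigned long n;  /* input parameter */
    int result;       /* output parameter, shared by passer and catcher */
    Step next;        /* subroutine to pass to, or NULL to return */
};

static void is_odd(State *s);

static void is_even(State *s)
{
    if (s->n == 0) { s->result = 1; s->next = NULL; return; }
    s->n--;
    s->next = is_odd;   /* models: passto is_odd(n - 1); */
}

static void is_odd(State *s)
{
    if (s->n == 0) { s->result = 0; s->next = NULL; return; }
    s->n--;
    s->next = is_even;  /* models: passto is_even(n - 1); */
}

/* The trampoline. From the outside this looks like one normal call;
 * the C stack never gets deeper than one step, however large n is. */
int run_is_even(unsigned long n)
{
    State s = { n, 0, is_even };
    while (s.next != NULL) {
        Step step = s.next;
        step(&s);
    }
    return s.result;
}
```

Calling run_is_even(1000000) bounces between the two subroutines a million times while using a constant amount of stack, which is the behaviour the rule above asks for.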

Implementation

I added two new midcodes: PREPARETAIL and TAILCALL. PREPARETAIL in most cases removes the return address from the top of the stack and saves it in a register. This is so TAILCALL can place the arguments on the stack without burying the return address.

TAILCALL is much like CALL. After placing the arguments on the stack, it then places the return address back on the stack and jumps to the new subroutine instead of calling it. As far as the catcher is concerned the stack is the same as if it had been called normally. The C implementation uses a trampoline; I've put in some effort to keep the overhead small when passto is not in use.
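The stack shuffle described in the last two paragraphs can be modelled with a toy stack in C. This is a sketch under assumptions: the Machine struct and all function names are invented for illustration, the stack holds plain integers, and the prologue is assumed to pop the arguments while leaving the return address on top. Real backends differ per architecture.

```c
#include <assert.h>

enum { STACK_MAX = 16 };

typedef struct {
    int data[STACK_MAX];
    int top;        /* number of used slots */
    int saved_ret;  /* stands in for the register PREPARETAIL uses */
} Machine;

static void push(Machine *m, int v) { assert(m->top < STACK_MAX); m->data[m->top++] = v; }
static int pop(Machine *m) { assert(m->top > 0); return m->data[--m->top]; }

/* Normal call sequence: arguments first, then the return address. */
void normal_call(Machine *m, const int *args, int nargs, int ret_addr)
{
    for (int i = 0; i < nargs; i++) push(m, args[i]);
    push(m, ret_addr);
}

/* Assumed prologue: move the arguments into the subroutine's own
 * storage and leave only the return address on the stack. */
void prologue(Machine *m, int *params, int nargs)
{
    int ret = pop(m);
    for (int i = nargs - 1; i >= 0; i--) params[i] = pop(m);
    push(m, ret);
}

/* PREPARETAIL: lift the return address out of the way. */
void prepare_tail(Machine *m) { m->saved_ret = pop(m); }

/* TAILCALL: push the catcher's arguments, then put the saved return
 * address back on top; a jump (not a call) to the catcher follows. */
void tail_call(Machine *m, const int *args, int nargs)
{
    for (int i = 0; i < nargs; i++) push(m, args[i]);
    push(m, m->saved_ret);
}
```

In this model, running normal_call, prologue, prepare_tail, and tail_call in sequence leaves the stack in exactly the state a direct normal_call to the catcher would, so the catcher cannot tell the difference.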

A new field has been added to the reference record in COO files, tracking whether a reference is a tailcall or not. This slightly increases the size of COO files. Adding a new record type for tailcall references could provide better density.

The linker is now aware of tailcalls and will place their activation records so that they overlap. A small amount of cycle detection is needed because of how interface references work.
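To make the overlap idea concrete, here is a toy placement pass in C. It is a sketch under several assumptions of mine, not cowlink's algorithm: workspaces are statically allocated, a normally called subroutine is placed after its caller's workspace, a catcher may start at the same base as its passer, and normal calls never form a cycle. All type and function names are hypothetical, and the cycle handling covers mutual passing only, not interface references.

```c
#include <assert.h>

enum { MAX_SUBS = 16, MAX_REFS = 32 };

typedef struct { int size; int base; } Sub;      /* base < 0: not placed yet */
typedef struct { int from, to, is_tail; } Ref;   /* one call-graph edge */

typedef struct {
    Sub subs[MAX_SUBS];
    int nsubs;
    Ref refs[MAX_REFS];
    int nrefs;
} Program;

static void place(Program *p, int sub, int base, int depth)
{
    if (p->subs[sub].base >= base)
        return;                 /* already placed high enough; this also ends tail-call cycles */
    assert(depth < p->nsubs);   /* a cycle through a normal call would be recursion */
    p->subs[sub].base = base;
    for (int i = 0; i < p->nrefs; i++) {
        const Ref *r = &p->refs[i];
        if (r->from != sub)
            continue;
        /* A catcher reuses the passer's base; a normal callee goes above it. */
        int child_base = r->is_tail ? base : base + p->subs[sub].size;
        place(p, r->to, child_base, depth + 1);
    }
}

/* Places every subroutine reachable from `root` and returns the total
 * workspace size needed. */
int layout(Program *p, int root)
{
    int total = 0;
    for (int i = 0; i < p->nsubs; i++)
        p->subs[i].base = -1;
    place(p, root, 0, 0);
    for (int i = 0; i < p->nsubs; i++) {
        int end = p->subs[i].base + p->subs[i].size;
        if (p->subs[i].base >= 0 && end > total)
            total = end;
    }
    return total;
}
```

For example, with a main of size 4 that calls a (size 10), where a and b (size 6) pass to each other and b calls a helper (size 3), this model places a and b at the same base and needs 14 bytes in total. The early return in place is what keeps the a/b cycle from looping forever.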

That's really the scale of the changes. The impact on the compiler's complexity is small.

Examples

I wrote a Brainfuck interpreter in several different interpreter styles as a benchmark of the feature. It can be found here.

The repo has a new example called cowcalc.cow, which is an RPN calculator implemented using passto. It gives an idea of what a state machine using the new feature looks like.

I suppose I have to include a traditional tail-recursion example, as well:

include "cowgol.coh";

sub countloop(n: uint32) is
	if n > 0 then
		print_i32(n);
		print(", ");
		passto countloop(n - 1);
	else
		print("lift-off!\n");
	end if;
end sub;

print("T-minus ");
countloop(10);

Linker is now aware of tail calls, knows their workspaces can overlap. Passto statement support added to BASIC architecture, as best as possible. Passto statement support added to cgen. Tests added for passto statement. Currently passes on cgen, lx386, and msdos. Previously added passto example is now called 'cowcalc.' 'passto' is now a simpler example.

davidgiven · 2023-02-25T23:28:25Z

I'm sorry this has taken so long to get back to! I've been sunk in other projects (including finishing a book!)...

Thanks very much for the PR; this is a really good idea and would be extremely useful. I'm planning on picking Cowgol up again for use in a different project, and this'd be useful.

I've gone through the code; my biggest concern is the PREPARETAIL/TAILCALL split. It seems clunky --- the only reason for PREPARETAIL is to remove the return address before the arguments get pushed, so that it can get put back on top of the arguments. If we're going to add an opcode for this maybe it'd be better to rethink the way return addresses are handled entirely. For any procedure with input parameters, the return address will get popped anyway in the function prologue --- maybe it'd be better to just stash that somewhere other than the stack? Of course, that'd make returning from procedures with no output parameters much more expensive.

(I also notice that several implementations of PREPARETAIL put the return address in a register, e.g. a5 on the 68000. This will fail if the register gets used as the input parameter expressions get evaluated. Unfortunately the register allocator can only track registers within an expression so that won't be trivial to fix. A potentially not-great early design decision means that opcodes can only have two operands, and TAILCALL is already using both, so just adding it there isn't possible.)

I have been very vaguely mulling the ability to do non-local jumps, for things like exception handling and returning from inner subroutines. (This would allow a better case..end case, for example, where each condition's body could be an implicit nested subroutine.) As that's notionally similar maybe some sort of more generic continuation scheme could cover both use cases.

BTW, I was about to suggest goto as an alternative keyword when I realised you had already mentioned it! I do like it better than passto, and I do want to add proper goto at some point anyway --- jumping to labels and to other procedures are conceptually similar and unambiguous so it would save a keyword.

shadowofneptune · 2023-03-17T02:03:20Z

I completely understand, I hop back and forth between passions as well. My own delay in responding should speak to that.

I'm glad to see you like the proposal. PREPARETAIL did feel ugly when I introduced it, but I saw no way to avoid it under the newgen syntax while also keeping the current calling conventions intact. Since you feel that more aggressive changes to that convention are needed, I think there are a few different schemes that could be used, with the choice depending on the needs of each architecture:

1. Store return address in memory.

This is already in use in the 6502 architecture, and could work for architectures of similar complexity. In cases where output parameters are used, the return address is copied to a fixed address, and an indirect jump is used to return from the function. It should be possible to only pay this cost in those subroutines where goto subroutine() is used.

2. Separate return address workspace.

This is similar to the first one, but allows for memory savings.

3. Leave return address in place.

This strategy works best for architectures that support stack-relative addressing. Input parameters are pushed onto the stack, reserving space for the output parameters if needed. The return address is left in place, to be either used in a return or discarded in a goto.

4. Linker changes.

This is the most sweeping change, one I would not pursue in a pull request. David Wheeler, a man also interested in programming languages for small systems, proposes here that an optimizing linker like the one Cowgol has could eliminate the need to copy variable data entirely. The called function would write directly to the caller's activation record, without ever having to touch the stack.

5. The generic continuation scheme.

I would need to have more information on what you are considering to try this. I am in particular interested in how this would work with the C implementation, which doesn't really have a way to do non-local labels.

For the moment, I will change passto to goto and begin changing implementations to schemes 1, 2, and 3.
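As a rough picture of what scheme 1 could mean for a tail call, here is a small C model of a machine where every subroutine keeps its return address in a fixed memory cell. This is a sketch under my own assumptions rather than a description of any existing backend: the labels, the two subroutines ping and pong, and run_scheme1 are all hypothetical, and the program counter is just an integer label.

```c
enum Label { HALT, AFTER_CALL, PING, PONG };

/* Runs ping(n), where ping and pong pass to each other until n reaches
 * zero. Returns the number of subroutine entries executed, or -1 if
 * control never came back to the original caller. */
int run_scheme1(int n)
{
    int ret_ping = HALT;  /* fixed return-address cell for ping */
    int ret_pong = HALT;  /* fixed return-address cell for pong */
    int entries = 0;
    int pc;

    /* Normal call to ping: store the return address in ping's cell. */
    ret_ping = AFTER_CALL;
    pc = PING;

    for (;;) {
        switch (pc) {
        case PING:
            entries++;
            if (n == 0) { pc = ret_ping; break; }  /* return: indirect jump via own cell */
            n--;
            ret_pong = ret_ping;  /* tail call: hand our return address to the catcher */
            pc = PONG;
            break;
        case PONG:
            entries++;
            if (n == 0) { pc = ret_pong; break; }
            n--;
            ret_ping = ret_pong;
            pc = PING;
            break;
        case AFTER_CALL:
            return entries;       /* back in the original caller */
        default:
            return -1;            /* HALT or an unknown label */
        }
    }
}
```

In this model a tail call costs one copy between cells and a jump, with no stack traffic at all, and whichever subroutine finally returns lands back in the original caller.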

P.S: What is the naming scheme for variables in Cowgol? I have not been able to tell; several different ones appear to be in use.

shadowofneptune and others added 10 commits December 14, 2022 09:11

Added the passto statement to the frontend.

8649e5f

Added passto statement support to 386 architecture.

59ad155

Added example of the passto statement in action.

Added passto support to 8086 architecture.

3c7098a

Passto example now prints signed integers correctly on 16-bit architectures.

Simplified cgen tailcall support: should have little impact if passto not used.

f4ee1a3

Commented out debug print()s in cowlink.

Added passto support to 68000, ARM, pdp11, PowerPC architectures.

f2bfe8a

Fixed hack that used -1: uint16 in passto.test instead of 0xFFFF. Only worked by mistake.

Added passto statement support to HD6303, 6502, 8080, and Z80 architectures.

1334180

No support for 6502 interpreter architecture, as it is no longer used.

Added a further correctness check to the passto frontend.

04420ed

Removed redundant checks from cowlink's new cycle detector.

5f22562

Passto examples now added to build script.

Removed copy of archpowerpc.cow committed by mistake

3f27d3b
