Add simpler loop construct to replace the current scan long-term #189

aseyboldt · 2023-01-09T19:38:16Z

aseyboldt
Jan 9, 2023
Maintainer

The scan code in pytensor is pretty involved, and over time grew into something that's pretty hard to understand and work with.
In the last design meeting we discussed whether it might be a good idea to replace it with something completely new, redesigned from the ground up.

This is a first attempt at what such a replacement might look like.

First, we add a new loop construct (the Loop class below) that's hopefully quite simple and can represent arbitrary loops, not just scan-like loops. Only some loops easily allow for reverse mode autodiff, and we represent those as loops for which we know a reverse loop.

We then add scan as a function that builds a loop and its reverse from those building blocks.

A (pseudocode-ish) implementation of this idea:

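from typing import Optional

import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.fg import FunctionGraph
from pytensor.graph.op import Op


class IfElse(Op):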
    """Represent a true if-else statement. The current ifelse is I think
    actually a switch statement (at least in the impl of the numba backend).

    We store two internal FunctionGraphs, one for the if-branch and one
    for the else-statement.
    """
    __props__ = ("if_branch", "else_branch")

    def __init__(self, if_branch: FunctionGraph, else_branch: FunctionGraph):
        self.if_branch = if_branch
        self.else_branch = else_branch
        # assert same input and output types

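    def make_node(self, condition, *inputs):
        # Outputs are fresh variables with the same types as the branch outputs
        outputs = [out.type() for out in self.if_branch.outputs]
        return Apply(self, [condition, *inputs], outputs)

    def perform(self, node, inputs, output_storage):
        condition, *inputs = inputs
        # Pseudocode: assumes the branch graphs were compiled into callables
        if condition:
            out = self.if_branch(*inputs)
        else:
            out = self.else_branch(*inputs)

        for storage, value in zip(output_storage, out):
            storage[0] = value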


class Loop(Op):
    """Represent a do-while loop.

    We represent the loop body as an inner FunctionGraph, which
    computes the next state and whether the loop should continue.

    Why a do-while loop instead of a while loop? Not entirely sure
    which would be a better primitive, but compiler people seem
    to like a do-while loop more, and often represent a while loop
    as a do-while loop within an if-statement. This seems to make
    some optimizations simpler.
    See eg https://releases.llvm.org/11.0.0/docs/LoopTerminology.html#rotated-loops

    Forward gradients can be implemented as a loop that uses the forward
    gradients of the update function.

    Reverse mode gradients needs to iterate through the loop in reverse order though,
    so we optionally allow a second argument that represents the reverse of the loop.
    In user code that reverse loop would probably usually be coming from the scan
    functions below.
    """
    def __init__(
        self,
        # (*state,  *consts) -> (bool, *state)
        update: FunctionGraph,
        reverse: Optional[FunctionGraph] = None,
    ):
        self._state_types = [out.type for out in update.outputs[1:]]
        self._const_types = [
            inp.type for inp in update.inputs[len(self._state_types):]
        ]
        self.update = update
        self.reverse = reverse
        # Todo validate output of update

    def make_node(self, *inputs):
        # inputs = (*state, *constants)
        n_state = len(self._state_types)
        state = inputs[:n_state]
        # assert [item.type for item in state] == self._state_types
        return Apply(
            self,
            list(inputs),
            [state_type() for state_type in self._state_types],
        )

    def perform(self, node, inputs, output_storage):
        n_state = len(self._state_types)
        state, consts = inputs[:n_state], inputs[n_state:]
        while True:
            # Pseudocode: assumes `update` was compiled into a callable
            go_on, *state = self.update(*state, *consts)
            if not go_on:
                break
        for storage, value in zip(output_storage, state):
            storage[0] = value

    def L_Op(self, *args):
        if not self.reverse:
            raise NotImplementedError()
        # Use L_Op of self.reverse.update
        ...

    def R_Op(self, *args):
        # Use R_op of self.update
        ...

def scan(fn, initial_state, *, max_iters, constants=None):
    """Transform variables in a loop and collect all intermediate states.

    Roughly equivalent to

    ```
    def scan(fn, initial_state, *, constants):
        trace = []
        state = initial_state
        i = 0
        while True:
            trace.append(state)
            # TODO we could also pass in trace I guess?
            go_on, state = fn(i, state, *constants)
            i += 1
            if not go_on:
                break
        return np.asarray(trace)
    ```

    If `initial_state` is a list, it will also return a list of traces.
    The number of iterations is bounded by `max_iters`, which is used
    to pre-allocate the result array.
    Additional constant arguments for fn can be passed through `constants`.
    """
    # More or less pseudocode...
    if constants is None:
        constants = []

    is_list = isinstance(initial_state, list)
    if not is_list:
        initial_state = [initial_state]

    traces = []
    for var in initial_state:
        # We append one dimension at the front of each initial state
        # to store all intermediate values
        trace_type = type(var.type)(shape=[None, *var.type.shape], dtype=var.type.dtype)
        traces.append(trace_type())

    n_states = len(initial_state)

    def loop_advance(*variables):
        # variables = (idx, *traces, *states, *constants)
        idx = variables[0]
        traces = list(variables[1:n_states+1])
        states = list(variables[n_states+1:2*n_states+1])
        constants = list(variables[2*n_states+1:])

        next_traces = [pt.set_subtensor(traces[i][idx], states[i]) for i in range(n_states)]
        go_on, *next_states = fn(idx, *states, *constants)
        next_idx = idx + 1
        # Stop before writing past the end of the pre-allocated traces
        go_on = pt.and_(go_on, pt.lt(next_idx, max_iters))

        # The index is part of the loop state; constants are not returned
        return [go_on, next_idx] + next_traces + next_states

    def loop_reverse(*variables):
        # Same layout as in loop_advance
        idx = variables[0]
        traces = list(variables[1:n_states+1])
        states = list(variables[n_states+1:2*n_states+1])
        constants = list(variables[2*n_states+1:])

        prev_idx = idx - 1
        prev_states = [trace[prev_idx] for trace in traces]
        return [pt.ge(prev_idx, 1), prev_idx] + traces + prev_states

    init_traces = [
        pt.empty((max_iters, *var.shape), dtype=var.dtype)
        for var in initial_state
    ]

    # Order must match the unpacking in loop_advance: idx, traces, states, constants
    initial_states = [pt.constant(0, dtype="uint64"), *init_traces, *initial_state, *constants]
    # Pseudocode: both functions would first have to be turned into FunctionGraphs
    loop = Loop(loop_advance, reverse=loop_reverse)
    return loop(*initial_states)
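
For anyone who wants to run something, the same semantics can be sketched in plain Python without any PyTensor graphs. The names `do_while_loop` and `scan_py` are made up for this illustration, and states are ordinary Python values rather than symbolic variables, so this only mirrors the control flow of the pseudocode above:

```python
def do_while_loop(update, state, consts=()):
    """Run `update` at least once; stop when it returns go_on == False."""
    while True:
        go_on, *state = update(*state, *consts)
        if not go_on:
            return state


def scan_py(fn, initial_state, *, max_iters, constants=()):
    """Collect every state visited by `fn` in a pre-allocated trace."""

    def advance(idx, trace, state, *consts):
        # Functional stand-in for set_subtensor: copy with one slot replaced
        trace = trace[:idx] + [state] + trace[idx + 1:]
        go_on, next_state = fn(idx, state, *consts)
        next_idx = idx + 1
        # Never run past the pre-allocated trace
        go_on = go_on and next_idx < max_iters
        return go_on, next_idx, trace, next_state

    n_steps, trace, _ = do_while_loop(
        advance, [0, [None] * max_iters, initial_state], constants
    )
    return trace[:n_steps]


# Keep doubling while the value is below 20
doubling = scan_py(lambda i, x, factor: (x < 20, x * factor), 1,
                   max_iters=10, constants=(2,))
```

Here the index and the trace are simply part of the loop state, which is the whole point of the proposal: scan is a pattern built on top of the loop, not something the loop knows about.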

cc @ricardoV94 @Armavica @lucianopaz

ricardoV94 · 2023-01-09T20:42:44Z

ricardoV94
Jan 9, 2023
Maintainer

What do you think about the implicit semantics we have now for "truncated" outputs? Right now, if the input trace has shape==(3, ...) instead of shape==(n_states, ...), then only the last 3 states are returned (and kept in memory during looping).

This requires some overhead with rolling outputs and tracking the pointer. I think it may make sense (especially with taps), but I would like to handle those at compilation time (if llvm is not clever enough to do it itself) and not as part of the symbolic graph, as is the case now.

For instance, it doesn't make sense that it's as hard to make an optimized loop as shown in #174 (comment)
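
To make the bookkeeping concrete, here is a small plain-Python sketch of the "keep only the last few states" behaviour, using a fixed-size buffer and a write position. The function name `loop_keep_last` is hypothetical and this is not how scan implements it; it only illustrates the rolling and pointer tracking that has to happen somewhere:

```python
def loop_keep_last(step, init, n_steps, keep):
    """Iterate `step` n_steps times, keeping only the last `keep` states."""
    buffer = [None] * keep
    state = init
    for i in range(n_steps):
        state = step(state)
        buffer[i % keep] = state  # overwrite the oldest slot
    if n_steps < keep:
        return buffer[:n_steps]
    # Un-roll the buffer so that the oldest kept state comes first
    start = n_steps % keep
    return buffer[start:] + buffer[:start]


last_three = loop_keep_last(lambda x: x + 1, 0, n_steps=5, keep=3)
```

Memory use stays at `keep` entries regardless of `n_steps`, at the cost of the final un-rolling step.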

2 replies

aseyboldt Jan 9, 2023
Maintainer Author

I'm really confused about the current taps implementation. Without rev mode gradients, sure, I see how that works, and it should be pretty straightforward to implement using the Loop construct above as well.
But with rev mode grads? Doesn't it have to store all values somewhere? If not, I'm missing something important...

ricardoV94 Jan 9, 2023
Maintainer

The taps/trace optimization is only present when you don't need the intermediate steps; with gradients you do need them.

lucianopaz · 2023-01-09T20:48:30Z

lucianopaz
Jan 9, 2023
Maintainer

Thanks @aseyboldt. I'll have to read this in more detail tomorrow, but from a first quick glance I can say that you don't need to add another ifelse Op. The one in pytensor already does a lazy evaluation of each of the branches (at least according to its docs).
The other thing that I'm not super happy about is that the loop Op still detaches its inner graph from whatever is calling the loop. That being said, it looks like you are trying to implement a loop with a single carry-over value, so the taps logic from scan needs to be implemented by the user (which is fine by me). I've never managed to wrap my head around the theta and eval nodes of the e-graph IR of loop nodes, but those in principle seem to be able to represent the loop body transparently and not detach the inner graph from the outer graph.

3 replies

aseyboldt Jan 9, 2023
Maintainer Author

About if-else: it does evaluate everything in the numba (and I think jax) backends. The C linker has some special stuff going on that avoids it, but I think that's pretty strange design.... https://github.com/pymc-devs/pytensor/blob/main/pytensor/ifelse.py#L57

About that whole "inner graph" question: I think we have a choice here if we want to have a language with loops: either we have inner graphs, or we accept cycles in the graph. Most compilers have cycles I think, but over the last couple of weeks I think I came around to the inner-graph way of doing things. Here is a discussion about this in the egg repo: egraphs-good/egg#106 The linked paper was also quite helpful.

It would be cool if we could add a FunctionGraph type and variable though, so that we can treat that as a first class function, and also add an apply node...
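
The difference between a switch and a true if-else can be illustrated in plain Python. This does not reproduce what any backend actually does; `switch_eager` and `ifelse_lazy` are made-up names that only show the two evaluation strategies being discussed:

```python
def switch_eager(condition, if_value, else_value):
    """Both branch values were already computed by the caller."""
    return if_value if condition else else_value


def ifelse_lazy(condition, if_branch, else_branch, *inputs):
    """Only the selected branch function is ever called."""
    return if_branch(*inputs) if condition else else_branch(*inputs)


calls = []


def expensive(x):
    calls.append("expensive")
    return x ** 2


def cheap(x):
    calls.append("cheap")
    return x


# Switch: both branches are evaluated before the selection happens
eager_result = switch_eager(False, expensive(3), cheap(3))
eager_calls = list(calls)

# If-else: the unselected branch never runs
calls.clear()
lazy_result = ifelse_lazy(False, expensive, cheap, 3)
lazy_calls = list(calls)
```

Both give the same result, but only the lazy version skips the work of the branch that was not taken, which is what storing two inner graphs is meant to guarantee.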

lucianopaz Jan 9, 2023
Maintainer

Looks like this repo that combines egraphs and RVSDG together could serve as a pointer? https://github.com/jameysharp/optir/

aseyboldt Jan 9, 2023
Maintainer Author

Yes, I think it does something quite similar to this loop op: https://github.com/jameysharp/optir/blob/main/src/language.rs#L208
The representation of the inner graph is a bit different (it is directly part of the graph, using the distinction between all children and "same scope children"). I think (?) they should be fairly easy to convert between. I think having this wrapped in a FunctionGraph is actually maybe a bit cleaner though.

ricardoV94 · 2023-01-10T03:53:02Z

ricardoV94
Jan 10, 2023
Maintainer

BTW the proposal is already 10x more appealing than the current. I suggest we start working on it sooner rather than later. @aseyboldt do you want to do it yourself or are you happy letting someone else tackle it?

2 replies

aseyboldt Jan 10, 2023
Maintainer Author

If someone else wants to give it a shot that would be great. There are more than enough projects to choose from right now. :-)

ricardoV94 Jan 10, 2023
Maintainer

I might take a stab at it this week.

ricardoV94 · 2023-01-10T04:31:46Z

ricardoV94
Jan 10, 2023
Maintainer

The other thing we should consider is how to transpile to JAX.

The IfElse maps well to lax.cond

The Loop can be converted to the jax while loop, but that is not differentiable by default.

When we have an actual for loop and not while loop, should we write it as a jax scan, so that autodiff works?

Also, why not fall back to a scan for autodiff in pytensor instead of raising as in the L_Op example at the top? The biggest issue would be to avoid having a redundant loop and scan with the same inner function (minus the set_subtensor).

That reminds me of another option which is for the loop primitive to have two types of outputs: last state and the intermediate states. During compilation we could get rid of the intermediate states if they aren't used anywhere.

4 replies

aseyboldt Jan 10, 2023
Maintainer Author

Yeah, I've also been worrying a bit about jax compatibility here. We could always transform to jax after computing derivatives, but that would get rid of a lot of functionality after transformation.
But putting the trace back into the basic loop construct doesn't sound right to me either. That's making the loop complicated again, and not just a loop.
I guess in general that might be a problem that could come up in some form or another in other contexts as well: depending on the backend we might want to dispatch at a different level. Maybe one option to solve that more generally would be to wrap cases like this in an OpFromGraph that has a special dispatch. So we could make it so that the scan function above doesn't return the graph directly, but wraps it in OpFromGraph(jax_dispatch=something_using_jax_scan)? And only if the backend is not jax do we then inline this inner graph and compile it down to a function on our own?

ricardoV94 Jan 10, 2023
Maintainer

We can have high level dummy Ops as in the scalar Op PR, and specialize to different formats during rewrite. Your code example seems like a good final representation for a while loop, but it doesn't need to be the first/only representation.

I am also unconvinced about the smart reverse function, I don't think it will generally exist except for very few trivial loops, so I am still inclined to build the core loop functionality assuming the only way to reverse a loop is to store intermediate states in a first pass.
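
As a minimal sketch of that "store everything in a first pass" approach, assuming a scalar state and a hand-written derivative of the step function (the names `loop_forward` and `loop_backward` are hypothetical and not part of the proposal):

```python
def loop_forward(step, x0, n_steps):
    """First pass: run the loop and store every intermediate state."""
    trace = [x0]
    for _ in range(n_steps):
        trace.append(step(trace[-1]))
    return trace


def loop_backward(step_grad, trace, out_grad=1):
    """Second pass: walk the stored trace backwards, applying the chain rule."""
    grad = out_grad
    # trace[-1] is the final output, so it is not an input to any step
    for x in reversed(trace[:-1]):
        grad = grad * step_grad(x)
    return grad


# Squaring three times computes x0 ** 8, whose derivative is 8 * x0 ** 7
trace = loop_forward(lambda x: x * x, 2, n_steps=3)
grad_x0 = loop_backward(lambda x: 2 * x, trace)
```

The backward pass needs each input state `x`, which is why the trace cannot be truncated once gradients are requested.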

ricardoV94 Jan 10, 2023
Maintainer

Other reasons you might want to have different high level loop dummy ops: For batching for loops (a la vmap) you don't need/want to batch the index, and not representing it initially as a while loop makes it easier to know that.

Similarly not representing scan as the specialized "update+set_subtensor of an initial empty trace" makes it easier to batch correctly.

Basically it's easier to "transform" loop graphs (be it for transpilation, auto-diff, batching, logprob inference in PyMC) if we can quickly look at it and see what "kind" of loop it is.
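
A rough plain-Python illustration of the batching point, with lists standing in for a batch dimension (both function names are made up for this sketch). For a for loop the index is shared and only the state is batched; for a general while loop each batch member needs its own stop flag:

```python
def batched_for_loop(step, batched_init, n_steps):
    """Batch over the state only; the index `i` is shared by all members."""
    states = list(batched_init)
    for i in range(n_steps):
        states = [step(i, s) for s in states]
    return states


def batched_while_loop(update, batched_init):
    """Do-while per member: each carries its own go_on flag."""
    states = list(batched_init)
    active = [True] * len(states)
    while any(active):
        for k, s in enumerate(states):
            if active[k]:
                active[k], states[k] = update(s)
    return states


# Every member sees the same indices 0, 1, 2, 3
for_result = batched_for_loop(lambda i, s: s + i, [0, 10], n_steps=4)

# Members stop at different iterations, so the loop tracks them separately
while_result = batched_while_loop(lambda s: (s * 2 < 10, s * 2), [1, 3])
```

If the for loop were only visible as a while loop with the index hidden in its state, a batching transform would first have to rediscover that the index does not need a batch dimension.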

ricardoV94 Jan 10, 2023
Maintainer

I feel that part of why scan is so hard to work with is that they went the other way and represented scan symbolically as always being a "scan". Then you need silly logic to find out when only the last output is requested. I would avoid flipping the coin and making scans hard to identify.
