Interaction transformation method and Operon. #39

greenleaf641 · 2024-10-30T01:04:19Z

I am trying to see if it is possible to have Operon work not with a tree structure but by using the "Transformation Interaction representation". Briefly what this method hopes to do, is simplify the resulting functions by representing the population not as a tree but as a collection of multiple unaries acting on top of a polynomial. Something like:

weight1 * function1(x1^poly11 * x2^ poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...

I made a new generator that will only select functions with arity 1 and instead of a tree will generate a collection of genotypes that have the form described above:

struct Individual {
        // number of nodes (Functions to be applied)
        uint16_t Length = 0UL; // 0-65535

        Operon::Vector<Operon::Scalar> Fitness;

        // vector of free parameters of the affine combination of the `Functions` in `Individual`.
        Operon::Vector<Operon::Scalar> Coefficients;

        // vector of transformation Functions by index.
        Operon::Vector<Operon::Function> Functions;

        // vector of interaction Functions for each transformation.
        Operon::Vector<Operon::Vector<Operon::Scalar>> Polynomials;

        Individual() = default;

        ...
};

I altered the Interpreter as well so it can evaluate the genotypes of this form; I added this function to the interpreter class:

        auto ComputePolynomials(Operon::Range range, int i) const {
            auto const& individual{ individual_.get() };
            auto const& terms = individual.Polynomials[i];
            const auto start = range.Start();
            const auto size = range.Size();

            constexpr int64_t S{ BatchSize };
            auto* h = primal_.data_handle();

            std::fill_n(h + i * S, S, T{1});

            for (int64_t j = 0; j < static_cast<int64_t>(terms.size()); ++j) {
                auto const* variableValues = dataset_.get().GetValues(j).subspan(start, size).data();
                auto* primalPtr = h + i * S;
                auto term = terms[j];
                std::transform(variableValues, variableValues + size, primalPtr, primalPtr, [term](auto value, auto current) {
                    T result = std::pow(value, term);
                    return current * result;
                });
            }
        }

So then the newForwardPass can evaluate them like this:

        inline auto ForwardPass(Operon::Range range, int row, bool trace = false) const -> void {
            auto const start { static_cast<int64_t>(range.Start()) };
            auto const len   { static_cast<int64_t>(range.Size()) };
            auto const& individual = individual_.get();
            auto const nn = std::ssize(individual.Functions);
            constexpr int64_t S{ BatchSize };

            auto rem = std::min(S, len - row);
            Operon::Range rg(start + row, start + row + rem);

            // forward pass - compute primal and trace
            for (auto i = 0L; i < nn; ++i) {
                auto const& [coefficient, nodeFunction, nodeDerivative, polynomial] = context_[i];

                auto* ptr = primal_.data_handle() + i * S;

                if (nodeFunction) {
                    ComputePolynomials(rg, i);
                    std::invoke(*nodeFunction, individual.Functions, primal_, i, rg);

                    // first compute the partials
                    if (trace && nodeDerivative) {
                        std::invoke(*nodeDerivative, individual.Functions, primal_, trace_, i, i);
                    }

                    // apply weight after partials are computed
                    if (coefficient != T{1}) {
                        std::ranges::transform(std::span(ptr, rem), ptr, [coefficient](auto x) { return x * coefficient; });
                    }
                }
            }
        }

but I'm not sure what to do when it comes to the ForwardTrace and ReverseTrace which I assume will be necessary for optimizing the coefficients later on.

Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea on how these two functions should be altered to support transformation interaction instead?

The text was updated successfully, but these errors were encountered:

foolnotion · 2024-10-30T16:38:34Z

Hi,

If I understand correctly the question is how to implement autodiff for your custom function.

Why are we multiplying trace with dot, or trace with primal exactly, and do you have any idea on how these two functions should be altered to support transformation interaction instead?

ForwardTrace and ReverseTrace implement forward- and reverse mode automatic differentiation (see e.g. https://arxiv.org/abs/1502.05767), in a slightly more simplified way since in Operon's case the directed acyclic graph is in fact a tree.

I'm not sure what to do when it comes to the ForwardTrace and ReverseTrace which I assume will be necessary for optimizing the coefficients later on.

This requires that all intermediate results are stored in the primal to be used later by the traces. In your ForwardPass function, it looks to me like the evaluation of the entire polynomial expression (weight1 * function1(x1^poly11 * x2^ poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...) is stored in one single column of the primal matrix by the ComputePolynomials function. In other words, the intermediate results which would allow differentiation with respect to the weights inside the polynomial are missing (since the primal only stores the final aggregate values).

I would suggest to implement your own Interpreter instead. You could use dual numbers directly and then you don't need multiple passes. You can probably adapt this code to your needs:

operon/test/source/operon_test.hpp

Line 210 in 28ac41b

auto Autodiff(auto const& tree, auto const& dataset, auto const range) {

greenleaf641 · 2024-10-30T17:19:11Z

I see, having seen the paper I think what you are saying makes sense a little bit. So in this case, should I store in primal the result of each interaction (meaning weight1 * function1(x1^poly11 * x2^ poly12 * ...) per column)?

Additionally have you looked at this method of symbolic regression in the past, do you think if I manage to make it work the results will be worth it?

folivetti · 2024-10-30T17:37:41Z

@greenleaf641 as one of the authors of ITEA, I would say go for it :-D but I might be biased
In my experience with ITEA, while it doesn't excel in srbench, it has some success cases when you really need to ensure the linearity of the parameters.
(also, I would personally love to see this integrate to Operon :-) )

greenleaf641 · 2024-10-30T17:41:33Z

It would be fun I think, since I liked the idea. But for now I'm just changing Operon's existing code, if it works well, I should figure a way to add this as an additional option alongside Operon's traditional tree structure.

folivetti · 2024-10-30T17:43:36Z

Btw, you described the IT representation but pointed out to TIR paper, in TIR you would have something like:

f(x) = weight1 * function1(x1^poly11 * x2^ poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...

g(x) = weight1 * function1(x1^poly11 * x2^ poly12 * ...) + weight2 * function2(x1^poly21 * x2^poly22 * ...) + ...

and the expression would be

linkFunction(f(x)/(1 + g(x))

this would be a little bit more difficult to implement, but it often generates smaller and more accurate expressions.

greenleaf641 · 2024-10-30T17:48:15Z

My mistake. No I meant the IT method [https://arxiv.org/pdf/1902.03983], I just linked the wrong paper somehow!
Thank you for the correction.

greenleaf641 · 2024-11-01T17:44:33Z

Would I be correct in saying that in the case of IT, trace_ (i.e. in ForwardTrace) is no longer needed?
I just do the calculation directly: jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];

folivetti · 2024-11-04T10:40:19Z

I think so, the derivatives for IT expressions are straightforward, like you said

foolnotion · 2024-11-05T07:49:36Z

Would I be correct in saying that in the case of IT, trace_ (i.e. in ForwardTrace) is no longer needed? I just do the calculation directly: jac.col(i).segment(row, rem) = primal.col(i).head(rem) / individual.Weights[i];

Your idea looks plausible, then you would only need the primal. I suggest starting with some small expressions and checking if the derivatives are correct, like we do in the unit test: https://github.com/heal-research/operon/blob/main/test/source/implementation/autodiff.cpp

greenleaf641 · 2024-11-15T01:39:57Z

I think I can train now "somewhat correctly". I have a few questions though. Why does Operon pass arguments by value and not by reference on the mutation/crossover functions? Wouldn't it be more efficient to do the latter?

By using this with the "Variations of vertical fluid" dataset as reference I get:
9.807*(ρ * Δh) or 9.807*abs(ρ * Δh) as an answer. Operon gives me -2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000)) which is obviously correct but only after applying scaling. Is this the correct behaviour; should the result be returned like that or did I make some mistake somewhere in between?

Coefficient optimization happens with the scaled data so maybe this makes sense but just to be sure.

foolnotion · 2024-11-15T17:23:18Z

Why does Operon pass arguments by value and not by reference on the mutation/crossover functions? Wouldn't it be more efficient to do the latter?

Most of these operators work by performing some changes on a copy of the parent. Therefore, it makes sense to accept the arguments by value. One can also use std::move to avoid a copy.

By using this with the "Variations of vertical fluid" dataset as reference I get:
9.807*(ρ * Δh) or 9.807*abs(ρ * Δh) as an answer. Operon gives me -2.643664 * cbrt((ρ ^ 3.000000) * (Δh ^ 3.000000)) which is obviously correct but only after applying scaling. Is this the correct behaviour; should the result be returned like that or did I make some mistake somewhere in between? Coefficient optimization happens with the scaled data so maybe this makes sense but just to be sure.

I am assuming that you are using the cli programs (operon-gp, operon-nsgp) ? If yes, normally the scaling terms are added to the model. If the scaling is not what you expected, I suspect something with the data?

greenleaf641 · 2024-11-15T19:02:00Z

Correct, the cli nsgp program. But given that I've edited some things to work with IT, then maybe it could be a problem on my end. I need to investigate further since the returned function is not correct without the scaling factor.

greenleaf641 mentioned this issue Nov 26, 2024

HAVE_CERES being defined might throw errors for the NonlinearLeastSquaresOptimizer #41

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interaction transformation method and Operon. #39

Interaction transformation method and Operon. #39

greenleaf641 commented Oct 30, 2024

foolnotion commented Oct 30, 2024 •

edited

Loading

greenleaf641 commented Oct 30, 2024

folivetti commented Oct 30, 2024

greenleaf641 commented Oct 30, 2024

folivetti commented Oct 30, 2024

greenleaf641 commented Oct 30, 2024

greenleaf641 commented Nov 1, 2024

folivetti commented Nov 4, 2024

foolnotion commented Nov 5, 2024

greenleaf641 commented Nov 15, 2024

foolnotion commented Nov 15, 2024

greenleaf641 commented Nov 15, 2024

Interaction transformation method and Operon. #39

Interaction transformation method and Operon. #39

Comments

greenleaf641 commented Oct 30, 2024

foolnotion commented Oct 30, 2024 • edited Loading

greenleaf641 commented Oct 30, 2024

folivetti commented Oct 30, 2024

greenleaf641 commented Oct 30, 2024

folivetti commented Oct 30, 2024

greenleaf641 commented Oct 30, 2024

greenleaf641 commented Nov 1, 2024

folivetti commented Nov 4, 2024

foolnotion commented Nov 5, 2024

greenleaf641 commented Nov 15, 2024

foolnotion commented Nov 15, 2024

greenleaf641 commented Nov 15, 2024

foolnotion commented Oct 30, 2024 •

edited

Loading