
Performance and recursivity in the interpreter of mathics #844

Open
mmatera opened this issue Sep 8, 2020 · 5 comments

Comments

@mmatera
Contributor

mmatera commented Sep 8, 2020

A performance problem that mathics has is related to the cost of recursion in Python. The current implementation seems to be written with the logic of a functional language (like mma), for which recursive functions are natural. However, since the interpreter is implemented in Python, a lot of resources are required to build the frames associated with each recursion level. So I see two possibilities: either look for a different language for the interpreter, or change the way we walk the expressions.
I propose that we see if this second option is possible (of course, this is something for a future release).

See for example https://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion/
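For reference, the linked technique (traversal with an explicit stack instead of call frames) looks roughly like this; a minimal sketch with an illustrative `Node` class, not Mathics code:

```python
# Minimal sketch of iterative in-order traversal with an explicit
# stack, as in the linked article.  Node is an illustrative binary
# tree class, not a Mathics type.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder_iterative(root):
    result, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:   # descend left, remembering ancestors
            stack.append(node)
            node = node.left
        node = stack.pop()        # visit the leftmost unvisited node
        result.append(node.value)
        node = node.right         # then walk its right subtree
    return result
```

The same traversal works at any depth, since the `stack` list grows on the heap instead of consuming Python call frames.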

@GarkGarcia
Contributor

That's a big issue indeed. Recursion is formally equivalent to iteration with regard to tree traversal algorithms (I think). The problem is that most tree traversal algorithms are much cleaner when expressed in terms of recursion (at least in my experience), so there's a maintainability cost to using iteration instead.

I propose that we see if this second option is possible (of course, this is something for a future release).

I agree. I think we could improve things a lot in that regard.

@CallmeNezha

I don't know if anyone is still interested in this issue 😄. Recently I implemented two really simple yet working expression-evaluation algorithms (iterative and recursive), just to benchmark the two approaches in Python. It seems the iterative algorithm is not faster (it is even ~20% slower in my case) than doing recursion natively in Python, since it needs the extra pushing and popping of an explicit stack (a deque). But iteration is far more stable than recursion in computing-time variance (and memory, I think?) and frees itself from Python's 1000-frame recursion limit, because the call stack is replaced by a deque.

Both algorithms evaluate the same expression.

WL: Fold[Plus, 0, Range[400]]
Expression: Plus[Plus[........Plus[0,1],2],3....]

Below are the cProfile output and benchmarks; after the benchmark section, I paste my code. The recursive version throws a recursion-limit error when I nest Plus 1000 times (without changing Python's default recursion limit), so I use 400 in this test.
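To make the test expression concrete, it can be rebuilt with plain tuples (a sketch; the `Expression` class from `evaluation-benchmark.py` is not reproduced in this thread):

```python
# Build Plus[Plus[...Plus[0, 1], 2]..., n] as nested ('Plus', lhs, rhs)
# tuples, mirroring Fold[Plus, 0, Range[n]] in Wolfram Language.
def build_nested_plus(n):
    expr = 0
    for i in range(1, n + 1):
        expr = ('Plus', expr, i)
    return expr

# A naive recursive evaluation; at depth 400 this stays safely under
# CPython's default recursion limit of 1000, which is why the
# benchmark uses 400.  The expected result is 400*401/2 = 80200.
def eval_plus(expr):
    if isinstance(expr, int):
        return expr
    _head, lhs, rhs = expr
    return eval_plus(lhs) + eval_plus(rhs)
```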

-------------  benchmark --------------------------
Time unit is second, measured by time.time()
**evaluate_recursive** with nest depth:400 loops: 200 min: 0.00156 max: 0.00302 average: 0.00170 stdev: 0.00019
result value: 80200
**evaluate_iterative** with nest depth:400 loops: 200 min: 0.00209 max: 0.00231 average: 0.00213 stdev: 0.00003
result value: 80200

------------- evaluate_recursive cProfile ------------------------
                  21208 function calls (16811 primitive calls) in 0.004 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
4000/2400    0.000    0.000    0.001    0.000 {built-in method builtins.hash}
     3202    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
     2400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:45(__hash__)
     1601    0.000    0.000    0.000    0.000 evaluation-benchmark.py:57(is_atom)
   1601/1    0.002    0.000    0.004    0.004 evaluation-benchmark.py:208(evaluate_recursive)
     1201    0.000    0.000    0.000    0.000 evaluation-benchmark.py:63(is_symbol)
     1200    0.000    0.000    0.000    0.000 evaluation-benchmark.py:98(leaves)
   1200/3    0.000    0.000    0.004    0.001 evaluation-benchmark.py:218(<genexpr>)
      800    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
      800    0.000    0.000    0.000    0.000 evaluation-benchmark.py:20(value)
      800    0.000    0.000    0.001    0.000 evaluation-benchmark.py:54(__eq__)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:14(__init__)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:33(__eq__)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:69(is_expression)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:90(__init__)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:94(head)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:160(apply)
        1    0.000    0.000    0.004    0.004 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)

------------- evaluate_iterative cProfile ------------------------
         32413 function calls (30813 primitive calls) in 0.006 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4402    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
     4402    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}
     4402    0.000    0.000    0.000    0.000 {method 'pop' of 'collections.deque' objects}
4000/2400    0.000    0.000    0.001    0.000 {built-in method builtins.hash}
     2401    0.000    0.000    0.000    0.000 evaluation-benchmark.py:69(is_expression)
     2400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:45(__hash__)
     2001    0.000    0.000    0.000    0.000 evaluation-benchmark.py:33(__eq__)
     2001    0.000    0.000    0.000    0.000 evaluation-benchmark.py:63(is_symbol)
     1200    0.000    0.000    0.000    0.000 evaluation-benchmark.py:98(leaves)
      800    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
      800    0.000    0.000    0.000    0.000 evaluation-benchmark.py:20(value)
      800    0.000    0.000    0.001    0.000 evaluation-benchmark.py:54(__eq__)
      800    0.000    0.000    0.000    0.000 evaluation-benchmark.py:94(head)
      400    0.000    0.000    0.000    0.000 {method 'clear' of 'collections.deque' objects}
      400    0.000    0.000    0.000    0.000 {method 'extend' of 'collections.deque' objects}
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:14(__init__)
      400    0.000    0.000    0.000    0.000 evaluation-benchmark.py:90(__init__)
      400    0.000    0.000    0.001    0.000 evaluation-benchmark.py:160(apply)
        1    0.000    0.000    0.006    0.006 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.003    0.003    0.006    0.006 evaluation-benchmark.py:243(evaluate_iterative)
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
"""
evaluation process

1. Expression: Head|[Leaves...]
First we have expression with head and leaves, head must be one of expression or symbol and leaves can be any.
2. Evaluate from down to top, we only can evaluate node when its' child nodes are all evaluated(not considering 
'Hold' mechanism). Yeah, it is intrinsically recursive. Atom which is not symbol are returned unchanged, symbol 
can be replaced by defined rules, and expression which matched any pattern will be replaced either.
"""
def evaluate_recursive(expr: Union[Expression, Symbol, Atom], definitions):
    if is_atom(expr):
        if is_symbol(expr):
            for rule in definitions[expr].get('ownvalues',[]):
                expr = rule.apply(expr)
            return expr
        else:
            return expr
    elif is_expression(expr):
        head = evaluate_recursive(expr.head, definitions)
        new_expr = Expression(head, *(evaluate_recursive(leaf, definitions) for leaf in expr.leaves))
        for rule in definitions[head].get('downvalues',[]):
            next_new_expr = rule.apply(new_expr)
            if next_new_expr is not None:  # a rule applied
                if next_new_expr != new_expr:  # not finished: the result may evaluate further
                    return evaluate_recursive(next_new_expr, definitions)
                else:
                    return next_new_expr
            else:
                pass  # rule did not apply
        # no rule applied: return the evaluated expression as-is
        return new_expr
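The recursion-limit failure described above is easy to reproduce in isolation (a toy sketch, independent of the benchmark classes):

```python
# Each recursion level consumes one CPython frame, so nesting close to
# sys.getrecursionlimit() (1000 by default) raises RecursionError.
def nesting_depth(n):
    return 0 if n == 0 else 1 + nesting_depth(n - 1)

def overflows(n):
    """True if recursing to depth n blows the interpreter's limit."""
    try:
        nesting_depth(n)
        return False
    except RecursionError:
        return True
```

An explicit deque-based stack sidesteps this limit entirely, at the cost of the push/pop overhead measured in the benchmark above.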
"""
1. Implement evaluation of expression using two stacks
2. An example, `o` represents expression, left branch is `head` and right branch has `children` seperated by comma.
          o1
         /  \
       o2    2, o3
      / \      /
     f.  a.   g

Definitions:
a := 1

Time 0: Push o1
stack   = [o1]
e_stack = []

Time 1: Pop o1 and check if can`expand` and push again into stack
stack   = [&,o3,2,@,o2] <--- [o1]
e_stack = []

Time 2: Pop and do same thing as Time 1
stack   = [&,o3,2,@,&,a,@,f] <--- [&,o3,2,@,o2]
e_stack = []

Time 3: Pop elements and push into e_stack until it meets & or it can expand
stack   = [&,o3,2,@,&,a] 
e_stack = [f,@] 

Time 4: Pop 'a' and replace with 1 and push again into stack
stack   = [&,o3,2,@,&,1] 
e_stack = [f,@] 

Time 5: Pop elements and push into e_stack until it meets & or it can expand
stack  = [&,o3,2,@,&]  <---- next step we will meet & then 'roll' e_stack immediately
e_stack = [f,@,1] 

Time 6: We evaluate right to left on e_stack and meets @ and consume one head before it
stack   = [&,o3,2,@]    --- >  &
e_stack = ['f[1]']  <--- [f,@,1]  no downvalues to apply and we return f[1]

Time 7: Pop elements and push into e_stack until it meets & or it can expand
stack   = [&,o3]
e_stack = ['f[1]',@,2] 

Time 8: Expand again
stack   = [&,&,@,g] <---  [&,o3]
e_stack = ['f[1]',@,2] 

Time 9: Pop elements and push into e_stack until it meets & or it can expand
stack   = [&,&]
e_stack = ['f[1]',@,2,g,@] 

Time 10: We evaluate right to left on e_stack and meets @ and consume one head before it
stack   = [&]    --- >  &
e_stack = ['f[1]',@,2,'g[]']

Time 11: We evaluate right to left on e_stack and meets @ and consume one head before it
stack   = []    --- >  &
e_stack = ['f[1][2,g[]]']

Time 12: 
Now we have evaluated expression f[1][2,g[]]
"""

def evaluate_iterative(expr: Union[Expression, Symbol, Atom], definitions):
    stack = deque()
    e_stack = deque()
    call_stack = deque()
    stack.append(expr)

    APPLY = '@'
    CALL = '&'

    while stack:
        e = stack.pop()
        if is_expression(e):
            stack.append(CALL)
            stack.extend(reversed(e.leaves))
            stack.append(APPLY)
            stack.append(e.head)
            continue
        elif is_symbol(e):
            next_part = None
            for rule in definitions[e].get('ownvalues', []):
                next_part = rule.apply(e)
                if next_part is not None:
                    if next_part == e:  # unchanged by the rule: fully evaluated
                        e_stack.append(next_part)
                    else:
                        stack.append(next_part)  # changed: evaluate again
                    break
                # !if
            # !for
            if next_part is None:
                e_stack.append(e)
            # !if
            continue

        if e != CALL:
            e_stack.append(e)
            continue
        else:
            call_stack.append(e_stack.pop())
            while call_stack[-1] != APPLY:
                call_stack.append(e_stack.pop())
            call_stack.pop()  # pop APPLY
            part = Expression(e_stack.pop(), *reversed(call_stack))
            call_stack.clear()
            next_part = None  # rules are yielded lazily, so check the result instead of counting the rules
            for rule in definitions[part.head].get('downvalues', []):
                next_part = rule.apply(part)
                if next_part is not None:
                    if next_part == part:  # unchanged: fully evaluated
                        e_stack.append(next_part)
                    else:
                        stack.append(next_part)  # changed: evaluate again
                    # !else
                    break
                else:
                    pass  # rule did not apply
            # !for
            if next_part is None:
                e_stack.append(part)
            # !if
            continue
    # !while
    return e_stack.pop()
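The depth independence claimed for the iterative route can be seen with a stripped-down variant on plain `('Plus', lhs, rhs)` tuples (a sketch, not the code above; an explicit deque replaces the call stack):

```python
from collections import deque

def build_nested_plus(n):
    """Fold[Plus, 0, Range[n]] as nested ('Plus', lhs, rhs) tuples."""
    expr = 0
    for i in range(1, n + 1):
        expr = ('Plus', expr, i)
    return expr

def eval_plus_iterative(expr):
    """Sum a nested Plus tree of arbitrary depth without recursion."""
    total, stack = 0, deque([expr])
    while stack:
        e = stack.pop()
        if isinstance(e, int):
            total += e
        else:                       # ('Plus', lhs, rhs)
            stack.append(e[1])
            stack.append(e[2])
    return total
```

At depth 100000, far past the default recursion limit, this still returns normally.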

@rocky
Member

rocky commented Jul 30, 2021

@CallmeNezha I am very interested in this and:

  1. Would like to see some version of this get into the code base, and possibly iterated on.
  2. Want to make sure that the study and work is promoted rather than buried in a thread on the topic.

Here are some thoughts: let me know what you think.

We can have both routines in mathics core. If we hit a recursion-limit error, the iterative approach can be used on retry. I think, though, there still might need to be some mechanism for detecting infinite regress. Perhaps a limit on the queue sizes?

All of this can be embellished with settings. Some kinds of things that come to mind:

  • a setting for maximum queue size
  • a setting to enable it at all
  • a setting to use it as a fallback, or use it exclusively
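The retry-on-RecursionError idea might be sketched like this (a hypothetical wrapper: `evaluate_recursive` and `evaluate_iterative` stand for the two routines discussed in this thread, and `max_queue_size` for the proposed guard setting):

```python
def evaluate_with_fallback(expr, definitions,
                           evaluate_recursive, evaluate_iterative,
                           max_queue_size=None):
    """Try the fast recursive evaluator first; if Python's recursion
    limit is hit, retry with the depth-independent iterative one.
    max_queue_size is a placeholder for the proposed setting that
    would bound the iterative queues to detect infinite regress."""
    try:
        return evaluate_recursive(expr, definitions)
    except RecursionError:
        return evaluate_iterative(expr, definitions)
```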

With respect to promoting dissemination of the information, what I would suggest is adding it to the developer guide as its own section in https://github.com/Mathics3/mathics-development-guide, which is visible at https://mathics-development-guide.readthedocs.io/en/latest/ .

The process for adding it could go a number of ways. My preference is for you to add it and put up a PR in that repository. If you are unsure about your English, I would be happy to go over it and suggest changes. (The English, she is hard.)

Or, if you would prefer, I can try to cut and paste this there and adapt it to the Sphinx format, and you can review and revise it, if that is easier or preferable for you.

Here are some other ideas worth exploring.

The first question I have is: which Python interpreter was this work done with?

My experience with pyston 2.2 (targeting Python 3.8) is that it is as advertised: about 30% faster than CPython. pyston 2.3 is out, so things may have changed; it may even have gotten faster, since the very brief list of changes mentions this (without any detail). The downside of pyston is that I don't believe it supports SciPy yet. SciPy is currently optional, but it is what handles images right now, and of course I would like to see more use of it in the things it is good at. PyPy is another interpreter of this variety to try.

The second idea is whether specializing the queuing would help. This could be done in C or using a numpy array. See https://stackoverflow.com/questions/42771110/fastest-way-to-left-cycle-a-numpy-array-like-pop-push-for-a-queue for example.

numpy is currently mandatory for Mathics.
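A dedicated preallocated NumPy buffer used as a stack might look like this (a sketch; whether it actually beats `collections.deque` for heterogeneous expression objects would need benchmarking):

```python
import numpy as np

class NumpyStack:
    """Fixed-capacity LIFO stack over a preallocated object array,
    avoiding per-push allocation at the cost of a fixed capacity."""
    def __init__(self, capacity):
        self._buf = np.empty(capacity, dtype=object)
        self._top = 0

    def push(self, item):
        self._buf[self._top] = item   # raises IndexError when full
        self._top += 1

    def pop(self):
        self._top -= 1
        return self._buf[self._top]

    def __len__(self):
        return self._top
```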

There are two other avenues of ideas. One is caching results; this can be done independently. The other is determining whether we can fail faster if some sanity checking is done based on the shape of the data. Both of these ideas are still extremely vague in my mind.

Eagerly waiting to hear your thoughts...

@CallmeNezha

Your idea about having both routines coexist is exactly the same as mine 😄. I have recently been reading and trying to understand the Mathics code base for expression evaluation and pattern matching, and found that a recursive coding style has been used heavily since the beginning of the project. So I think this is a feasible way to do it without breaking what we already have.

I'd like to put up a PR, and work on the documentation first to go straight from the idea to the requirements to precisely how the implementation will be done. But it won't be quick, because I'd like to fully assess the feature and the downsides it brings; it must also be robustly integrated, since we don't want a fragile core. I also wonder whether iteration can alleviate the huge memory-footprint problem (a function's call frame keeps every variable it has, whether useful or not). In my case, when I import an ~8 MB XML file, the process grows by 3 GB, with 700 MB actually in the returned result (MMA only grows by ~40 MB); that is another story, of my trying to optimize the XML module and finding out it is not so simple 😂.

I'm currently using the CPython 3.8 that comes with a conda environment, for its compatibility.

Here are my thoughts:

  1. Having things done in C is a tempting idea; I have looked into it several times, but I need to experiment more. Keeping Python and C cooperating is not easy, not to mention that we need SymPy for symbolic computation. Merely having the queue in C or NumPy is not worth it, in my opinion; it has overhead. In the long term, we might need a different language #564
  2. Caching is a good idea for trading space for time, and it can be done in the iterative routine just like the cache mechanism we have when recursively evaluating the expression. But when no expression result can be reused, it becomes pointless and consumes a lot of space and computing time.
  3. Sanity checking, if I'm understanding you right, is not possible in general because of the halting problem: you cannot decide whether a program will finish or run forever. https://en.wikipedia.org/wiki/Halting_problem

@rocky
Member

rocky commented Jul 31, 2021

Your idea about having both routines coexist is exactly the same as mine.

Ok - great. Then let us do that.

...
I'd like to put up a PR, and work on the documentation first to go straight from the idea to the requirements to precisely how the implementation will be done.

Ok - great. Then do that.

But it won't be quick, because I'd like to fully assess the feature and the downsides it brings; it must also be robustly integrated, since we don't want a fragile core.

What I am learning is that things are, and have been, a bit too fragile. Every time we try to extend something in a trivial way, I find that a lot of cleanup is needed first to pick things apart, because everything is so tightly intertwined and stretched as far as it can go.

I realize things are not going to be done quickly. There are things that I've wanted to do from the beginning of working on the code that I haven't been able to get around to, because there is so much other stuff that needs addressing first.

I also wonder whether iteration can alleviate the huge memory-footprint problem (a function's call frame keeps every variable it has, whether useful or not). In my case, when I import an ~8 MB XML file, the process grows by 3 GB, with 700 MB actually in the returned result (MMA only grows by ~40 MB); that is another story, of my trying to optimize the XML module and finding out it is not so simple.

Maybe; maybe not. But again, my experience has been that even if you focus on one thing at a time (the most recent thing being trying to make sense of the testing and documentation), the amount of work needed to disentangle things is so great that even this simple thing has to be done in multiple iterations.

I'm currently using the CPython 3.8 that comes with a conda environment, for its compatibility.

Ok. Then pyston is roughly comparable.

Here are my thoughts:

  1. Having things done in C is a tempting idea; I have looked into it several times, but I need to experiment more. Keeping Python and C cooperating is not easy, not to mention that we need SymPy for symbolic computation.

This is my least favorite idea. We currently use Cython. It makes debugging a pain, slows down development with constant recompiling, and I am not convinced (nor have I seen benchmarks suggesting) that it is worth all the trouble. My bent is more toward algorithms, where we look for order-of-magnitude speedups, not the small constant factors you get here.

Merely having the queue in C or NumPy is not worth it, in my opinion; it has overhead.

I am not convinced here. A numpy array integrates into Python very well and has little overhead when used for a dedicated purpose, as it would be here.

In the long term, we might need a different language #564

Doing this has the same kinds of challenges that we are already trying to address in modernizing the Python code.

For example, I like Rust, which is one of the languages suggested. Well, we'd have to modularize the code better and add types. We're already trying to do that, and it is already a challenge, although there is slow progress.

So, on the flip side, whatever we get done in terms of adding types and breaking the code down into smaller units will be useful in moving towards a faster (strongly typed) language.

  2. Caching is a good idea for trading space for time, and it can be done in the iterative routine just like the cache mechanism we have when recursively evaluating the expression. But when no expression result can be reused, it becomes pointless and consumes a lot of space and computing time.

I threw this out, and I am sure certain kinds of caching help. In fact, we have various kinds of caching already: various mpmath and numpy constants are cached (one has to include the precision as part of the key here). Symbols like True and False are cached; this is a very special case. And there is some other kind of cache around, but I don't understand it that well, and probably it isn't very good either.
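The precision-keyed constant caching described here can be illustrated with stdlib pieces (a sketch with `Decimal` standing in for mpmath; the function name is illustrative):

```python
from functools import lru_cache
from decimal import Decimal, localcontext

@lru_cache(maxsize=None)
def cached_sqrt2(precision):
    """Cache a constant keyed by precision: the same constant at two
    different precisions is a different value, so the precision must
    be part of the cache key."""
    with localcontext() as ctx:
        ctx.prec = precision
        return Decimal(2).sqrt()
```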

I'll also throw out one other idea that has been in the back of my mind and is somewhat related to caching: move-to-front in lists on success. When there is a list of things to match against, it is often the case that the item that matched last will also be the one that matches in the future. So what's often done is to reorder the list so that that item moves to the front. I don't know whether it is semantically valid to do this with the lists of own and down values, though.
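The move-to-front heuristic itself is a few lines (a sketch over a hypothetical mutable `rules` list, using the same `rule.apply(expr) -> result or None` convention as the benchmark code earlier in this thread); whether the reordering is semantically safe for ownvalues/downvalues is exactly the open question:

```python
def apply_with_move_to_front(rules, expr):
    """Try each rule in order; on a match, move the matching rule to
    the front of the list so the next lookup likely hits immediately.
    Returns the rule's result, or None when nothing matched."""
    for i, rule in enumerate(rules):
        result = rule.apply(expr)
        if result is not None:
            if i > 0:
                rules.insert(0, rules.pop(i))   # move-to-front on success
            return result
    return None
```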

  3. Sanity checking, if I'm understanding you right, is not possible in general because of the halting problem: you cannot decide whether a program will finish or run forever. https://en.wikipedia.org/wiki/Halting_problem

This is the common excuse/trap in thinking that people fall into. We are not looking for something that works every time, just some of the time. There are clearly special, easy cases where you know things won't match in terms of shape, for example when a function has a fixed number of arguments. In the cases where it is difficult to figure out, you just don't worry about it and don't try.
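One such cheap shape test, sketched on (head, argument count) pairs (the arity table is illustrative, not Mathics data):

```python
# Heads known to take a fixed number of arguments; anything absent
# from the table is simply not checked -- the hard cases are skipped.
FIXED_ARITY = {'Sin': 1, 'Power': 2, 'If': 3}

def could_match(head, n_args):
    """Fast negative test: False means a definition for `head` can
    never match n_args arguments; True means 'maybe', and the full
    pattern matcher still has the final say."""
    expected = FIXED_ARITY.get(head)
    return expected is None or expected == n_args
```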
