Performance and recursivity in the interpreter of mathics #844
That's a big issue indeed. Recursion is formally equivalent to iteration with regard to tree traversal algorithms (I think). The problem is that most tree traversal algorithms are much cleaner when expressed in terms of recursion (at least in my experience), so there is a maintainability cost to using iteration instead of recursion.
I agree. I think we could improve things a lot in that regard.
I don't know if anyone is still interested in this issue 😄. Recently I implemented two really simple yet working Expression evaluation algorithms (iteration and recursion) just to benchmark these two methods in Python. It seems the iterative algorithm is not faster (it was even 20% slower in my case) than native recursion in Python, since it needs the extra pushing and popping on a stack (deque). But iteration is far more stable than recursion in computation-time variance (and memory too, I think?), and it is free of Python's default 1000-call recursion limit, because the call stack is replaced by a deque. Both algorithms evaluate the same expression.
Below are the cProfile output and benchmarks. After the benchmark section, I paste my code. The recursive version throws a recursion-limit error when I nest Plus 1000 times (without changing Python's default recursion limit), so I use 400 in this test.
"""
evaluation process
1. Expression: Head|[Leaves...]
First we have expression with head and leaves, head must be one of expression or symbol and leaves can be any.
2. Evaluate from down to top, we only can evaluate node when its' child nodes are all evaluated(not considering
'Hold' mechanism). Yeah, it is intrinsically recursive. Atom which is not symbol are returned unchanged, symbol
can be replaced by defined rules, and expression which matched any pattern will be replaced either.
"""
def evaluate_recursive(expr: Union[Expression, Symbol, Atom], definitions):
if is_atom(expr):
if is_symbol(expr):
for rule in definitions[expr].get('ownvalues',[]):
expr = rule.apply(expr)
return expr
else:
return expr
elif is_expression(expr):
head = evaluate_recursive(expr.head, definitions)
new_expr = Expression(head, *(evaluate_recursive(leaf, definitions) for leaf in expr.leaves))
for rule in definitions[head].get('downvalues',[]):
next_new_expr = rule.apply(new_expr)
if next_new_expr is not None: #there is rule that applies
if next_new_expr != new_expr: #ok that's not finished because we can still evaluate it
return evaluate_recursive(next_new_expr, definitions)
else:
return next_new_expr
else:
pass # rule not applied
#!for no rule can applies we return expr as is
return new_expr """
1. Implement evaluation of an expression using two stacks
2. An example: `o` represents an expression, the left branch is the `head`,
   and the right branch holds the `children`, separated by commas.
o1
/ \
o2 2, o3
/ \ /
f. a. g
Definitions:
a := 1
Time 0: Push o1
stack = [o1]
e_stack = []
Time 1: Pop o1, check if it can `expand`, and push the expansion back onto the stack
stack = [&,o3,2,@,o2] <--- [o1]
e_stack = []
Time 2: Pop and do same thing as Time 1
stack = [&,o3,2,@,&,a,@,f] <--- [&,o3,2,@,o2]
e_stack = []
Time 3: Pop elements and push into e_stack until it meets & or it can expand
stack = [&,o3,2,@,&,a]
e_stack = [f,@]
Time 4: Pop 'a' and replace with 1 and push again into stack
stack = [&,o3,2,@,&,1]
e_stack = [f,@]
Time 5: Pop elements and push into e_stack until it meets & or it can expand
stack = [&,o3,2,@,&] <---- next step we will meet & then 'roll' e_stack immediately
e_stack = [f,@,1]
Time 6: We evaluate right to left on e_stack and meets @ and consume one head before it
stack = [&,o3,2,@] --- > &
e_stack = ['f[1]'] <--- [f,@,1] no downvalues to apply and we return f[1]
Time 7: Pop elements and push into e_stack until it meets & or it can expand
stack = [&,o3]
e_stack = ['f[1]',@,2]
Time 8: Expand again
stack = [&,&,@,g] <--- [&,o3]
e_stack = ['f[1]',@,2]
Time 9: Pop elements and push into e_stack until it meets & or it can expand
stack = [&,&]
e_stack = ['f[1]',@,2,g,@]
Time 10: We evaluate right to left on e_stack and meets @ and consume one head before it
stack = [&] --- > &
e_stack = ['f[1]',@,2,'g[]']
Time 11: We evaluate right to left on e_stack and meets @ and consume one head before it
stack = [] --- > &
e_stack = ['f[1][2,g[]]']
Time 12:
Now we have evaluated expression f[1][2,g[]]
"""
def evaluate_iterative(expr: Union[Expression, Symbol, Atom], definitions):
    stack = deque()
    e_stack = deque()
    call_stack = deque()
    stack.append(expr)
    APPLY = '@'
    CALL = '&'
    while stack:
        e = stack.pop()
        if is_expression(e):
            stack.append(CALL)
            stack.extend(reversed(e.leaves))
            stack.append(APPLY)
            stack.append(e.head)
            continue
        elif is_symbol(e):
            next_part = None
            for rule in definitions[e].get('ownvalues', []):
                next_part = rule.apply(e)
                if next_part is not None:
                    if next_part == e:  # fixed: was `part`, which is undefined here
                        e_stack.append(next_part)
                    else:
                        stack.append(next_part)
                    break
            if not next_part:
                e_stack.append(e)
            continue
        if e != CALL:
            e_stack.append(e)
            continue
        else:
            call_stack.append(e_stack.pop())
            while call_stack[-1] != APPLY:
                call_stack.append(e_stack.pop())
            call_stack.pop()  # pop APPLY
            part = Expression(e_stack.pop(), *reversed(call_stack))
            call_stack.clear()
            # rules are yielded for performance, so we check next_part
            # instead of taking the length of the rule list
            next_part = None
            for rule in definitions[part.head].get('downvalues', []):
                next_part = rule.apply(part)
                if next_part:
                    if next_part == part:
                        e_stack.append(next_part)
                    else:
                        stack.append(next_part)
                    break
            if not next_part:
                e_stack.append(part)
            continue
    return e_stack.pop()
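The recursion-vs-iteration tradeoff described above can be reproduced outside of Mathics with a minimal sketch (this is hypothetical illustration code, not the benchmark code the numbers above came from): walking a deeply nested unary expression with native recursion versus an explicit deque-based stack.

```python
from collections import deque

def nest(depth):
    # Build ('f', ('f', (... 1 ...))) nested `depth` times.
    expr = 1
    for _ in range(depth):
        expr = ('f', expr)
    return expr

def depth_recursive(expr):
    # Native recursion: one Python frame per nesting level.
    if isinstance(expr, tuple):
        return 1 + depth_recursive(expr[1])
    return 0

def depth_iterative(expr):
    # Explicit stack: bounded only by memory, not sys.getrecursionlimit().
    stack = deque([expr])
    depth = 0
    while stack:
        e = stack.pop()
        if isinstance(e, tuple):
            depth += 1
            stack.append(e[1])
    return depth

deep = nest(10_000)  # far beyond the default ~1000-frame limit
print(depth_iterative(deep))  # works fine

try:
    depth_recursive(deep)
except RecursionError:
    print("recursive version hit the limit")
```

Timing the two with `timeit` on shallow inputs shows the constant-factor overhead of the explicit pushes and pops that the benchmark above refers to.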
@CallmeNezha I am very interested in this. Here are some thoughts; let me know what you think. We can have both routines in Mathics core: if we hit a recursion-limit error, the iterative approach can be used on retry. I think, though, there might still need to be some mechanism for detecting infinite regress; perhaps a limit on the queue sizes? All of this can be embellished with settings. Some things that come to mind:
With respect to promoting dissemination of the information, what I would suggest is adding it to the developer guide as its own section in https://github.com/Mathics3/mathics-development-guide which is visible as https://mathics-development-guide.readthedocs.io/en/latest/ . The process for adding it could be done in a number of ways. My preference is for you to add it and put in a PR in that repository. If you are unsure about your English, I would be happy to go over it and suggest changes. (The English, she is hard.) Or, if you would prefer, I can try to cut and paste this there and adapt it to the Sphinx format, and you can review and revise it, if that is easier or preferable for you.

Here are some other ideas for exploring. The first question I have is: which Python interpreter was this work done with? My experience with Pyston 2.2 (targeting Python 3.8) is that it is, as advertised, 30% faster than CPython. Pyston 2.3 is out, so things may have changed; it may even have gotten faster, since that is one thing mentioned (without any detail) in the very brief description of changes. The downside of Pyston is that I don't believe it supports SciPy yet. SciPy is currently optional, but it handles images right now, and of course I would like to see more use of it in the things it is good at. PyPy is another interpreter of this variety to try.

The second idea is whether specializing the queuing would help. This could be done in C or using a numpy array; see https://stackoverflow.com/questions/42771110/fastest-way-to-left-cycle-a-numpy-array-like-pop-push-for-a-queue for example. numpy is currently mandatory for Mathics.

There are two other avenues of ideas. One is caching results; this can be done independently. The other is determining whether we can get quicker failures if some sanity checking is done based on the shape of the data. Both of these ideas are extremely vague in my mind right now.

Eagerly waiting to hear your thoughts...
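The retry-on-recursion-limit idea could look something like the sketch below. The wrapper name is made up, and the two evaluators here are trivial stand-ins (real Mathics evaluation does rule application, not just spine walking); the point is only the control flow.

```python
# Minimal stand-ins so the sketch is self-contained; in Mathics these would
# be the real recursive and iterative evaluators discussed in this thread.
def evaluate_recursive(expr, definitions):
    # One Python frame per nesting level: can hit RecursionError.
    if isinstance(expr, tuple):
        return (expr[0], evaluate_recursive(expr[1], definitions))
    return definitions.get(expr, expr)

def evaluate_iterative(expr, definitions):
    # Walk the spine with a loop, then rebuild bottom-up: no frame growth.
    spine = []
    while isinstance(expr, tuple):
        spine.append(expr[0])
        expr = expr[1]
    result = definitions.get(expr, expr)
    for head in reversed(spine):
        result = (head, result)
    return result

def evaluate_with_fallback(expr, definitions):
    """Try the fast recursive evaluator first; if Python's recursion limit
    is hit, retry with the iterative one, which is bounded only by memory
    (a queue-size cap could guard against infinite regress)."""
    try:
        return evaluate_recursive(expr, definitions)
    except RecursionError:
        return evaluate_iterative(expr, definitions)

# A value nested far deeper than the default ~1000-frame limit:
deep = 'a'
for _ in range(5000):
    deep = ('f', deep)

result = evaluate_with_fallback(deep, {'a': 1})  # recursive path fails, retry succeeds
```

One design caveat: if the recursive evaluator has side effects on `definitions` before failing, a naive retry could observe them, so the fallback would need the evaluation to be effect-free up to the failure point.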
Your idea about having both routines coexist is exactly the same as mine 😄. I have been reading and trying to understand the code base of expression evaluation and pattern matching in Mathics recently, and I found that the recursive coding style has been heavily used since the beginning of the project. So I think this is a feasible way to do it without breaking things we already have. I'd like to put in a PR, working on the documentation first to get straight from idea to requirements to how the implementation will be done precisely. But it won't be quick, because I'd like to fully assess the feature and the downsides it brings; it must also be robustly integrated, since we don't want a fragile core. I also wonder if iteration can alleviate the huge memory-footprint problem (a function call frame keeps every variable it has, whether useful or not). In my case, when I import an ~8MB XML file, it generates 3GB during processing and 700MB in the returned result (MMA only increases by ~40MB); but that is another story, of me trying to optimize the XML module and finding out it is not so simple 😂. I'm currently using CPython 3.8, which comes with the conda environment, for its compatibility. Here are my thoughts:
Ok - great. Then let us do that.
Ok - great. Then do that.
What I am learning is that things are, and have been, a bit too fragile. Every time we try to expand something in a trivial way, I find that a lot of cleanup is needed first to pick things apart, because everything is so tightly intertwined and extended as far as it can go. I realize things are not going to be done quickly. There are things that I've wanted to do from the beginning of working on the code that I haven't been able to get around to, because there is so much other stuff needing to be addressed first.
Maybe; maybe not. But again, my experience has been that even if you focus on one thing at a time (the most recent thing being trying to make sense of the testing and documentation), the amount of work to disentangle things is so great that even just this simple thing has to be done in multiple iterations.
Ok. Then pyston is roughly comparable.
This is my least favorite idea. We currently use Cython. It makes debugging a pain, slows down development with constant recompiling, and I am not convinced (nor have I seen benchmarks to suggest) that it is worth all of the trouble. My bent is more toward algorithms, where we look for order-of-magnitude speedups, not the small constant factors you get here.
I am not convinced here. A numpy array integrates into Python very well and has little overhead when it is used for a dedicated purpose, as it would be here. Doing this has the same kinds of challenges that we are already trying to address in modernizing the Python code. For example, I like Rust, which is one of the languages suggested. Well, we'd have to modularize the code better and add types, and we're already trying to do that; it is already a challenge, although there is slow progress. So, on the flip side, whatever we get done in terms of adding types and breaking the code down into smaller units would be useful in moving toward a faster (strongly typed) language.
I threw this out, and I am sure certain kinds of caching help. In fact, we have various kinds of caching already: various mpmath and numpy constants are cached (one has to include the precision as part of the key here), and various Symbols like True and False are cached, which is a very special case. There is some other kind of cache around too, but I don't understand it that well, and it probably isn't very good either. I'll also throw out one other idea that has been in the back of my mind, somewhat related to caching: move-to-front for lists when there is a successful match. When there is a list of things to match against, it is often the case that the item that matched last will also be the one that matches in the future. So what's often done is to reorder the list so that that item moves to the front. I don't know if it is semantically valid to do this with the lists of ownvalues and downvalues, though.
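The move-to-front heuristic can be sketched as follows. The `Rule` class and helper name here are hypothetical toys; whether reordering is semantically valid for ownvalues/downalues is exactly the open question raised above, since rule order can matter for pattern-matching semantics.

```python
class Rule:
    # Toy rule: rewrites one specific value to another; apply() returns
    # None when it does not match, mirroring the evaluators above.
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def apply(self, expr):
        return self.rhs if expr == self.lhs else None

def apply_rules_mtf(expr, rules):
    """Try rules in order; on a match, move the matching rule to the
    front of the (mutable) list so that a 'hot' rule is found first
    on subsequent lookups."""
    for i, rule in enumerate(rules):
        result = rule.apply(expr)
        if result is not None:
            if i > 0:
                rules.insert(0, rules.pop(i))  # move-to-front
            return result
    return None  # no rule matched

rules = [Rule('a', 1), Rule('b', 2), Rule('c', 3)]
apply_rules_mtf('c', rules)       # matches the last rule...
print([r.lhs for r in rules])     # ...which is now first: ['c', 'a', 'b']
```

The second lookup of `'c'` then succeeds on the first comparison instead of the third.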
This is a common excuse/trap in thinking that people fall into. We are not looking for something that works every time, just some of the time. There are clearly special and easy cases where you know things won't match in terms of shape, for example when a function has a fixed number of arguments. In the cases where it is difficult to figure out, you just don't worry about it and don't try.
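The easy case mentioned here can be sketched in a few lines (the table and function names are made up for illustration): when a head is known to take a fixed number of arguments, an expression with a different leaf count can be rejected before any pattern matching is attempted.

```python
# Hypothetical table of fixed arities; heads not listed are left alone.
FIXED_ARITY = {'Sin': 1, 'Power': 2, 'If': 3}

def could_match(head_name, n_leaves):
    """Cheap shape check: return False only when we are certain no rule
    can match; when unsure, return True and fall through to the full
    (expensive) pattern matcher."""
    expected = FIXED_ARITY.get(head_name)
    if expected is not None and n_leaves != expected:
        return False
    return True

print(could_match('Power', 2))   # True: shapes agree, do full matching
print(could_match('Sin', 3))     # False: Sin takes one argument, fail fast
print(could_match('Plus', 99))   # True: unknown arity, don't try to be clever
```

This is exactly the "works some of the time" shape: the check never wrongly rejects, and the hard cases simply pass through.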
A performance problem that Mathics has is related to the cost of recursion in Python. The current implementation seems to be written with the logic of a functional language (like MMA), for which recursive functions are natural. However, as the interpreter is implemented in Python, a lot of resources are required to build the frames associated with each recursion level. So I see two possibilities: either look for a different language for the interpreter, or change the way we walk over expressions.
I propose, then, to see if this second way is possible (of course, this is something for a future release).
See for example https://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion/
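The technique behind that link, adapted to Python, is roughly the following: a generic sketch (not Mathics code) of inorder traversal of a binary tree using an explicit stack instead of recursion.

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder_iterative(root):
    """Inorder traversal with an explicit stack: descend the left spine
    pushing nodes, then pop, visit, and switch to the right child."""
    out, stack, node = [], deque(), root
    while node or stack:
        while node:            # descend the left spine
            stack.append(node)
            node = node.left
        node = stack.pop()     # visit the leftmost unvisited node
        out.append(node.val)
        node = node.right      # then traverse its right subtree
    return out

tree = Node(2, Node(1), Node(3))
print(inorder_iterative(tree))  # [1, 2, 3]
```

The same stack discipline is what the `evaluate_iterative` routine earlier in this thread generalizes to n-ary expression trees.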