Engine: Dynamically update maximum stack size close to overflow #6052
Conversation
Tested and works for me!
When the process is finished, will the recursion limit decrease again? Or does it accumulate while a daemon is running and only reset after the daemon restarts?
No, it will keep the increased recursion limit until the worker restarts. I thought about resetting it, but I don't really see the point: if it can run with a given limit once, it should be fine to keep running with it, so resetting it doesn't really add anything positive. The only thing it would add is a potential "ping-pong" between increasing and then lowering the limit.
One question is the performance hit of the call that determines the stack size. The timings were done at a stack depth of 50 and each implementation has a different scaling. Since we will typically be running at stack depths that are considerably larger than 50, it may be best to use the algorithm that scales best for deep stacks. From the analysis, it seems that the following implementation does:

import sys
from itertools import count


def stack_size3a(size=2):
    """Get stack size for caller's frame."""
    frame = sys._getframe(size)
    try:
        # Walk eight frames per iteration until we run off the end of the stack.
        for size in count(size, 8):
            frame = frame.f_back.f_back.f_back.f_back.\
                f_back.f_back.f_back.f_back
    except AttributeError:
        # Fewer than eight frames remain: count the rest one at a time.
        while frame:
            frame = frame.f_back
            size += 1
    return size - 1

Are we willing to use this implementation, which accesses the private `sys._getframe`?
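For illustration, here is a rough, hypothetical benchmark sketch (not part of this PR) of how one could compare the scaling of `stack_size3a` against the naive `len(inspect.stack())` at increasing stack depths; the chosen depths, repetition count and helper names are arbitrary:

```python
# Hypothetical benchmark sketch comparing the optimized stack-size implementation
# against the naive len(inspect.stack()) at several stack depths.
import inspect
import sys
import timeit
from itertools import count


def stack_size3a(size=2):
    """Optimized stack size implementation (copied from the snippet above)."""
    frame = sys._getframe(size)
    try:
        for size in count(size, 8):
            frame = frame.f_back.f_back.f_back.f_back.\
                f_back.f_back.f_back.f_back
    except AttributeError:
        while frame:
            frame = frame.f_back
            size += 1
    return size - 1


def naive_stack_size():
    """Naive implementation: constructs a ``FrameInfo`` for every frame on the stack."""
    return len(inspect.stack())


def call_at_depth(depth, func):
    """Call ``func`` with roughly ``depth`` extra frames on the stack."""
    if depth == 0:
        return func()
    return call_at_depth(depth - 1, func)


if __name__ == '__main__':
    for depth in (50, 500, 2000):
        # Only ever raise the recursion limit, never lower it.
        sys.setrecursionlimit(max(sys.getrecursionlimit(), depth + 100))
        fast = timeit.timeit(lambda: call_at_depth(depth, stack_size3a), number=10)
        slow = timeit.timeit(lambda: call_at_depth(depth, naive_stack_size), number=10)
        print(f'depth={depth}: stack_size3a={fast:.4f}s len(inspect.stack())={slow:.4f}s')
```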
Why don't we define our own version of it then?
Yeah, works for me. Will add it.
(force-pushed from 0f3c4b3 to 354137e)
Ok, I have finalized the implementation with the optimized stack size function.
The Python interpreter maintains a stack of frames when executing code, and this stack has a limit. As soon as a frame is added to the stack that would exceed this limit, a `RecursionError` is raised. Note that, unlike the name suggests, the cause does not necessarily involve recursion, although that is a common cause of the problem. Simply creating a deep but non-recursive call stack has the same effect.

This `RecursionError` was routinely hit when submitting large numbers of workflows to the daemon that call one or more process functions. This is due to the process function being called synchronously in an async context, namely the workchain, which is being executed as a task on the event loop of the `Runner` in the daemon worker. To make this possible, the event loop has to be made reentrant, but this is not supported by vanilla `asyncio`. This blockade is circumvented in `plumpy` through the use of `nest-asyncio`, which makes a running event loop reentrant.

The problem is that when the event loop is reentered, instead of creating a separate stack for that task, it reuses the current one. Consequently, each process function adds frames to the current stack that are not resolved and removed until after the execution has finished. If many process functions are started before they are finished, these frames accumulate and can ultimately hit the stack limit. Since the task queue of the event loop is a FIFO, this situation arose very often because all process function tasks would be created first, before being finalized.

Since an actual solution for this problem is not trivial and this is causing a lot of problems, a temporary workaround is implemented. Each time a process function is executed, the current stack size is compared to the current stack limit. If the stack is more than 80% filled, the limit is increased by 1000 and a warning message is logged. This should give some more leeway for the created process function tasks to be resolved.

Note that the workaround will keep increasing the limit if necessary, which can and eventually will lead to an actual stack overflow in the interpreter. When this happens is machine dependent, so it is difficult to set an absolute limit.

The function to get the stack size uses a custom implementation instead of the naive `len(inspect.stack())`, because its performance is three orders of magnitude better and it scales well for deep stacks, which are typical for AiiDA daemon workers. See https://stackoverflow.com/questions/34115298 for a discussion of the implementation and its performance.
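As a rough sketch of the workaround described above (the 80% threshold and +1000 increment come from the text; the function and constant names below are illustrative and not the actual `aiida-core` API), such a check would run each time a process function is about to be executed:

```python
# Illustrative sketch of the described workaround; names are not the actual aiida-core API.
import logging
import sys

LOGGER = logging.getLogger(__name__)

STACK_FILL_THRESHOLD = 0.8       # bump the limit once the stack is more than 80% full
RECURSION_LIMIT_INCREMENT = 1000


def ensure_stack_headroom(get_stack_size) -> None:
    """Increase the recursion limit when the current stack is close to it.

    :param get_stack_size: callable returning the current stack depth, for example an
        optimized implementation such as ``stack_size3a`` discussed above.
    """
    limit = sys.getrecursionlimit()
    size = get_stack_size()

    if size > STACK_FILL_THRESHOLD * limit:
        new_limit = limit + RECURSION_LIMIT_INCREMENT
        LOGGER.warning(
            'Stack size %d exceeds %d%% of the recursion limit %d; increasing the limit to %d.',
            size, int(STACK_FILL_THRESHOLD * 100), limit, new_limit,
        )
        sys.setrecursionlimit(new_limit)
```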
(force-pushed from 354137e to 6bd178d)
I did the test run with 4 daemon workers again, using ~50% of the available daemon worker slots, which previously produced a lot of failing processes. It is now all fine!
Thanks!
Temporary workaround for #4876