The Active Memory Manager uses the optimistic memory (managed + unmanaged old) as a hardcoded measure to base all of its decisions upon.
This is generally a good choice in a production environment.
There are, however, two notable exceptions:
1. When the process memory does not deflate on its own. This is probably fixable on Linux through "distributed.nanny.environ.MALLOC_TRIM_THRESHOLD_ is ineffective" (#5971) and, to my knowledge, unfixable on macOS. It can cause the AMM to make poor decisions, e.g. moving all data away from a worker because it sees huge amounts of managed memory, when in fact that memory is reusable.
2. In unit tests. Most AMM tests currently run on nannies and need large amounts of data and lax constraints to be stable. The AMM stress tests are currently disabled on CI. This is not the AMM's fault (the same tests also fail with the AMM disabled); rather, for the AMM to make correct decisions, the tests have to spawn 10 nannies, which is too many for the small GitHub CI hosts to handle. These stress tests would be extremely valuable to run in CI, as they have detected state machine corruption and other deadlocks many times in the past. See "Remove @avoid_ci from stress tests" (#6271).
Design
Add a new setting to distributed.yaml, {distributed.scheduler.active-memory-manager.measure: optimistic}. This mirrors the existing {distributed.worker.memory.rebalance.measure: optimistic}. Note that rebalance() has been pencilled in for a rewrite: #4906.
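To make the proposal concrete, here is a minimal sketch of how a configurable measure could pick the number the AMM bases its decisions on. It is not the actual distributed implementation: `MemorySnapshot`, `measured_memory`, and `ALLOWED_MEASURES` are hypothetical names invented for illustration, and the list of accepted values is an assumption. Only the definition of optimistic memory (managed + unmanaged old) comes from the description above.

```python
from dataclasses import dataclass

# Hypothetical set of values the new setting might accept. This list is an
# assumption made for the sketch, not a confirmed part of the proposal.
ALLOWED_MEASURES = ("process", "optimistic", "managed")


@dataclass(frozen=True)
class MemorySnapshot:
    """Simplified, hypothetical stand-in for a worker's memory breakdown (bytes)."""

    managed: int           # memory held by data the worker tracks
    unmanaged_old: int     # unmanaged memory that has lingered for a while
    unmanaged_recent: int  # unmanaged memory that appeared recently

    @property
    def optimistic(self) -> int:
        # Definition used in this issue: managed + unmanaged old
        return self.managed + self.unmanaged_old

    @property
    def process(self) -> int:
        # In this sketch, total process memory is the sum of all three parts
        return self.managed + self.unmanaged_old + self.unmanaged_recent


def measured_memory(snapshot: MemorySnapshot, measure: str = "optimistic") -> int:
    """Return the memory figure selected by the (hypothetical) measure setting."""
    if measure not in ALLOWED_MEASURES:
        raise ValueError(f"measure must be one of {ALLOWED_MEASURES}; got {measure!r}")
    return getattr(snapshot, measure)


# A worker whose process memory did not deflate: little tracked data,
# but a large amount of lingering unmanaged memory.
snap = MemorySnapshot(managed=100, unmanaged_old=800, unmanaged_recent=50)
```

With the default measure this worker looks heavily loaded (`measured_memory(snap)` counts the lingering unmanaged memory), whereas `measured_memory(snap, "managed")` only counts the data it actually holds. That difference is what the two exceptions above would exploit: tests and hosts with non-deflating memory could opt into a measure that ignores unmanaged memory.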