I wanted to use TPOT with:

- dask.distributed running multiple processes on the local machine
- memory enabled to cache common transformations across processes (it's supposed to be multiprocessing-safe)
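For context, the configuration I had in mind looks roughly like this (a minimal sketch; the cache directory is just a placeholder, and I'm assuming the `use_dask` and `memory` constructor arguments behave as documented):

```python
from dask.distributed import Client
from joblib import Memory
from tpot import TPOTClassifier

# Local multi-process scheduler (one worker process per core).
client = Client(processes=True)

cachedir = "/tmp/tpot_cache"  # placeholder cache location

tpot = TPOTClassifier(
    generations=5,
    population_size=20,
    use_dask=True,                                 # evaluate individuals via dask_ml
    memory=Memory(location=cachedir, verbose=0),   # cache fitted transformers
    verbosity=2,
)
```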
But I did two things that make me think this mode of operation is not supported:
1. Setting a breakpoint inside joblib's Memory.cache() function - it only gets called when checking whether a produced individual is valid (the check_pipeline/_pre_test function).
2. Looking at the code that actually performs the evaluation of individuals: everything seems to happen inside dask_ml.model_selection._search.build_graph(). But the way it handles pipelines (if my analysis is correct) is to recursively extract all leaf transformers and estimators, turn them into Dask graph nodes and then, at the end, rebuild the pipelines. No sklearn.Pipeline code appears to be executed (and that is where caching is implemented).
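To illustrate point 2: as far as I can tell, scikit-learn's caching only takes effect when Pipeline.fit() itself runs, because that is where each step's fit is routed through memory.cache(). A minimal sketch using the standard scikit-learn API (nothing TPOT-specific; the cache path is a placeholder):

```python
from joblib import Memory
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

memory = Memory(location="/tmp/sklearn_cache", verbose=0)

pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=5)),
        ("clf", LogisticRegression()),
    ],
    memory=memory,  # transformers are cached only when Pipeline.fit() runs
)

# pipe.fit(X, y) would route each transformer's fit_transform through
# memory.cache(...); if an optimiser instead pulls the steps out and fits
# them as separate Dask graph nodes, this memory object is never consulted.
```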
My questions are as follows:
1. Is my analysis correct, and is that mode of operation indeed unsupported?
2. What would be the easiest way to add this caching functionality?