distributed environment #177

drorasaf · 2016-06-21T00:34:31Z

Are there any plans to allow TPOT to be used in a distributed environment?

rhiever · 2016-06-21T04:06:08Z

Eventually, yes. There's been discussion of using Dask to parallelize TPOT (cc @tonyfast). We've also been thinking about PySpark for parallel cloud computing. However, we're still focused on getting the core algorithm and tool finished before we really work our way into scaling to distributed environments.

drorasaf · 2016-06-21T16:41:37Z

My common use case is parallel cloud computing, and I think TPOT has to scale before it can be useful on any interesting dataset.
I would consider leaving it up to the user which one to use (Dask or PySpark), since the user knows the use case best.

minimumnz · 2016-06-26T10:18:08Z

I'd love better parallel processing on a single machine. I feel sad when I see 3 cores at 0% and 1 at 100%.

danthedaniel · 2016-06-26T16:47:04Z

@minimumnz: it's fairly easy to make that change - https://github.com/teaearlgraycold/tpot/tree/parallelize

But TPOT itself likely won't have local parallelization until cluster support is also added, since it'd be much nicer to have both cases covered by one library.
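For readers wondering what "one library for both cases" could look like, here is a minimal sketch, not TPOT's actual code. It assumes each candidate pipeline can be scored independently; `evaluate_population` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for cross-validating a real pipeline. The point is that the scoring loop only depends on an executor interface, so the backend can be swapped.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def evaluate_population(
    pipelines: Sequence[str],
    score_fn: Callable[[str], float],
    executor: Executor,
) -> Dict[str, float]:
    """Score every candidate pipeline concurrently and return {pipeline: score}."""
    # executor.map preserves input order, so zip pairs each pipeline with its score.
    scores = executor.map(score_fn, pipelines)
    return dict(zip(pipelines, scores))


def toy_score(pipeline: str) -> float:
    """Hypothetical stand-in for cross-validating one pipeline."""
    return float(len(pipeline))


def demo() -> Dict[str, float]:
    """Score a small made-up population on a thread pool."""
    population = ["pca|logreg", "scale|svm", "rf"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return evaluate_population(population, toy_score, pool)
```

For CPU-bound scoring on a single machine, passing a `concurrent.futures.ProcessPoolExecutor` instead of the thread pool should keep all cores busy, provided `score_fn` is a picklable module-level function. A cluster backend that exposes the same `Executor` interface could be dropped in the same way.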

ghgr · 2016-11-04T15:44:27Z

May I ask what the current priority level is for using distributed computing libraries (ideally Dask, which comes with caching) in TPOT? I think that's vital for such a project to be usable in the real world, and it should be orthogonal to the "core" branch development.

I think that representing the whole population of pipelines as one huge Dask graph would be a good start. Then caching of intermediate results (with the current development branch I'm spending most of the computing time recalculating the same xgboost!), multicore, and multi-server execution would go hand in hand.
Any chance of reopening this issue?
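To make the caching idea concrete, here is a rough sketch in plain Python rather than Dask. It assumes every pipeline in the population runs on the same input data and that a pipeline can be written as a tuple of step names; `PrefixCache` and `run_step` are hypothetical names used only for illustration. Pipelines that share a leading sequence of steps (for example the same xgboost stage) reuse the stored result instead of recomputing it.

```python
from typing import Callable, Dict, Tuple

Pipeline = Tuple[str, ...]


class PrefixCache:
    """Memoize the result of each pipeline prefix so shared steps run once.

    Assumes all pipelines are run on the same input data; the cache key
    is the tuple of step names only.
    """

    def __init__(self, run_step: Callable[[str, object], object]):
        self._run_step = run_step
        self._cache: Dict[Pipeline, object] = {}
        self.step_calls = 0  # number of steps actually executed (cache misses)

    def run(self, pipeline: Pipeline, data: object) -> object:
        """Run `pipeline` on `data`, reusing cached results for shared prefixes."""
        result = data
        for i, step in enumerate(pipeline):
            prefix = pipeline[: i + 1]
            if prefix not in self._cache:
                self._cache[prefix] = self._run_step(step, result)
                self.step_calls += 1
            result = self._cache[prefix]
        return result
```

With a toy `run_step` such as `lambda step, x: f"{x}>{step}"`, running `("xgb", "pca")` and then `("xgb", "svm")` executes three steps rather than four, because the shared `"xgb"` stage is computed once. A Dask task graph could give a similar effect by merging identical tasks, but that is not shown here.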

rhiever · 2016-11-04T17:49:22Z

I agree. Can you file a separate issue and list the possible options?

rhiever added the question label Jun 21, 2016

rhiever closed this as completed Sep 1, 2016

ghgr mentioned this issue Nov 4, 2016

Parallelize cross validation as a provisional optimization #302

Closed

ghgr mentioned this issue Nov 5, 2016

Parallelization with python dask and dask-learn. Proposal. #304

Open

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed
