2018 milestones #9108
Comments
Should we specify a TensorFlow version (e.g., the latest release, v1.6.0) for performance comparison? Otherwise, if they release a new well-optimized version right after we surpass TensorFlow's performance, we would be in an awkward position. I am not saying we should not aim to be better than the latest TensorFlow; my point is that maybe we should focus on a fixed target first.
Is there a need for model parallelism other than large embedding lookups? If so, we may want to change the wording to "Support large embedding lookups as well as data parallelism".
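For context on why large embedding lookups push toward model parallelism: the table may not fit on one device, so its rows are partitioned across workers and each lookup is routed to the shard that owns the row. Below is a minimal NumPy sketch of that idea; the sizes, shard count, and modulo-based routing are illustrative assumptions, not PaddlePaddle's implementation.

```python
import numpy as np

# Illustrative sizes only: a table too large for one device, split row-wise.
VOCAB, DIM, NUM_SHARDS = 100_000, 64, 4          # VOCAB divisible by NUM_SHARDS
rng = np.random.default_rng(0)

# Each "worker" owns the rows whose id % NUM_SHARDS == shard_id,
# stored locally at index id // NUM_SHARDS.
shards = [rng.standard_normal((VOCAB // NUM_SHARDS, DIM)).astype(np.float32)
          for _ in range(NUM_SHARDS)]

def sharded_lookup(ids):
    """Route each id to its owning shard and gather the embedding rows."""
    ids = np.asarray(ids)
    out = np.empty((len(ids), DIM), dtype=np.float32)
    for shard_id, table in enumerate(shards):
        mask = ids % NUM_SHARDS == shard_id          # ids this worker owns
        out[mask] = table[ids[mask] // NUM_SHARDS]   # local row indices
    return out

print(sharded_lookup([3, 7, 99_999]).shape)          # (3, 64)
```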
Currently we use the parameter server architecture (via send/recv operators) for parameter updates; it is a completely different architecture from all-reduce. From my understanding, their theoretical network throughput consumption and per-step time are similar. Since we already support the parameter server architecture, what is the reason to support another approach with similar performance?
If we use CSP for cluster training, it looks more like the parameter server architecture than the all-reduce architecture.
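To make the comparison concrete, here is a minimal NumPy sketch of the two update patterns being discussed. It illustrates only the communication structure (the worker count, gradients, and averaging step are made up) and is not PaddlePaddle's send/recv or NCCL code path.

```python
import numpy as np

grads = [np.full(4, w, dtype=np.float32) for w in range(1, 4)]  # fake per-worker grads

# Parameter-server style: every worker "sends" its gradient to a central server,
# the server averages and updates the parameter, then workers "recv" it back.
def ps_update(param, worker_grads, lr=0.1):
    avg = np.mean(worker_grads, axis=0)       # aggregation happens on the server
    return param - lr * avg                   # result is sent back to workers

# All-reduce style: workers average gradients among themselves (no central server),
# then every worker applies the same update locally.
def allreduce_update(param, worker_grads, lr=0.1):
    avg = np.add.reduce(worker_grads) / len(worker_grads)  # stands in for all-reduce
    return param - lr * avg                                 # identical on every worker

p = np.zeros(4, dtype=np.float32)
assert np.allclose(ps_update(p, grads), allreduce_update(p, grads))
```

Both paths produce the same update; the difference is where the aggregation happens and how the network traffic is shaped.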
I agree with @helinwang that MPI AllReduce is NOT part of PaddlePaddle's milestones. I am open to someone outside the PaddlePaddle team trying that approach, but it makes sense only if it works when PaddlePaddle jobs run in containers.
Thanks to @reyoung, @helinwang, and others for this list. I tried to summarize the milestones as follows; @PaddlePaddle/paddle team, please comment. A note from @panyx0718 about engineering time: 6 full-time months could mean 3 people spending 2 months full-time on something.
This is a solid list, thanks to everyone who worked on it. (More comments coming soon.)
@helinwang @wangkuiyi The reason for supporting MPI all-reduce is that the latest OpenMPI implementation supports GPUDirect if the hardware does. This is the fastest way to implement very high-performance distributed GPU training. Alternatively, we can still try the time-consuming way: implement GPUDirect using CUDA libraries directly. What do you think?
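To make concrete what a GPUDirect-capable all-reduce buys, here is a minimal sketch (not PaddlePaddle's integration) using mpi4py with CuPy device buffers. It assumes a CUDA-aware MPI build (e.g., OpenMPI built with GPUDirect support) and mpi4py >= 3.1, and would be launched with something like `mpirun -np 2 python allreduce_sketch.py`.

```python
from mpi4py import MPI
import cupy as cp   # assumption: CuPy provides the GPU buffers in this sketch

comm = MPI.COMM_WORLD
grad = cp.full(1024, comm.Get_rank() + 1, dtype=cp.float32)  # fake per-rank gradient
summed = cp.empty_like(grad)

# With a CUDA-aware MPI, this all-reduce operates directly on the GPU buffers,
# so gradients never detour through host memory; without it, the same call
# would require copying to host NumPy arrays first.
comm.Allreduce(grad, summed, op=MPI.SUM)
summed /= comm.Get_size()                 # averaged gradient, identical on every rank
print(comm.Get_rank(), float(summed[0]))
```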
The principle is that we need to make sure the distributed training system is reasonably easy to use. Given that most AI systems depend not only on PaddlePaddle but also on application-specific third-party software, e.g., OpenCV for vision, we'd prefer to run AI applications inside containers.
Another invariant is that we don't want to lose the capability of fault recovery.
Given the above two rules, I would like to see an application of MPI with PaddlePaddle Fluid. I think the capability of starting a distributed job using MPI is more urgent than the speed-up, because it seems that many teams inside Baidu are using MPI.
@wangkuiyi Thanks for the milestone summary. I suggest we add two more columns: 1) the number of full-time engineers and 2) the number of months spent developing them. For example:
Good point! @panyx0718 Please feel free to add these columns.
@typhoonzero @wangkuiyi @PaddleCI Thanks for the comments! Good to know about the MPI need from Baidu, as well as the GPUDirect support in OpenMPI. Given that we already have NCCL all-reduce, the development time for integrating OpenMPI (or NCCL2) may not be that high, and we would get the additional benefit of already-tuned communication and GPUDirect support. That could save us a lot of effort. Fault recovery can be added by checkpointing; fault tolerance on MPI could perhaps be added with some peer-aware logic that creates a new MPI communicator when a node leaves or joins.
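A minimal sketch of the checkpoint-based recovery idea mentioned above. The file path, checkpoint format, and training loop are illustrative assumptions, not Fluid's checkpoint API.

```python
import os
import pickle

CKPT = "checkpoint.pkl"   # hypothetical path; a real job would use shared storage

def save_checkpoint(step, params):
    # Write atomically so a crash mid-write never corrupts the previous checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if not os.path.exists(CKPT):
        return 0, {"w": 0.0}                 # fresh start
    with open(CKPT, "rb") as f:
        state = pickle.load(f)
    return state["step"], state["params"]

# On (re)start, every trainer resumes from the last completed checkpoint,
# so a killed node only loses the work done since that checkpoint.
step, params = load_checkpoint()
for step in range(step, step + 100):
    params["w"] += 0.01                      # stand-in for one training step
    if step % 10 == 0:
        save_checkpoint(step + 1, params)
```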
@wangkuiyi Done
Hello, this issue has had no updates for nearly a month, so we will close it later today. If you still need to follow up after it is closed, you can reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure. Thank you for your support of PaddlePaddle!
Fluid supports multi-GPUs and cluster, and high usability
Deadline:
KPI:

Fluid distributed computing
Deadline:
KPI:

Compatible with ONNX
Deadline:
KPI:
- `ProgramDesc` can be converted to ONNX model files.
- ONNX model files can be converted to `ProgramDesc`, making Fluid able to train ONNX models.

Support CSP program model and imperative programming
Deadline:
KPI:
- The user's program generates a `ProgramDesc`, and an interpreter will execute the `ProgramDesc`.
- `ProgramDesc` includes the `IfElse` operator and `While`, and supports `auto diff`.
- Visualize the `ProgramDesc`. Deeply integrate with `VisualDL` to give a GUI.
- Support CSP concepts (`coroutines`, `channel`, `select`) in `ProgramDesc`. Use CSP to implement multi-GPU and cluster training (a generic CSP illustration follows below).
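For readers unfamiliar with the CSP concepts named in the last KPI, here is a tiny Python illustration of coroutines communicating over channels with a select-like choice, using only the standard library. It is a generic illustration of the programming model, not Fluid's actual `ProgramDesc` operators or API.

```python
import queue
import threading

# Two "coroutines" (threads here) each produce into their own channel (a Queue).
def producer(name, channel, n):
    for i in range(n):
        channel.put(f"{name}:{i}")
    channel.put(None)                       # sentinel: this channel is closed

ch_a, ch_b = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
threading.Thread(target=producer, args=("gpu0", ch_a, 3), daemon=True).start()
threading.Thread(target=producer, args=("gpu1", ch_b, 3), daemon=True).start()

# A crude "select": poll whichever channel has a message ready, until both close.
open_channels = {id(ch_a): ch_a, id(ch_b): ch_b}
while open_channels:
    for key, ch in list(open_channels.items()):
        try:
            msg = ch.get(timeout=0.01)
        except queue.Empty:
            continue
        if msg is None:
            del open_channels[key]          # channel closed
        else:
            print("received", msg)          # e.g. a gradient from one device
```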