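Detailed description: Devices in a heterogeneous cluster (P40, K40, V100, A100) differ in memory capacity and compute throughput. Distributed data-parallel training on such a cluster must account for each device's memory and compute in order to reach the highest overall training throughput without overflowing memory on any device. Participants need to use a Cost Model to model the memory and compute of the different hardware as well as the task model, and implement a load-balancing algorithm that takes the modeled information as input and computes concrete training parameters such as the local batch size on each device. Evaluation metric: the overall data-parallel throughput on the heterogeneous cluster when the task model is trained with the parameters produced by the balancing algorithm.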
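To make the task description more concrete, here is a minimal Python sketch of one possible balancing scheme. It is not the reference solution and does not use any Paddle API. It assumes a deliberately simple cost model: memory use grows linearly with the local batch size (`fixed_gb + per_sample_gb * batch`), and the step time on a device is `batch / samples_per_sec`. The `Device` class, the function names, and all numbers in the usage example are hypothetical. Under these assumptions, a synchronous data-parallel step is as slow as the slowest device, so the sketch hands out samples one at a time to whichever device would finish soonest and still has memory headroom.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    mem_gb: float           # usable device memory (assumed known)
    samples_per_sec: float  # modelled or measured compute throughput


def max_batch(dev, fixed_gb, per_sample_gb):
    """Largest local batch that fits under the assumed linear memory model."""
    return max(0, int((dev.mem_gb - fixed_gb) // per_sample_gb))


def balance(devices, global_batch, fixed_gb, per_sample_gb):
    """Split global_batch into local batch sizes that minimise the slowest step."""
    caps = {d.name: max_batch(d, fixed_gb, per_sample_gb) for d in devices}
    if sum(caps.values()) < global_batch:
        raise ValueError("global batch does not fit in cluster memory")

    alloc = {d.name: 0 for d in devices}
    # Heap key: step time the device would have after taking one more sample.
    heap = [(1.0 / d.samples_per_sec, d.name, d.samples_per_sec)
            for d in devices if caps[d.name] > 0]
    heapq.heapify(heap)

    for _ in range(global_batch):
        _, name, speed = heapq.heappop(heap)
        alloc[name] += 1
        if alloc[name] < caps[name]:  # device still has memory headroom
            heapq.heappush(heap, ((alloc[name] + 1) / speed, name, speed))
    return alloc


def step_time(devices, alloc):
    """Synchronous data parallelism waits for the slowest device."""
    return max(alloc[d.name] / d.samples_per_sec for d in devices)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    cluster = [
        Device("P40", 24.0, 100.0),
        Device("K40", 12.0, 50.0),
        Device("V100", 32.0, 300.0),
        Device("A100", 40.0, 600.0),
    ]
    plan = balance(cluster, global_batch=210, fixed_gb=2.0, per_sample_gb=0.25)
    print(plan, step_time(cluster, plan))
```

A real entry would replace the linear cost model with measured memory and throughput profiles per device and per task model; the greedy allocation step only stays optimal while step time grows monotonically with batch size.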
Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please make sure that you have posted enough information to demonstrate your request. You may also check out the API, FAQ, GitHub Issues, and AI community for answers. Have a nice day!
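[PaddlePaddle Hackathon 4] Core Framework Open-Source Contributions: Collection of Other Tasks
(This ISSUE is a task ISSUE for the PaddlePaddle Hackathon, 4th edition. For more details, see [PaddlePaddle Hackathon 4th Edition] Task Overview.)
Note: Participants who sign up for the other tasks can email paddle-hack@baidu.com, and we will invite you to the corresponding community group for discussion. For development, please refer to the Contribution Guide. The task list is as follows: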
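No.89: Clean up dynamic import statements and resolve circular import issues
No.90: Expose AnalysisConfig to users on the JITLayer C++ side to improve usability
No.91: Support dynamic-to-static conversion for TensorHook
No.92: Improve the accuracy of fully quantized ppocr det & rec models on tim-vx devices (Amlogic/Rockchip)
No.93: Add a file_descriptor transfer scheme for CPU tensors on Linux
No.94: Global reference counting for GPU tensors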
Features such as CudaIPCSentData and CudaIPCRefcountedFiles; bind the Tensor transferred over IPC to CudaIPCSentData using UniqueVoidPtr. Global reference counting.
No.95: CPU tensor transfer on mac/win32 + DataLoader adaptation
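No.96: Add a join interface to Paddle's DataParallel to handle uneven input across data streams
No.97: Automatic load balancing for data-parallel training on heterogeneous clusters, based on Paddle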
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q&A and Discussion