
Optimize distributed training performance #7607

Closed
typhoonzero opened this issue Jan 17, 2018 · 2 comments

Comments

@typhoonzero
Contributor

No description provided.

@wangkuiyi
Collaborator

It is difficult to understand the title of this issue.

typhoonzero changed the title from "Optimize split_op not copy tensor" to "Optimize distributed training performance" on Jan 18, 2018
@typhoonzero
Contributor Author

Sorry, the original issue was meant to leave a TODO in the project. I'm writing a transpiler that can split one variable and send the pieces to multiple parameter servers so each can perform a partial, element-wise optimization. However, the current implementation of split_op has to copy the split data into newly allocated memory, which could be optimized away.

Then we noticed that all allocation is managed by the "buddy allocator" under paddle/memory, so if we keep several pointers into sections of the original tensor and wrap those pointers in new tensor objects, it will probably cause memory errors (see the sketch below).
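
To make the trade-off concrete, here is a minimal, framework-agnostic C++17 sketch (none of this is Paddle's actual Tensor or split_op code; the `Tensor` struct and the `SplitByCopy`/`SplitByView` helpers are hypothetical). It contrasts the copy-based split, which allocates and memcpy's each section, with a view-based split that only stores pointers into the original buffer:

```cpp
// Illustrative only -- compile with -std=c++17.
#include <cstring>
#include <iostream>
#include <memory>
#include <vector>

// Toy tensor: owns a contiguous float buffer (stands in for a buddy-allocated block).
struct Tensor {
  std::shared_ptr<float[]> data;
  size_t size = 0;
};

// Strategy 1: copy-based split, like split_op today. Each output gets freshly
// allocated memory plus a memcpy of its section -- safe, but costs extra
// allocation and bandwidth.
std::vector<Tensor> SplitByCopy(const Tensor& in, size_t parts) {
  std::vector<Tensor> outs(parts);
  const size_t chunk = in.size / parts;
  for (size_t i = 0; i < parts; ++i) {
    outs[i].size = chunk;
    outs[i].data = std::shared_ptr<float[]>(new float[chunk]);
    std::memcpy(outs[i].data.get(), in.data.get() + i * chunk,
                chunk * sizeof(float));
  }
  return outs;
}

// Strategy 2: zero-copy views that alias sections of the original buffer.
// The aliasing shared_ptr keeps the whole original allocation alive while any
// view exists. A block allocator such as Paddle's buddy allocator only knows
// how to free the block it handed out, so a tensor "owning" an interior
// pointer needs equivalent lifetime tracking or memory errors follow.
std::vector<Tensor> SplitByView(const Tensor& in, size_t parts) {
  std::vector<Tensor> outs(parts);
  const size_t chunk = in.size / parts;
  for (size_t i = 0; i < parts; ++i) {
    outs[i].size = chunk;
    // Aliasing constructor: shares ownership of in.data, points into its middle.
    outs[i].data = std::shared_ptr<float[]>(in.data, in.data.get() + i * chunk);
  }
  return outs;
}

int main() {
  Tensor t{std::shared_ptr<float[]>(new float[8]), 8};
  for (size_t i = 0; i < t.size; ++i) t.data[i] = static_cast<float>(i);

  auto copies = SplitByCopy(t, 2);
  auto views = SplitByView(t, 2);
  std::cout << "copy[1][0] = " << copies[1].data[0]
            << ", view[1][0] = " << views[1].data[0] << "\n";  // both print 4
}
```

The sketch works only because `shared_ptr` tracks the whole block's lifetime for the views; with memory handed out by the buddy allocator, that bookkeeping does not come for free, which is the problem described above.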

I changed this issue to be more general, so we can use it to track other distributed-training performance enhancements for now.
