
Optimize distributed training performance #7607

Closed
typhoonzero opened this issue Jan 17, 2018 · 2 comments

Comments

@typhoonzero
Contributor

No description provided.

@wangkuiyi
Collaborator

It is difficult to understand the title of this issue.

typhoonzero changed the title from "Optimize split_op not copy tensor" to "Optimize distributed training performance" on Jan 18, 2018
@typhoonzero
Contributor Author

Sorry, the original issue was meant to leave a TODO in the project. I'm writing a transpiler that can split one variable and send the pieces to multiple parameter servers so each can perform a partial, element-wise optimization. However, the current implementation of split_op has to copy the split data into newly allocated memory, which could be optimized away.

Then we noticed that all allocation is managed by the "buddy allocator" under paddle/memory, so if we keep several pointers into sections of the original tensor and wrap those pointers in new tensor objects, it will probably cause memory errors (see the sketch below).
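
To make the trade-off concrete, here is a minimal, framework-agnostic C++17 sketch (none of this is Paddle's actual Tensor or split_op code; the `Tensor` struct and the `SplitByCopy`/`SplitByView` helpers are hypothetical). It contrasts the copy-based split, which allocates and memcpy's each section, with a view-based split that only stores pointers into the original buffer:

```cpp
// Illustrative only -- compile with -std=c++17.
#include <cstring>
#include <iostream>
#include <memory>
#include <vector>

// Toy tensor: owns a contiguous float buffer (stands in for a buddy-allocated block).
struct Tensor {
  std::shared_ptr<float[]> data;
  size_t size = 0;
};

// Strategy 1: copy-based split, like split_op today. Each output gets freshly
// allocated memory plus a memcpy of its section -- safe, but costs extra
// allocation and bandwidth.
std::vector<Tensor> SplitByCopy(const Tensor& in, size_t parts) {
  std::vector<Tensor> outs(parts);
  const size_t chunk = in.size / parts;
  for (size_t i = 0; i < parts; ++i) {
    outs[i].size = chunk;
    outs[i].data = std::shared_ptr<float[]>(new float[chunk]);
    std::memcpy(outs[i].data.get(), in.data.get() + i * chunk,
                chunk * sizeof(float));
  }
  return outs;
}

// Strategy 2: zero-copy views that alias sections of the original buffer.
// The aliasing shared_ptr keeps the whole original allocation alive while any
// view exists. A block allocator such as Paddle's buddy allocator only knows
// how to free the block it handed out, so a tensor "owning" an interior
// pointer needs equivalent lifetime tracking or memory errors follow.
std::vector<Tensor> SplitByView(const Tensor& in, size_t parts) {
  std::vector<Tensor> outs(parts);
  const size_t chunk = in.size / parts;
  for (size_t i = 0; i < parts; ++i) {
    outs[i].size = chunk;
    // Aliasing constructor: shares ownership of in.data, points into its middle.
    outs[i].data = std::shared_ptr<float[]>(in.data, in.data.get() + i * chunk);
  }
  return outs;
}

int main() {
  Tensor t{std::shared_ptr<float[]>(new float[8]), 8};
  for (size_t i = 0; i < t.size; ++i) t.data[i] = static_cast<float>(i);

  auto copies = SplitByCopy(t, 2);
  auto views = SplitByView(t, 2);
  std::cout << "copy[1][0] = " << copies[1].data[0]
            << ", view[1][0] = " << views[1].data[0] << "\n";  // both print 4
}
```

The sketch works only because `shared_ptr` tracks the whole block's lifetime for the views; with memory handed out by the buddy allocator, that bookkeeping does not come for free, which is the problem described above.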

I changed this issue to be more general, so we can use it to track other distributed-training performance enhancements for now.
