
Need the ability to group operations together. Similar to collections in Tensorflow #8016

Closed
abhinavarora opened this issue Feb 1, 2018 · 4 comments

Comments

@abhinavarora
Contributor

This problem came to light while I was investigating how we could move regularization to the pserver (#7432). The current distribute transpiler splits the params and grads and passes a different slice to each pserver. Hence, when we create optimize ops, we use the sliced parameters and gradients. However, the distribute transpiler currently does this through a hack: it identifies these ops by checking whether an op has inputs named Param and Grad. This works because optimize ops have their own dedicated operators such as sgd_op and adam_op.
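
For illustration, here is a minimal sketch of the check being described, assuming each op exposes an `input_names` list as in the Fluid Python wrappers (the helper name `_is_optimize_op` is mine, not the transpiler's actual code):

```python
# Sketch of the current heuristic (helper name is hypothetical).
# Optimize ops such as sgd_op and adam_op declare inputs literally
# named "Param" and "Grad", so checking for those names finds them.
def _is_optimize_op(op):
    return "Param" in op.input_names and "Grad" in op.input_names

# A scale or elementwise_add op added by regularization only has
# generic inputs like "X" and "Y", so this check misses it.
```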

However, regularization and gradient clipping rely on generic tensor ops like scale and elementwise_add. These ops take the parameters as inputs, so on the pserver they should receive the sliced parameters instead. We therefore need a way to identify these ops in the distribute transpiler so that we can pass the sliced params and grads to them. The hack mentioned above will not work in this case, because these are generic ops whose inputs and outputs have names like X and Y.

A hacky solution would be to create dedicated ops for regularization. Currently, the regularization layer adds a scale op and an elementwise_add op in Python. Instead, we could create a single op that composes these two ops in C++.
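
For reference, this is roughly what the Python side composes today; the exact Fluid internals differ, so treat the names and signatures below as illustrative only:

```python
# Illustrative sketch of composing L2 regularization from generic ops
# (variable and helper names are approximate, not the exact Fluid code).
def append_l2_regularization(block, param, grad, coeff):
    decay = block.create_var(dtype=param.dtype, shape=param.shape)
    # decay = coeff * param
    block.append_op(
        type="scale",
        inputs={"X": param},
        outputs={"Out": decay},
        attrs={"scale": coeff})
    # grad = grad + decay
    block.append_op(
        type="elementwise_add",
        inputs={"X": grad, "Y": decay},
        outputs={"Out": grad})
    return grad
```

A dedicated C++ op would fold both steps into one operator with recognizable input names, which the transpiler could then detect the same way it detects sgd_op or adam_op.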

A better and more sustainable solution would be to support adding tags to Python ops, which would allow us to group ops that share a tag. This way, we can make sure that all ops added for regularization carry a regularization tag, and gradient clipping ops carry a gradient-clipping tag. The distribute transpiler can then process ops by tag and apply whatever slicing logic each group needs. These tags are similar to the concept of collections in TensorFlow.
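
In TensorFlow this kind of grouping is done with `tf.add_to_collection(name, value)` and `tf.get_collection(name)`. A hypothetical sketch of what a tag-based API could look like in Fluid follows; none of these names exist today, they only illustrate the proposal:

```python
# Hypothetical tag API -- all names here are invented for illustration.
REGULARIZATION_TAG = "regularization"
GRAD_CLIP_TAG = "gradient_clipping"

def append_tagged_op(block, tag, **op_desc):
    """Append an op to a block and record a tag on it
    (assumes a per-op string attribute can be set)."""
    op = block.append_op(**op_desc)
    op.set_attr("op_tag", tag)
    return op

def ops_with_tag(block, tag):
    """Collect all ops in a block that carry the given tag."""
    return [op for op in block.ops
            if op.has_attr("op_tag") and op.attr("op_tag") == tag]

# The distribute transpiler could then handle tagged ops uniformly,
# e.g. rewriting their inputs to the sliced variables:
# for op in ops_with_tag(block, REGULARIZATION_TAG):
#     ...replace param/grad inputs with their sliced counterparts...
```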

@helinwang
Contributor

Thank you! I agree.

Maybe we don't need to implement the hacky solution of creating dedicated ops for regularization, if "put regularization on pserver" is not a hard requirement for the February deadline? Otherwise that code will just be removed once the "correct" solution is in place.

@abhinavarora
Contributor Author

Also including @reyoung, because he has the most experience with the Fluid Python API.

@typhoonzero
Contributor

@helinwang

if "put regularization on pserver" is not a hard requirement for the February deadline?

I don't think it is. Still, this feature could be a great improvement.

@shanyi15
Collaborator

Hello, this issue has not been updated in the past month, so we will close it later today. If you still need to follow up after it is closed, feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
