
Refactor recompute #45348

Closed

Conversation

@wuhuachaocoding (Contributor) commented Aug 23, 2022

PR types

Others

PR changes

Others

Describe

Refactor the recompute API.
Users can use the following fleet APIs for recompute:
1. fleet.recompute()
2. fleet.recompute_sequential()  # for sequential models
3. fleet.recompute_hybrid()  # for hybrid parallelism; recompute supports offload and activation functions
Users do not need to be aware of the implementation differences between these APIs.

Remarks:
(1) 'fleet.recompute_sequential' and 'fleet.recompute_hybrid' are newly added APIs.
(2) the directory root of the 'recompute' API is changed for easier usability.

In the past:
# this way of calling recompute is not recommended and may be removed in a future release (2.4.0).
from paddle.distributed.fleet.utils import recompute


Now:
# this way of calling recompute is recommended.
from paddle.distributed.fleet import recompute
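
For illustration, a minimal usage sketch of the relocated API (the toy layer, shapes, and variable names below are hypothetical and not part of this PR; the call pattern follows recompute(function, *args)):

import paddle
from paddle.distributed.fleet import recompute

# hypothetical toy layer and input, only to show the call pattern
linear = paddle.nn.Linear(16, 16)
x = paddle.randn([4, 16])
x.stop_gradient = False

# wrap the forward call; intermediate activations are recomputed during
# the backward pass instead of being stored, trading compute for memory
y = recompute(linear, x)
y.sum().backward()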

@paddle-bot bot commented Aug 23, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@gongweibao (Contributor) left a comment

Some comments.

python/paddle/distributed/fleet/__init__.py (outdated review thread, resolved)
python/paddle/distributed/fleet/__init__.py (outdated review thread, resolved)
@ZHUI (Collaborator) commented Aug 25, 2022

Can this PR support data parallelism? The kind where gradients do not have to be merged manually.

@gongweibao (Contributor) commented

> Can this PR support data parallelism? The kind where gradients do not have to be merged manually.

This requires adding save_tensor_hooks, which depends on 红雨's work.

@gongweibao (Contributor) left a comment

This cannot be rushed!

@gongweibao changed the title from "refactor recompute" to "Refactor recompute" on Sep 2, 2022
gongweibao previously approved these changes Sep 2, 2022
@gongweibao (Contributor) left a comment

LGTM

fuyinno4 previously approved these changes Sep 2, 2022
dingjiaweiww previously approved these changes Sep 5, 2022
@@ -30,7 +30,10 @@
 ch.setFormatter(formatter)
 logger.addHandler(ch)

-__all__ = []
+__all__ = [
Contributor commented:

There is no need to put this into __all__ again; otherwise the API path becomes even longer:
paddle.distributed.fleet.recompute.recompute.recompute

Contributor commented:

Agreed that non-public functions do not need to be put into __all__. Placing them in this module's __init__.py as the module's interface is enough.

But I don't understand why the call path would be paddle.distributed.fleet.recompute.recompute.recompute!
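
For illustration, a minimal sketch of the re-export pattern under discussion; the file contents below are hypothetical and simplified, not the actual PR diff:

# illustrative sketch of python/paddle/distributed/fleet/recompute/__init__.py:
# re-export only the public entry point so callers never need the long path
# paddle.distributed.fleet.recompute.recompute.recompute
from .recompute import recompute

__all__ = ["recompute"]

# fleet/__init__.py can then re-export once more, enabling the short import:
# from paddle.distributed.fleet import recompute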

@wuhuachaocoding (Contributor, Author) commented:

DONE

python/paddle/distributed/fleet/recompute/__init__.py (outdated review thread, resolved)
@paddle-bot bot closed this Sep 19, 2023
@paddle-bot bot commented Sep 19, 2023

Since you haven't replied for more than a year, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.

Labels: none yet
Projects: none yet
6 participants