Recompute upgrade #47985

Merged

gongweibao merged 11 commits into PaddlePaddle:develop from wuhuachaocoding:recompute_upgrade

Dec 20, 2022

Contributor

wuhuachaocoding commented Nov 15, 2022

PR types

New features

PR changes

Others

Describe

【Recompute upgrade】:
Previously, recompute had a single implementation, based on PyLayer. This PR adds a second implementation based on hooks.
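To make the hook idea concrete, here is a toy sketch in plain Python. It is not the code from this PR: floats stand in for tensors, and `Hooks`, `square`, `recompute`, and `fourth_power` are hypothetical names, not Paddle APIs. During forward the pack hook keeps only a position instead of the saved value. The first unpack during backward reruns the forward function once to rebuild the saved values.

```python
# Toy sketch: floats stand in for tensors and Hooks stands in for a
# framework's saved-tensor hooks. None of these names are Paddle APIs.

class Hooks:
    current = None  # hooks active for ops executed right now

    def __init__(self, pack, unpack):
        self.pack, self.unpack = pack, unpack

    def __enter__(self):
        self.previous, Hooks.current = Hooks.current, self
        return self

    def __exit__(self, *exc_info):
        Hooks.current = self.previous


def square(x, tape):
    """y = x * x. Backward needs x, so x is saved through the active hooks."""
    hooks = Hooks.current
    handle = hooks.pack(x) if hooks else x

    def backward(grad):
        saved = hooks.unpack(handle) if hooks else handle
        return grad * 2.0 * saved

    tape.append(backward)
    return x * x


def recompute(fn, x, tape):
    """Run fn without keeping saved values; rebuild them on first unpack."""
    stored = {}      # index -> saved value, filled during backward
    n_packed = [0]   # number of values seen by pack during forward

    def pack(value):
        index = n_packed[0]
        n_packed[0] += 1
        return index  # keep only a position, drop the value itself

    def unpack(index):
        if not stored:
            def keep(value):
                stored[len(stored)] = value
                return value
            # Rerun the forward pass once, this time keeping saved values.
            with Hooks(keep, lambda value: value):
                fn(x, [])
        return stored[index]

    with Hooks(pack, unpack):
        return fn(x, tape)


def fourth_power(x, tape):
    return square(square(x, tape), tape)


tape = []
y = recompute(fourth_power, 2.0, tape)
grad = 1.0
for backward in reversed(tape):
    grad = backward(grad)
```

Here `y` is x^4 at x = 2 and `grad` is 4x^3. Nothing saved for backward is held between the forward and backward passes, which is the memory saving recompute is after.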

【test】

【features】
A100 40G fleetx

【loss align】

wuhuachaocoding added 2 commits

November 8, 2022 11:20


          update recompute.

3ed8d7d


          upgrade recompute.

dad7966

paddle-bot bot commented Nov 15, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

wuhuachaocoding and others added 6 commits

November 15, 2022 07:06


          update.

7e89a42


          update.

05a0064


          update for annotation.

ecc19e2


          update test.

0c216b8


          refine save hook

8d76bbb


          Merge branch 'refine_savehook' into recompute_upgrade

fb212aa

gongweibao requested changes

Contributor

gongweibao left a comment

Please update the docs and put the link in the comments.

python/paddle/distributed/fleet/recompute/recompute.py

+                  recompute without reentrant, that means use hook to implement the recompute function rather than re-entrant autograd.
+                  """
+                  if preserve_rng_state:

Contributor

gongweibao Nov 23, 2022

Does this mean users can't use CPU or other XXPU devices?

Contributor Author

wuhuachaocoding Nov 23, 2022

Preserving the RNG state is currently supported only on GPU.
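As background on why the RNG state is preserved at all: if the recomputed forward pass draws different random numbers (for example a different dropout mask) than the original pass, the gradients no longer match the loss. The sketch below illustrates the save-and-restore pattern. It uses Python's `random` module as a stand-in for the GPU RNG state, and the helper names are hypothetical, not taken from the PR.

```python
import random


def dropout_mask(n, keep_prob=0.5):
    """Toy dropout mask drawn from Python's global RNG."""
    return [random.random() < keep_prob for _ in range(n)]


def forward_with_saved_rng(n):
    # Capture the RNG state before the forward pass consumes random numbers.
    state = random.getstate()
    return dropout_mask(n), state


def recompute_with_saved_rng(n, state):
    # Temporarily switch back to the forward-time RNG state, then restore
    # the current state so the surrounding computation is not disturbed.
    outer_state = random.getstate()
    random.setstate(state)
    try:
        return dropout_mask(n)
    finally:
        random.setstate(outer_state)


random.seed(1234)
forward_mask, rng_state = forward_with_saved_rng(8)
random.random()  # other work advances the RNG between forward and backward
state_before_recompute = random.getstate()
recomputed_mask = recompute_with_saved_rng(8, rng_state)
state_after_recompute = random.getstate()
```

The recomputed mask equals the forward mask, and the RNG state seen by later code is the same as if no recomputation had happened.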

python/paddle/distributed/fleet/recompute/recompute.py

+                  if tracer._amp_dtype == 'float16':
+                      amp_dtype = 'float16'
+                  elif tracer._amp_dtype in ('bfloat16', 'float32'):
+                      amp_dtype = 'bfloat16'

Contributor

gongweibao Nov 23, 2022

Why is float32 mapped to bfloat16?

Contributor Author

wuhuachaocoding Nov 23, 2022

auto_cast supports two dtypes: float16 and bfloat16.
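The mapping discussed in this thread can be sketched as a small standalone function. `resolve_amp_dtype` is a hypothetical name used only for illustration. The first two branches follow the diff above; the `ValueError` for any other value is an assumption, since the excerpt does not show what happens in that case.

```python
def resolve_amp_dtype(tracer_amp_dtype):
    """Map the tracer's AMP dtype to one that auto_cast accepts.

    Per the author's reply, auto_cast supports only 'float16' and
    'bfloat16', so 'float32' has to map to one of them.
    """
    if tracer_amp_dtype == 'float16':
        return 'float16'
    if tracer_amp_dtype in ('bfloat16', 'float32'):
        return 'bfloat16'
    # Assumption: anything else is rejected; the diff excerpt does not show this.
    raise ValueError("unsupported amp dtype: {}".format(tracer_amp_dtype))


resolved = [resolve_amp_dtype(d) for d in ('float16', 'bfloat16', 'float32')]
```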

python/paddle/distributed/fleet/recompute/recompute.py

+                      fwd_cuda_rng_state_tracker = (
+                          get_rng_state_tracker().get_states_tracker()
+                      )
+                  tracer = framework._dygraph_tracer()

Contributor

gongweibao Nov 23, 2022

Are all of these internal functions (prefixed with _)?

Contributor Author

wuhuachaocoding Nov 23, 2022

Yes. This is the only way to get the tracer from Python; the name _dygraph_tracer is defined by the framework.

python/paddle/distributed/fleet/recompute/recompute.py Outdated

@@ -13,6 +13,7 @@
               # limitations under the License.
               import paddle
+              import weakref
               from paddle.fluid import core

Contributor

gongweibao Nov 23, 2022

Please don't use fluid anymore.

Contributor Author

wuhuachaocoding Nov 23, 2022

DONE

gongweibao requested a review from ForFishes

November 23, 2022 02:14


          update for recompute.

81929d6

ForFishes previously approved these changes

Member

ForFishes left a comment

LGTM


          Merge remote-tracking branch 'upstream/develop' into recompute_upgrade

db416b0

wuhuachaocoding dismissed ForFishes’s stale review via

db416b0

December 8, 2022 08:36


          update

bad716d

sljlp reviewed

python/paddle/distributed/fleet/recompute/recompute.py

+              from paddle.distributed.fleet.meta_parallel.parallel_layers.random import (
+                  get_rng_state_tracker,
+              )
+              from paddle.framework import core, in_dygraph_mode

Contributor

sljlp Dec 19, 2022

OK

gongweibao approved these changes

Contributor

gongweibao left a comment

LGTM

gongweibao merged commit 64f780c into PaddlePaddle:develop
