Feat/zero mix with mp #8036
Merged
Commits
62 commits
a470ff4 add zero limit (strint)
9447157 add debug (strint)
5eefdf8 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
e3acaa9 add mix zero test (strint)
b481a7e refactor zero api (strint)
3b56468 zero test with mp (strint)
66c8ac3 add 2d test (strint)
4f56df2 add zero nd (strint)
635ac69 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
2834289 add nd zero (strint)
b805256 add sbp cast (strint)
2ede354 test passed soft limit consumer (strint)
0227f54 refine size api (strint)
7036e04 zero use stage 2 (strint)
c26763e add limit consumer api (strint)
d84e8a9 add new api (strint)
2555ee1 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
ac0b9d2 refine zero s select (strint)
dd0a865 fix index out of range (strint)
e0304a7 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
501518f rm zero limit on device type (strint)
0e2f9a2 Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
e3eed8c zero test with activation checkpointing (strint)
f966b4f Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
ebc9ff9 add indentity when dp sequence len is 1 (strint)
2011e2c move to base with master (strint)
b7f4fed fix confict (strint)
ffe2094 fix (strint)
b58b48a fix (strint)
6975f33 fix (strint)
32bc1d1 Merge branch 'feat/logical_nccl_send_recv' into feat/zero_mix_with_mp (strint)
cce8efd add test (strint)
a30b0c0 debug bad case (strint)
c73013f refine test for eager and graph boxing (strint)
08b1f69 test case ready (strint)
821a8f4 simplify (strint)
29079a0 refine test (strint)
e49d380 fix buff size (strint)
9bd521f Merge branch 'feat/logical_nccl_send_recv' into feat/zero_mix_with_mp (strint)
b374505 merge master (strint)
3fc1821 fix conflict (strint)
79e1290 refine zero nd (strint)
3225045 refine (strint)
c751435 add full test (strint)
5c78921 revert change (strint)
bfa726c refine split check (strint)
0bcbf30 fix typo (strint)
14c8520 rm log (strint)
56754bc spit long func (strint)
567af33 restore test (strint)
886914c Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
84ca778 Update optimizer_placement_optimization_pass.cpp (strint)
7095ec3 auto format by CI (oneflow-ci-bot)
5a5b9c5 Merge branch 'master' into feat/zero_mix_with_mp (strint)
b401e66 auto format by CI (oneflow-ci-bot)
3c7d2c5 Merge branch 'master' into feat/zero_mix_with_mp (strint)
2b0324e fix static check (strint)
7d611c4 add tips for zero api change (strint)
04548ac Merge branch 'master' into feat/zero_mix_with_mp (strint)
640487b auto format by CI (oneflow-ci-bot)
9928cdd Merge branch 'master' into feat/zero_mix_with_mp (mergify[bot])
dc4e40d Merge branch 'master' into feat/zero_mix_with_mp (mergify[bot])
@@ -71,6 +71,9 @@ void CreateNcclComm(ncclComm_t* comm, const int dev, const std::string& key,
           << ", nccl_unique_id = " << NcclUniqueId2String(nccl_unique_id) << ", rank = " << rank
           << ", key = {" << key << "}\n";
   OF_NCCL_CHECK(ncclCommInitRank(comm, device_vec.size(), nccl_unique_id, rank));
+  VLOG(2) << " EagerNcclCommMgr::ncclCommInitRank succeed device_vec.size() = " << device_vec.size()
+          << ", nccl_unique_id = " << NcclUniqueId2String(nccl_unique_id) << ", rank = " << rank
+          << ", key = {" << key << "}\n";
 }

 }  // namespace

Review comment on the added VLOG(2) statement: Debug EagerNcclCommMgr
@@ -997,13 +997,19 @@ Maybe<void> LazyJobBuildAndInferCtx::Complete() {
     }
   };
   int32_t pass_cnt = 0;
+  const int64_t prev_v = FLAGS_v;
   auto DoPass = [&](const std::string& pass_name, int32_t cnt = 0) -> Maybe<void> {
+    VLOG(1) << job_name << " is compiling with pass"
+            << " pass_cnt_" + std::to_string(pass_cnt) + "-" + pass_name
+            << (cnt > 0 ? std::to_string(cnt) : "");
     if (unlikely(NeedLogJob(pass_name))) {
       std::string cnt_str = cnt > 0 ? std::to_string(cnt) : "";
       LogJob("pass_cnt_" + std::to_string(pass_cnt) + "-" + pass_name + cnt_str + "-before");
+      FLAGS_v = 3;
     }
     JUST(JobPass4Name(pass_name)(mut_job(), &job_pass_ctx));
     if (unlikely(NeedLogJob(pass_name))) {
+      FLAGS_v = prev_v;
       std::string cnt_str = cnt > 0 ? std::to_string(cnt) : "";
       LogJob("pass_cnt_" + std::to_string(pass_cnt) + "-" + pass_name + cnt_str + "-after");
     }

Review comment on the added FLAGS_v = 3 line: While a pass is being debugged, the glog verbosity level is raised to 3 (FLAGS_v = 3) and restored to its previous value after the pass has run.
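In the hunk above, FLAGS_v = prev_v comes after JUST(...), so if JUST returns early on an error (an assumption about how that macro propagates failures), the raised verbosity would stay in effect. A scope guard is one possible way to make the restore unconditional. The following is only a minimal sketch: g_verbosity is a stand-in for glog's FLAGS_v so the example compiles without glog, and ScopedVerbosity and RunPass are hypothetical names that do not appear in the PR.

```cpp
#include <functional>

// Stand-in for glog's FLAGS_v (assumption: lets the sketch build without glog).
int g_verbosity = 0;

// Hypothetical RAII guard: raises the verbosity on construction and
// restores the previous value on destruction, whatever the exit path.
class ScopedVerbosity {
 public:
  explicit ScopedVerbosity(int level) : prev_(g_verbosity) { g_verbosity = level; }
  ~ScopedVerbosity() { g_verbosity = prev_; }
  ScopedVerbosity(const ScopedVerbosity&) = delete;
  ScopedVerbosity& operator=(const ScopedVerbosity&) = delete;

 private:
  const int prev_;
};

// Hypothetical pass runner. A false return models the early error return
// that JUST(...) is assumed to perform in the original code.
bool RunPass(const std::function<bool()>& pass, bool need_log) {
  if (need_log) {
    ScopedVerbosity guard(3);  // verbose only while this pass runs
    return pass();             // guard restores the level even on failure
  }
  return pass();
}
```

With this shape, the caller does not need a separate restore statement after the pass, and a failing pass leaves the verbosity exactly as it found it.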
Optimize the ZeRO API