Feat/graph logical op debug repr #8131
Merged
149 commits
a470ff4 add zero limit (strint)
9447157 add debug (strint)
5eefdf8 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
e3acaa9 add mix zero test (strint)
b481a7e refactor zero api (strint)
3b56468 zero test with mp (strint)
66c8ac3 add 2d test (strint)
4f56df2 add zero nd (strint)
635ac69 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
2834289 add nd zero (strint)
b805256 add sbp cast (strint)
2ede354 test passed soft limit consumer (strint)
0227f54 refine size api (strint)
5989506 add module config (xiacijie)
dd08951 save nn.Module info in job.proto for better debugging (xiacijie)
090a7f4 add new line (xiacijie)
e577d82 Merge branch 'master' into add-ModuleBlock.ops()-method (xiacijie)
9560f56 add ModuleBlock.ops_proto() API (xiacijie)
01c19f8 Merge branch 'add-ModuleBlock.ops()-method' of github.com:Oneflow-Inc… (xiacijie)
7036e04 zero use stage 2 (strint)
e5e637d Merge branch 'master' into add-ModuleBlock.ops()-method (xiacijie)
9eb7a5a print operators' info when print ModuleBlock (xiacijie)
b727d97 Merge branch 'add-ModuleBlock.ops()-method' of github.com:Oneflow-Inc… (xiacijie)
2269b9e handle VariableOpConf (xiacijie)
7ea7fc1 update (xiacijie)
2dfd997 Merge branch 'master' into add-ModuleBlock.ops()-method (xiacijie)
048965f update (xiacijie)
35d23b0 fix (xiacijie)
18fce6c Merge branch 'add-ModuleBlock.ops()-method' of github.com:Oneflow-Inc… (xiacijie)
8bc590f move operators repr method to graph util (xiacijie)
c26763e add limit consumer api (strint)
d84e8a9 add new api (strint)
2555ee1 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
ac0b9d2 refine zero s select (strint)
d2f9f35 Merge branch 'add-ModuleBlock.ops()-method' of https://github.com/One… (strint)
55bb6df add module block (strint)
5039557 fix (strint)
8e0abb7 refact for rm op in module conf (strint)
511e25b fix (strint)
9101fb7 add sbp debug (strint)
8cb036a add sbp repr (strint)
0ab75a2 add shape (strint)
c69e35f refine (strint)
8eb62fe Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
1110066 add sys op in repr (strint)
4306ac7 add full op debug (strint)
dd0a865 fix index out of range (strint)
51f3559 Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
e0304a7 Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
73da0b7 Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
501518f rm zero limit on device type (strint)
0e2f9a2 Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
8a67dd4 add no scope op to graph (strint)
e3eed8c zero test with activation checkpointing (strint)
f966b4f Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
a6b16cd merge zero (strint)
ab20d2e Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
1599ee6 fix order (strint)
5144e32 Merge branch 'feat/op_level_debug_backward_sbp' of https://github.com… (strint)
ebc9ff9 add indentity when dp sequence len is 1 (strint)
e77dd89 add debug repr (strint)
cc40e14 refine repr of op (strint)
9a61d3b refine and fix (strint)
51b9657 rm useless log (strint)
2011e2c move to base with master (strint)
b7f4fed fix confict (strint)
1ba26df merge op level debug (strint)
ffe2094 fix (strint)
b58b48a fix (strint)
6975f33 fix (strint)
32bc1d1 Merge branch 'feat/logical_nccl_send_recv' into feat/zero_mix_with_mp (strint)
70d793e Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
5209b02 fix proto (strint)
98dcf2d refine test (strint)
484aff0 fix type (strint)
cce8efd add test (strint)
a30b0c0 debug bad case (strint)
c73013f refine test for eager and graph boxing (strint)
08b1f69 test case ready (strint)
821a8f4 simplify (strint)
29079a0 refine test (strint)
e49d380 fix buff size (strint)
9bd521f Merge branch 'feat/logical_nccl_send_recv' into feat/zero_mix_with_mp (strint)
f82a317 Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
b374505 merge master (strint)
3fc1821 fix conflict (strint)
79e1290 refine zero nd (strint)
3225045 refine (strint)
bbe7114 Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
c751435 add full test (strint)
5c78921 revert change (strint)
bfa726c refine split check (strint)
0bcbf30 fix typo (strint)
14c8520 rm log (strint)
56754bc spit long func (strint)
459d6f5 Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
b78c4bd refine (strint)
567af33 restore test (strint)
a1c0ff2 Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
c1508e3 merge master (strint)
b5bdbef refine pass and mem debug (strint)
886914c Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
3957515 Merge branch 'feat/zero_mix_with_mp' into feat/op_level_debug_backwar… (strint)
b7dad59 merge master (strint)
a6aa236 repr dtype (strint)
9840be2 add placement (strint)
84ca778 Update optimizer_placement_optimization_pass.cpp (strint)
7095ec3 auto format by CI (oneflow-ci-bot)
5a5b9c5 Merge branch 'master' into feat/zero_mix_with_mp (strint)
b401e66 auto format by CI (oneflow-ci-bot)
3c7d2c5 Merge branch 'master' into feat/zero_mix_with_mp (strint)
2b0324e fix static check (strint)
7d611c4 add tips for zero api change (strint)
04548ac Merge branch 'master' into feat/zero_mix_with_mp (strint)
640487b auto format by CI (oneflow-ci-bot)
9928cdd Merge branch 'master' into feat/zero_mix_with_mp (mergify[bot])
dc4e40d Merge branch 'master' into feat/zero_mix_with_mp (mergify[bot])
a54ff83 Merge branch 'feat/zero_mix_with_mp' of https://github.com/Oneflow-In… (strint)
e40e732 fix merge (strint)
451bb22 merge new update (strint)
3b3b1a9 merge master (strint)
ee7ea67 auto format by CI (oneflow-ci-bot)
9b2cf2d auto format by CI (oneflow-ci-bot)
7b5c6f3 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
bc556d4 refine get job api (strint)
ca8d852 refine graph util import order (strint)
1b3bbaa Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into … (strint)
66923eb auto format by CI (oneflow-ci-bot)
7bb0c85 fix static check (strint)
79b1f3d Merge branch 'feat/op_level_debug_backward_sbp' of https://github.com… (strint)
7fdcc67 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
95cebef auto format by CI (oneflow-ci-bot)
0e0101e Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
fe1013c Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
45c2f37 fix special case (strint)
51931ca Merge branch 'feat/op_level_debug_backward_sbp' of https://github.com… (strint)
c010589 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
9966b89 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
6b01da2 refine level print and add full dtype repr (strint)
c882457 rm useless (strint)
b7b7429 Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
d637e20 Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
49c8562 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
22ea17a Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
1461e30 Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
e145a0c Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
37999e7 Merge branch 'master' into feat/op_level_debug_backward_sbp (strint)
a865cc5 Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
9ecad29 Merge branch 'master' into feat/op_level_debug_backward_sbp (mergify[bot])
    @@ -1,9 +1,7 @@
     syntax = "proto2";
     package oneflow;
     
    -import "oneflow/core/operator/op_conf.proto";
    -
     message ModuleConf {
       required string name = 1;
    -  repeated OperatorConf ops = 2;
    +  repeated string ops = 2;
     }
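The diff above changes `ModuleConf.ops` from embedded `OperatorConf` messages to plain strings. A minimal Python sketch of what that shape implies, assuming the strings are operator names that get resolved against a job-level table elsewhere (the PR page does not confirm this). The dataclasses, `resolve_ops`, and the sample op names are hypothetical stand-ins, not OneFlow APIs:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OperatorConf:
    """Hypothetical stand-in for an operator config; only two fields are modelled."""
    name: str
    op_type: str = ""


@dataclass
class ModuleConf:
    """Mirrors the new message shape: a module name plus a list of strings."""
    name: str
    ops: List[str] = field(default_factory=list)


def resolve_ops(module: ModuleConf, op_table: Dict[str, OperatorConf]) -> List[OperatorConf]:
    """Look up each op name in a name-to-config table; unknown names are skipped."""
    return [op_table[op_name] for op_name in module.ops if op_name in op_table]


# Assumed sample data, for illustration only.
op_table = {
    "linear-matmul": OperatorConf("linear-matmul", "matmul"),
    "linear-bias_add": OperatorConf("linear-bias_add", "bias_add"),
}
module = ModuleConf(name="linear", ops=["linear-matmul", "linear-bias_add", "missing-op"])
resolved = resolve_ops(module, op_table)
```

Storing only names keeps the module record small and avoids duplicating operator configs that already live elsewhere, at the cost of one lookup when the full config is needed.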
Why are mem blocks that have no chunk also placed in chunk info here? What is the benefit? chunk info now contains a large number of model blocks, which makes it inconvenient to read.
It is also semantically incorrect. These mem blocks do not take part in memory reuse and are not inside any chunk.

If they are to be added, they should go into rank memory info, not chunk info.

This is for debugging ops whose memory is out of chunk ("Memory out of Chunk"). The data here looked abnormal once before, so these blocks were added for debugging as well.
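The separation suggested in the review thread can be sketched as follows: blocks that belong to a chunk are grouped per chunk, while blocks outside any chunk are collected in a separate rank-level list. This is only an illustration in Python; `MemBlock`, its fields, and `split_by_chunk` are hypothetical names and do not reflect OneFlow's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MemBlock:
    """Hypothetical memory block record; chunk_id is None when it is not in a chunk."""
    mem_block_id: int
    size: int
    chunk_id: Optional[int] = None


def split_by_chunk(blocks: List[MemBlock]) -> Tuple[Dict[int, List[MemBlock]], List[MemBlock]]:
    """Group in-chunk blocks by chunk id and list out-of-chunk blocks separately."""
    in_chunk: Dict[int, List[MemBlock]] = {}
    out_of_chunk: List[MemBlock] = []
    for block in blocks:
        if block.chunk_id is None:
            out_of_chunk.append(block)
        else:
            in_chunk.setdefault(block.chunk_id, []).append(block)
    return in_chunk, out_of_chunk


# Assumed sample data, for illustration only.
blocks = [
    MemBlock(mem_block_id=0, size=1024, chunk_id=0),
    MemBlock(mem_block_id=1, size=2048, chunk_id=0),
    MemBlock(mem_block_id=2, size=4096),  # e.g. a model variable outside any chunk
]
chunk_info, rank_level_blocks = split_by_chunk(blocks)
```

With this split, a chunk report lists only blocks that take part in memory reuse, and the out-of-chunk blocks remain available for debugging in a rank-level report.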