
Commit

!17115 Delete key 0 if all processes in a group are killed
Merge pull request !17115 from chuboning/cherry-pick-1735375236
chuboning authored and it-is-a-robot committed Jan 15, 2025
1 parent b0b2938 commit 4c396ab
Showing 1 changed file with 2 additions and 0 deletions.
torch_npu/csrc/distributed/ProcessGroupHCCL.cpp: 2 additions, 0 deletions
@@ -856,6 +856,8 @@ void ProcessGroupHCCL::abort(c10::optional<std::string> abortReason)
 void ProcessGroupHCCL::deleteTCPStoreKey()
 {
     try {
+        // all processes in a group may be killed, so delete key 0 as a last resort
+        store_->deleteKey("0");
         for (const auto &key : TCPStoreKeyList_) {
             store_->deleteKey(key);
         }
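The cleanup order introduced by this change can be illustrated with a small, self-contained sketch. It does not use torch_npu or c10d: `InMemoryStore`, `FakeProcessGroup`, and `deleteStoreKeys` are hypothetical stand-ins, with `keyList` playing the role of `TCPStoreKeyList_`. The sketch assumes, in line with the boolean return of c10d's `Store::deleteKey`, that deleting a missing key returns `false` rather than throwing, so removing `"0"` unconditionally is harmless even when it was never written or was already removed. The hunk ends before the `catch` that closes the `try`, so the log-and-continue handler below is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a c10d-style key-value store.
// deleteKey returns true if the key existed and was removed, false otherwise.
class InMemoryStore {
public:
    void set(const std::string& key, const std::string& value) { data_[key] = value; }
    bool deleteKey(const std::string& key) { return data_.erase(key) > 0; }
    bool has(const std::string& key) const { return data_.count(key) > 0; }
    std::size_t size() const { return data_.size(); }

private:
    std::map<std::string, std::string> data_;
};

// Hypothetical process-group wrapper: keyList stands in for TCPStoreKeyList_,
// the keys this rank remembers having written to the store.
struct FakeProcessGroup {
    InMemoryStore* store;
    std::vector<std::string> keyList;

    // Mirrors the patched cleanup: delete "0" unconditionally first, then every
    // tracked key. Deleting a key that is already gone just returns false.
    void deleteStoreKeys() {
        try {
            store->deleteKey("0");
            for (const auto& key : keyList) {
                store->deleteKey(key);
            }
        } catch (const std::exception& e) {
            // Assumed best-effort behavior: report the failure and continue.
            std::cerr << "store key cleanup failed: " << e.what() << '\n';
        }
    }
};
```

In this sketch a rank whose `keyList` is empty still removes `"0"`, and calling the cleanup a second time is a no-op because every delete of an absent key simply returns `false`.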