DAOS-7485 control: Implement dmg system drain to act on all hosts (#15506)

Add dmg system drain command to drain a set of storage nodes or ranks
from all the pools they belong to. Takes --ranks or --rank-hosts in
ranged format. Improve unit test coverage for lib/control, cmd/dmg and
server/mgmt_system system-related functions.

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
tanabarr authored Dec 3, 2024
1 parent 9a4bec3 commit 21a881a
Showing 23 changed files with 2,256 additions and 1,315 deletions.
21 changes: 19 additions & 2 deletions docs/admin/pool_operations.md
@@ -591,8 +591,8 @@ with the following information for each pool:
 - The imbalance percentage indicating whether data distribution across
 the difference storage targets is well balanced. 0% means that there is
 no imbalance and 100% means that out-of-space errors might be returned
-by some storage targets while space is still available on others. Again
-for the NVMe or DATA tier.
+by some storage targets while space is still available on others. Applies
+only for the NVMe or DATA tier.
 - The number of disabled targets (0 here) and the number of targets that
 the pool was originally configured with (total).

@@ -1260,6 +1260,23 @@ The pool target drain command accepts 2 parameters:
 * The engine rank of the target(s) to be drained.
 * The target indices of the targets to be drained from that engine rank (optional).
 
+#### System Drain
+
+To drain ranks or hosts from all pools that they belong to, the 'dmg system drain'
+command can be used. The command takes either a host-set or rank-set:
+
+To drain a set of hosts from all pools (drains all ranks on selected hosts):
+
+```Bash
+$ dmg system drain --rank-hosts foo-[001-100]
+```
+
+To drain a set of ranks from all pools:
+
+```Bash
+$ dmg system drain --ranks 1-100
+```
+
 ### Reintegration
 
 After an engine failure and exclusion, an operator can fix the underlying issue
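
The "ranged format" accepted by `--ranks` in the documentation hunk above can be illustrated with a small Go sketch. This is only an illustration of how a string such as `1-100` or `0,2-4` could expand into individual ranks; `expandRankSet` is a hypothetical helper written for this note, not the parser used by dmg, and it assumes a rank-set is a comma-separated list of single ranks and ascending `N-M` ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandRankSet is a hypothetical helper (NOT part of DAOS) that expands a
// ranged rank string such as "0,2-4" into the individual ranks 0, 2, 3, 4.
// Assumption: items are comma-separated and ranges are written as "N-M".
func expandRankSet(s string) ([]uint32, error) {
	var ranks []uint32
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		// A single rank is treated as a range whose bounds are equal.
		lo, hi, isRange := strings.Cut(part, "-")
		if !isRange {
			hi = lo
		}
		first, err := strconv.ParseUint(lo, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid rank %q: %w", lo, err)
		}
		last, err := strconv.ParseUint(hi, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid rank %q: %w", hi, err)
		}
		if last < first {
			return nil, fmt.Errorf("descending range %q", part)
		}
		// r is a uint64, so the loop cannot wrap even when last is the
		// largest 32-bit value.
		for r := first; r <= last; r++ {
			ranks = append(ranks, uint32(r))
		}
	}
	return ranks, nil
}

func main() {
	ranks, err := expandRankSet("1-100")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the count plus the first and last rank of the expanded set.
	fmt.Println(len(ranks), ranks[0], ranks[len(ranks)-1])
}
```

Under these assumptions, `--ranks 1-100` would name one hundred ranks. How dmg itself validates rank-sets and host-sets (for example the bracketed `foo-[001-100]` form) is not shown in this excerpt.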
2 changes: 2 additions & 0 deletions src/control/cmd/dmg/command_test.go
@@ -134,6 +134,8 @@ func (bci *bridgeConnInvoker) InvokeUnaryRPC(ctx context.Context, uReq control.U
 		resp = control.MockMSResponse("", nil, &mgmtpb.SystemStartResp{})
 	case *control.SystemExcludeReq:
 		resp = control.MockMSResponse("", nil, &mgmtpb.SystemExcludeResp{})
+	case *control.SystemDrainReq:
+		resp = control.MockMSResponse("", nil, &mgmtpb.SystemDrainResp{})
 	case *control.SystemQueryReq:
 		if req.FailOnUnavailable {
 			resp = control.MockMSResponse("", system.ErrRaftUnavail, nil)
4 changes: 1 addition & 3 deletions src/control/cmd/dmg/json_test.go
@@ -113,9 +113,7 @@ func TestDmg_JsonOutput(t *testing.T) {
 			testArgs = append(testArgs, "foo:bar")
 		case "system del-attr":
 			testArgs = append(testArgs, "foo")
-		case "system exclude":
-			testArgs = append(testArgs, "--ranks", "0")
-		case "system clear-exclude":
+		case "system exclude", "system clear-exclude", "system drain":
 			testArgs = append(testArgs, "--ranks", "0")
 		}
