| domain | shortname | name | status | editor | contributors |
| --- | --- | --- | --- | --- | --- |
| github.com | 11/HCTL | Hare Controller CLI | stable | Mandar Sawant <mandar.sawant@seagate.com> | |
`hctl` is a command-line interface for managing a Hare cluster.
$ hctl --help
Usage: hctl <command> [options]
Interact with Motr cluster.
Commands:
bootstrap bootstrap the cluster
drive-state post drive event to Motr
fetch-fids Fetch the fids for motr services (m0d and s3server)
node manage the cluster nodes
node-join Start and join a node with the cluster.
reportbug gather Hare forensic data
shutdown stop the cluster or local node
start start cluster or local node
status show cluster status
help Show this help and exit.
$ hctl bootstrap --help
Usage: hare-bootstrap [<option>]... <CDF>
hare-bootstrap [<option>]... --conf-dir <dir>
Bootstrap the cluster.
Positional arguments:
<CDF> Path to the cluster description file.
-c, --conf-dir <dir> Don't generate configuration files, use existing
ones from the specified directory.
Options:
--debug Print commands and their arguments as they are executed.
--mkfs Execute m0mkfs. *CAUTION* This wipes all Motr data!
-h, --help Show this help and exit.
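The `<CDF>` argument is a YAML cluster description file that `cfgen` uses to generate the cluster configuration. The snippet below is only an illustrative sketch: host names, interface names, devices and field values are placeholders and assumptions, and the exact schema may differ between Hare versions; consult the `cfgen` documentation for the authoritative format.

```yaml
# Illustrative CDF sketch -- field names and values are assumptions, not a reference.
nodes:
  - hostname: localhost        # [user@]hostname of the node
    data_iface: eth1           # data network interface (placeholder)
    m0_servers:
      - runs_confd: true       # this m0d instance runs confd
        io_disks:
          data: []
      - io_disks:
          data:                # block devices used by the ioservice (placeholders)
            - /dev/loop0
            - /dev/loop1
    m0_clients:
      s3: 0                    # number of S3 servers to start
      other: 2                 # max number of other Motr clients on this node
pools:
  - name: the pool
    disks: all                 # use the disks declared above (illustrative)
    data_units: 1              # N in the N+K erasure-coding scheme
    parity_units: 0            # K in the N+K erasure-coding scheme
```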
Initial bootstrap requires `--mkfs`. Subsequent `hctl bootstrap` calls SHOULD NOT use `--mkfs`, otherwise any data stored in Motr will be lost.
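In practice the sequence might look like the following sketch (the CDF path and configuration directory are illustrative):

```sh
# Very first bootstrap of a new cluster: format Motr storage (wipes all Motr data).
hctl bootstrap --mkfs /tmp/ldr1-cluster.yaml

# Any later bootstrap of the same cluster: reuse the existing configuration
# and never pass --mkfs, so the stored data is preserved.
hctl bootstrap --conf-dir /var/lib/hare
```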
$ hctl bootstrap --mkfs /tmp/ldr1-cluster.yaml
2020-09-03 09:54:39: Generating cluster configuration... OK
2020-09-03 09:54:40: Starting Consul server agent on this node........... OK
2020-09-03 09:54:49: Importing configuration into the KV store... OK
2020-09-03 09:54:49: Starting Consul agents on other cluster nodes... OK
2020-09-03 09:54:50: Updating Consul agents configs from the KV store... OK
2020-09-03 09:54:51: Installing Motr configuration files... OK
2020-09-03 09:54:51: Waiting for the RC Leader to get elected........... OK
2020-09-03 09:55:00: Starting Motr (phase1, mkfs)... OK
2020-09-03 09:55:06: Starting Motr (phase1, m0d)... OK
2020-09-03 09:55:09: Starting Motr (phase2, mkfs)... OK
2020-09-03 09:55:16: Starting Motr (phase2, m0d)... OK
2020-09-03 09:55:19: Checking health of services... OK
$ hctl shutdown
Stopping m0d@0x7200000000000001:0xc (ios) at ssc-vm-c-0552.colo.seagate.com...
Stopping m0d@0x7200000000000001:0x29 (ios) at ssc-vm-c-0553.colo.seagate.com...
Stopped m0d@0x7200000000000001:0xc (ios) at ssc-vm-c-0552.colo.seagate.com
Stopped m0d@0x7200000000000001:0x29 (ios) at ssc-vm-c-0553.colo.seagate.com
Stopping m0d@0x7200000000000001:0x9 (confd) at ssc-vm-c-0552.colo.seagate.com...
Stopping m0d@0x7200000000000001:0x26 (confd) at ssc-vm-c-0553.colo.seagate.com...
Stopped m0d@0x7200000000000001:0x26 (confd) at ssc-vm-c-0553.colo.seagate.com
Stopped m0d@0x7200000000000001:0x9 (confd) at ssc-vm-c-0552.colo.seagate.com
Stopping hare-hax at ssc-vm-c-0552.colo.seagate.com...
Stopping hare-hax at ssc-vm-c-0553.colo.seagate.com...
Stopped hare-hax at ssc-vm-c-0552.colo.seagate.com
Stopped hare-hax at ssc-vm-c-0553.colo.seagate.com
Stopping hare-consul-agent at ssc-vm-c-0552.colo.seagate.com...
Stopping hare-consul-agent at ssc-vm-c-0553.colo.seagate.com...
Stopped hare-consul-agent at ssc-vm-c-0552.colo.seagate.com
Stopped hare-consul-agent at ssc-vm-c-0553.colo.seagate.com
Killing RC Leader at ssc-vm-c-0552.colo.seagate.com... **ERROR**
$ hctl shutdown --node
Stopping m0d@0x7200000000000001:0x30 (ios) at ssc-vm-2090.colo.seagate.com...
Stopped m0d@0x7200000000000001:0x30 (ios) at ssc-vm-2090.colo.seagate.com
Stopping m0d@0x7200000000000001:0x2d (confd) at ssc-vm-2090.colo.seagate.com...
Stopped m0d@0x7200000000000001:0x2d (confd) at ssc-vm-2090.colo.seagate.com
Stopping hare-hax at ssc-vm-2090.colo.seagate.com...
Stopped hare-hax at ssc-vm-2090.colo.seagate.com
Stopping hare-consul-agent at ssc-vm-2090.colo.seagate.com...
Stopped hare-consul-agent at ssc-vm-2090.colo.seagate.com
$ hctl start
2020-09-18 06:59:59: Starting Consul server agent on this node............ OK
2020-09-18 07:00:09: Importing configuration into the KV store... OK
2020-09-18 07:00:09: Starting Consul agents on other cluster nodes... OK
2020-09-18 07:00:09: Updating Consul agents configs from the KV store... OK
2020-09-18 07:00:10: Installing Motr configuration files... OK
2020-09-18 07:00:10: Waiting for the RC Leader to get elected..... OK
2020-09-18 07:00:12: Starting Motr (phase1, m0d)... OK
2020-09-18 07:00:15: Starting Motr (phase2, m0d)... OK
2020-09-18 07:00:18: Checking health of services... OK
$ hctl start --node
2021-02-19 05:11:04: Starting Consul agent on this node.... OK
2021-02-19 05:11:05: Starting Motr (phase1, m0d)... OK
2021-02-19 05:11:09: Starting Motr (phase2, m0d)... OK
2021-02-19 05:11:12: Checking health of services... OK
OK
$ hctl status --help
usage: hare-status [OPTION]
Show cluster status.
optional arguments:
-h, --help show this help message and exit
--json show output in JSON format
$ hctl status
Data pools:
# fid name
0x6f00000000000001:0x2e 'tier1-ssd'
0x6f00000000000001:0x39 'tier2-hdd'
Profiles:
# fid name: pool(s)
0x7000000000000001:0x54 'fast': 'tier1-ssd'
0x7000000000000001:0x55 'slow': 'tier2-hdd'
0x7000000000000001:0x56 'both': 'tier1-ssd' 'tier2-hdd'
Services:
localhost (RC)
[started] hax 0x7200000000000001:0x6 172.28.128.45@tcp:12345:1:1
[started] confd 0x7200000000000001:0x9 172.28.128.45@tcp:12345:2:1
[started] ioservice 0x7200000000000001:0xc 172.28.128.45@tcp:12345:2:2
[unknown] m0_client 0x7200000000000001:0x28 172.28.128.45@tcp:12345:4:1
[unknown] m0_client 0x7200000000000001:0x2b 172.28.128.45@tcp:12345:4:2
$ hctl status --json
{
"pools": [
{
"fid": "0x6f00000000000001:0x2e",
"name": "tier1-ssd"
},
{
"fid": "0x6f00000000000001:0x39",
"name": "tier2-hdd"
}
],
"profiles": [
{
"fid": "0x7000000000000001:0x54",
"name": "fast",
"pools": [
"tier1-ssd"
]
},
{
"fid": "0x7000000000000001:0x55",
"name": "slow",
"pools": [
"tier2-hdd"
]
},
{
"fid": "0x7000000000000001:0x56",
"name": "both",
"pools": [
"tier1-ssd",
"tier2-hdd"
]
}
],
"filesystem": {
"stats": {
"fs_free_seg": 8590389096,
"fs_total_seg": 8590951472,
"fs_free_disk": 104689827840,
"fs_avail_disk": 104689827840,
"fs_total_disk": 104689827840,
"fs_svc_total": 2,
"fs_svc_replied": 2
},
"timestamp": 1602613220.761281,
"date": "2020-10-13T18:20:20.761281"
},
"nodes": [
{
"name": "localhost",
"svcs": [
{
"name": "hax",
"fid": "0x7200000000000001:0x6",
"ep": "172.28.128.45@tcp:12345:1:1",
"status": "started"
},
{
"name": "confd",
"fid": "0x7200000000000001:0x9",
"ep": "172.28.128.45@tcp:12345:2:1",
"status": "started"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0xc",
"ep": "172.28.128.45@tcp:12345:2:2",
"status": "started"
},
{
"name": "m0_client",
"fid": "0x7200000000000001:0x28",
"ep": "172.28.128.45@tcp:12345:4:1",
"status": "unknown"
},
{
"name": "m0_client",
"fid": "0x7200000000000001:0x2b",
"ep": "172.28.128.45@tcp:12345:4:2",
"status": "unknown"
}
]
}
]
}
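Because the JSON output has a fixed structure, it is convenient for scripting. For instance, a small sketch using `jq` (assuming `jq` is installed) that flattens the service list and counts services that are not yet started:

```sh
# Print "<node> <service> <status>" for every service in the cluster.
hctl status --json | jq -r '.nodes[] | .name as $n | .svcs[] | "\($n) \(.name) \(.status)"'

# Count services whose status is not "started" (0 means the cluster is fully up).
hctl status --json | jq '[.nodes[].svcs[] | select(.status != "started")] | length'
```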
The `hctl fetch-fids` utility fetches the fids of Motr and S3 services in the cluster or on a node from the `consul-kv.json` file generated by `cfgen` and displays them.
Note: currently this command can be run only from the node on which the bootstrap command was executed.
$ hctl fetch-fids --help
usage: hare-fetch-fids [OPTION]
Fetches the fids for motr services (m0d and s3server) in cluster or node.
optional arguments:
-h, --help show this help message and exit
--service SERVICE, -s SERVICE
service name. - Returns fid for given service. List of
services- "confd", "ioservice", "s3server". Default:
Returns the fids for all the services.
--node NODE, -n NODE node-name - Returns the fids of services for the given
node. Default: Local node
--all Returns fids of all the services for nodes in cluster.
--json show output in JSON format
Usage: hctl fetch-fids [-s <service-name>] [-n <node-name>] [--all] [--json]
$ hctl fetch-fids -n ssc-vm-c-1813.colo.seagate.com -s ioservice --json
[
{
"name": "ioservice",
"fid": "0x7200000000000001:0xc"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0x1d"
}
]
$ hctl fetch-fids --all --json
[
{
"name": "ssc-vm-c-1813.colo.seagate.com",
"svcs": [
{
"name": "confd",
"fid": "0x7200000000000001:0x9"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0xc"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0x1d"
}
]
},
{
"name": "ssc-vm-c-1898.colo.seagate.com",
"svcs": [
{
"name": "confd",
"fid": "0x7200000000000001:0x35"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0x38"
}
]
}
]
$ hctl fetch-fids -n ssc-vm-c-1813.colo.seagate.com --json
[
{
"name": "confd",
"fid": "0x7200000000000001:0x9"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0xc"
},
{
"name": "ioservice",
"fid": "0x7200000000000001:0x1d"
}
]
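As with `hctl status --json`, this JSON output lends itself to scripting; a sketch with `jq` (assumed to be installed) that extracts plain fid values:

```sh
# Fids of the local node's ioservices, one per line.
hctl fetch-fids -s ioservice --json | jq -r '.[].fid'

# Print "<node> <service> <fid>" for every service in the cluster.
hctl fetch-fids --all --json | jq -r '.[] | .name as $n | .svcs[] | "\($n) \(.name) \(.fid)"'
```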
$ hctl node --help
hctl node [OPTIONS] COMMAND [ARGS]...
Options:
--verbose
--username TEXT
--password TEXT
--help Show this message and exit.
Commands:
maintenance Switch the cluster to maintenance mode.
shutdown Shutdown (power off) the node by name.
standby Put the given node into standby mode.
status Show status of all cluster nodes.
unmaintenance Move the cluster from maintenance back to normal mode.
unstandby Remove the given node from standby mode.
Note: error messages are sent to stderr and duplicated to the journald logs.
- Use `--username` and `--password` (the current Linux user's credentials) to authenticate with `pcsd`, the Pacemaker daemon. These parameters can be helpful when the command is executed by a non-root user. This pair of options is optional; when omitted, no local authentication will be issued while communicating with Pacemaker.
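For example, a non-root user might run (the user name and password below are purely hypothetical):

```sh
# Authenticate with pcsd explicitly before querying the node status.
hctl node --username hauser --password 'secret' status
```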
Usage: hctl node maintenance [OPTIONS]
Switch the cluster to maintenance mode.
Options:
--all [required]
--timeout-sec INTEGER Maximum time that this command will wait for any
operation to complete before raising an error
--help Show this message and exit.
Puts the cluster into "smart maintenance" mode. This mode includes the following sequence:

1. Disable STONITH resources. Wait until these resources are stopped (no longer than `timeout-sec` seconds).
2. Put all the nodes into 'standby' mode. Wait until all resources are stopped (no longer than `timeout-sec` seconds).

Note: If any of these steps fails, the cluster will remain in an unstable state: if STONITH resources are disabled, split-brain becomes a real risk. The user will need to issue `hctl node unmaintenance --all` manually to return the cluster to its normal state (note that this command can lead to fencing).
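A sketch of the typical round trip (the timeout value is arbitrary):

```sh
# Enter "smart maintenance" before planned service work.
hctl node maintenance --all --timeout-sec 300

# ... perform the maintenance work ...

# Bring the cluster back to normal mode (may lead to fencing, see the note above).
hctl node unmaintenance --all --timeout-sec 300
```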
Usage: hctl node shutdown [OPTIONS] NODE
Shutdown (power off) the node by name.
Options:
--timeout-sec INTEGER Maximum time that this command will wait for any
operation to complete before raising an error
--help Show this message and exit.
Powers off the node via the IPMI interface (the corresponding IPMI address and credentials are taken from Pacemaker's STONITH resources). Shutdown procedure:

1. Switch the node to standby (so that all the resources get stopped for sure).
2. Once all resources are stopped, trigger the shutdown.

Notes:

- Item [1] waits until all the resources are stopped, but no longer than `timeout-sec` seconds. Once the timeout is exceeded, the tool exits with an exception and a non-zero exit code.
- As a result, if the resources take too long to stop (for any reason), the shutdown will not happen and the node will remain in 'standby' mode. The user will need to 'unstandby' the node manually in this case.
- If the shutdown fails (e.g. due to an IPMI failure), the node will also remain in 'standby' mode.
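Building on these notes, a sketch of a guarded power-off that takes the node out of standby again if the shutdown step fails (node name and timeout are illustrative):

```sh
# Try to power the node off; if the command fails, revert the standby state.
if ! hctl node shutdown --timeout-sec 600 smc8-m11; then
    hctl node unstandby smc8-m11
fi
```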
Usage: hctl node standby [OPTIONS] [NODE]
Put the given node into standby mode.
Options:
--all Put all the nodes in the cluster to standby mode (no node name is
required).
--help Show this message and exit.
Puts the node into standby mode. Note that the tool DOES NOT wait until all the resources are stopped on the given node; it exits early (see the polling sketch after the `hctl node status` examples below).
Usage: hctl node status [OPTIONS]
Show status of all cluster nodes.
Options:
--full Show overall cluster status, so not only nodes will be included.
--help Show this message and exit.
Outputs the status of cluster nodes in JSON format. Sample output:
$ hctl node status
[{"name": "smc7-m11", "online": true, "standby": false, "unclean": false, "resources_running": 3}, {"name": "smc8-m11", "online": false, "standby": false, "unclean": false, "resources_running": 3}]
$ hctl node status --full
{"resources": {"statistics": {"started": 6, "stopped": 0, "starting": 0}}, "nodes": [{"name": "smc7-m11", "online": true, "standby": false, "unclean": false, "resources_running": 3}, {"name": "smc8-m11", "online": false, "standby": false, "unclean": false, "resources_running": 3}]}
Usage: hctl node unmaintenance [OPTIONS]
Move the cluster from maintenance back to normal mode.
Options:
--all [required]
--timeout-sec INTEGER Maximum time that this command will wait for any
operation to complete before raising an error
--help Show this message and exit.
Disables "smart maintenance" mode. Steps:
- Revoke all the nodes from 'standby' mode. Wait until all resources are stopped (no longer than
timeout-sec
seconds). - Enable STONITH resources. Wait until these resources are running (no longer than
timeout-sec
seconds).
Note: This command can be used as a general way to return the cluster to normal mode (so it can 'cure' the cluster after 'standby --all' or after an unsuccessful shutdown).
Usage: hctl node unstandby [OPTIONS] [NODE]
Remove the given node from standby mode.
Options:
--all Remove all the nodes in the cluster from standby mode (no node name
is required).
--help Show this message and exit.
Note: Similarly to `hctl node standby`, this command exits early, i.e. it doesn't wait until the resources are started on the nodes that used to be in standby mode.
Use the `hctl node-join` command to start Hare and Motr services on a node that was rebooted.
[root@ssc-vm-c-0552 cortx-hare]# hctl node-join --help
Usage: hare-node-join [<option>]... <CDF>
hare-node-join [<option>]... --conf-dir <dir>
Start and join a node with the cluster.
Positional arguments:
<CDF> Path to the cluster description file.
-c, --conf-dir <dir> Don't generate configuration files, use existing
ones from the specified directory.
--conf-create Re-create configuration on this node.
--consul-addr <consul-addr> Active Consul server address.
--consul-port <consul-port> Active Consul server port.
Options:
-h, --help Show this help and exit.
[root@ssc-vm-c-0553 cortx-hare]# hctl status
Profile: 0x7000000000000001:0x3d
Data pools:
0x6f00000000000001:0x3e
Services:
ssc-vm-c-0553.colo.seagate.com (RC)
[started] hax 0x7200000000000001:0x23 192.168.9.107@tcp:12345:1:1
[started] confd 0x7200000000000001:0x26 192.168.9.107@tcp:12345:2:1
[started] ioservice 0x7200000000000001:0x29 192.168.9.107@tcp:12345:2:2
[unknown] m0_client 0x7200000000000001:0x37 192.168.9.107@tcp:12345:4:1
[unknown] m0_client 0x7200000000000001:0x3a 192.168.9.107@tcp:12345:4:2
ssc-vm-c-0552.colo.seagate.com
[unknown] hax 0x7200000000000001:0x6 192.168.9.108@tcp:12345:1:1
[unknown] confd 0x7200000000000001:0x9 192.168.9.108@tcp:12345:2:1
[unknown] ioservice 0x7200000000000001:0xc 192.168.9.108@tcp:12345:2:2
[unknown] m0_client 0x7200000000000001:0x1a 192.168.9.108@tcp:12345:4:1
[unknown] m0_client 0x7200000000000001:0x1d 192.168.9.108@tcp:12345:4:2
[root@ssc-vm-c-0552 cortx-hare]# hctl node-join --conf-dir /var/lib/hare --consul-addr 192.168.9.107 --consul-port 8500
2020-09-18 10:10:52: Starting Consul server agent on this node.... OK
2020-09-18 10:10:54: Updating Consul agents configs from the KV store... OK
2020-09-18 10:10:54: Waiting for the RC Leader to get elected... OK
2020-09-18 10:10:55: Starting Motr (phase1, m0d)... OK
2020-09-18 10:10:59: Starting Motr (phase2, m0d)... OK
2020-09-18 10:11:02: Checking health of services... OK
[root@ssc-vm-c-0552 cortx-hare]# hctl status
Profile: 0x7000000000000001:0x3d
Data pools:
0x6f00000000000001:0x3e
Services:
ssc-vm-c-0553.colo.seagate.com (RC)
[started] hax 0x7200000000000001:0x23 192.168.9.107@tcp:12345:1:1
[started] confd 0x7200000000000001:0x26 192.168.9.107@tcp:12345:2:1
[started] ioservice 0x7200000000000001:0x29 192.168.9.107@tcp:12345:2:2
[unknown] m0_client 0x7200000000000001:0x37 192.168.9.107@tcp:12345:4:1
[unknown] m0_client 0x7200000000000001:0x3a 192.168.9.107@tcp:12345:4:2
ssc-vm-c-0552.colo.seagate.com
[started] hax 0x7200000000000001:0x6 192.168.9.108@tcp:12345:1:1
[started] confd 0x7200000000000001:0x9 192.168.9.108@tcp:12345:2:1
[started] ioservice 0x7200000000000001:0xc 192.168.9.108@tcp:12345:2:2
[unknown] m0_client 0x7200000000000001:0x1a 192.168.9.108@tcp:12345:4:1
[unknown] m0_client 0x7200000000000001:0x1d 192.168.9.108@tcp:12345:4:2
If a failed node has been replaced with a fresh one and the configuration files need to be re-generated, use the `--conf-create` option:
[root@ssc-vm-c-0552 cortx-hare]# hctl node-join /tmp/ees-cluster.yaml --consul-addr 192.168.9.107 --consul-port 8500 --conf-create
2020-09-18 10:16:52: Generating node configuration... OK
2020-09-18 10:16:52: Starting Consul server agent on this node.... OK
2020-09-18 10:16:53: Updating Consul agents configs from the KV store... OK
2020-09-18 10:16:53: Waiting for the RC Leader to get elected... OK
2020-09-18 10:16:53: Starting Motr (phase1, m0d)... OK
2020-09-18 10:16:58: Starting Motr (phase2, m0d)... OK
2020-09-18 10:17:01: Checking health of services... OK
A fresh node may also need to run Motr mkfs if its storage was wiped. Executing the `hctl node-join` command with the `--mkfs` option will re-initialise Motr storage on the given node.
[root@ssc-vm-c-0552 cortx-hare]# hctl node-join /tmp/ees-cluster.yaml --consul-addr 192.168.9.107 --consul-port 8500 --mkfs --conf-create
2020-09-18 11:02:32: Generating node configuration... OK
2020-09-18 11:02:33: Starting Consul server agent on this node.... OK
2020-09-18 11:02:35: Updating Consul agents configs from the KV store... OK
2020-09-18 11:02:35: Waiting for the RC Leader to get elected... OK
2020-09-18 11:02:35: Starting Motr (phase1, mkfs)... OK
2020-09-18 11:02:42: Starting Motr (phase1, m0d)... OK
2020-09-18 11:02:45: Starting Motr (phase2, mkfs)... OK
2020-09-18 11:02:52: Starting Motr (phase2, m0d)... OK
2020-09-18 11:02:55: Checking health of services... OK