
tendrl need to support expand cluster functionality for gluster cluster #805

Closed
shtripat opened this issue Jan 11, 2018 · 11 comments

@shtripat
Member

The expand cluster mechanism currently in tendrl is not very seamless.
The user needs to un-manage the cluster, extend it by adding new peers, and then import the cluster again.

Ideally tendrl should be able to detect new peers introduced in the cluster while the cluster is being managed by tendrl. The user should be prompted to accept the newly discovered hosts for monitoring / management through tendrl.

@julienlim
Member

@shtripat @r0h4n @nthomas-redhat @brainfunked @mcarrano @Tendrl/tendrl-qe

With regard to expand cluster, why does a user have to accept newly discovered hosts for monitoring? If new volumes or bricks get added, does that mean the user has to accept them too?

IMHO, cluster expansion or changes should be detected automatically without the user having to do anything -- specifically, the user should not have to unmanage the cluster and then reimport it in order for the new nodes/peers to be recognized.

Is there a way to do automatic expansion in the Tendrl UI without the user having to unmanage and reimport the cluster? Meaning, all the user does is add the peers/nodes to the Gluster cluster, install the Tendrl agents onto the new nodes, start/enable the agents, and generate the relevant dashboards for the nodes. Tendrl then detects the new nodes, raises a notification that they were detected and are now added for management under Tendrl, and those nodes automatically show up in the cluster details, hosts list (for the cluster), etc. without further user intervention.

@shtripat
Member Author

@julienlim my bad for using the word accept. What I meant was that the user should be notified and simply acks the new nodes/peers in the cluster. Even today, volume and brick additions etc. are taken care of automatically.

Regarding automatic expansion in tendrl, yes, it's not there at the moment and this issue tracks the changes for that feature.

@shtripat
Member Author

Based on the latest discussion, below is the flow for automatic expansion of the cluster:

  1. The user installs glusterfs on the fresh node and uses peer probe to add the node to the existing gluster cluster
  2. Using tendrl-ansible, the tendrl-node-agent gets installed on the storage node
  3. Step 2 makes sure the node is now known to tendrl. As part of the usual sync, tendrl figures out that there is a new node in the cluster and starts the import cluster flow on the new node (only) to set up the required components like collectd and tendrl-gluster-integration (see the sketch after this list)

Finally, the new node is part of the cluster.
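For illustration, a minimal sketch of the detection in step 3. The helper and host names here are made up; this is not the actual Tendrl sync code, just the peer-set comparison it would rely on:

```python
# Illustrative sketch only; not the actual Tendrl sync code.

def detect_new_peers(managed_nodes, gluster_peers):
    """Return peers present in the underlying gluster cluster but not yet
    managed by tendrl.

    managed_nodes: node names already known to tendrl (central store)
    gluster_peers: peer names reported by the underlying gluster cluster
    """
    return sorted(set(gluster_peers) - set(managed_nodes))


# Usage example with made-up host names:
new_peers = detect_new_peers(
    managed_nodes=["node1", "node2", "node3"],
    gluster_peers=["node1", "node2", "node3", "node4"],
)
for peer in new_peers:
    # For each newly detected peer, the import cluster flow would be run on
    # that node only, to set up collectd and tendrl-gluster-integration.
    print("new peer detected, import flow to be triggered on: %s" % peer)
```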

Note: As part of detecting a new node in the cluster, the discovery logic needs changes to recalculate the clusterid (based on the new set of peers and their pool-ids from gluster) and update the cluster details.
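A rough sketch of how such a recalculation could look, assuming the clusterid is derived deterministically from the sorted pool-ids of the peers; the actual algorithm used by the discovery logic may differ:

```python
import hashlib
import uuid

def recalculate_cluster_id(peer_pool_ids):
    """Derive a deterministic clusterid from the pool-ids of all current peers.

    Sorting makes the result independent of the order in which gluster lists
    the peers, so adding a node yields a new, but reproducible, clusterid.
    """
    digest = hashlib.sha256("".join(sorted(peer_pool_ids)).encode("utf-8")).hexdigest()
    return str(uuid.UUID(digest[:32]))


# Example with made-up pool-ids: adding a peer changes the computed id.
print(recalculate_cluster_id(["pool-a", "pool-b"]))
print(recalculate_cluster_id(["pool-a", "pool-b", "pool-c"]))
```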

@r0h4n @julienlim @nthomas-redhat @mbukatov kindly review and ack this.

@sankarshanmukhopadhyay
Contributor

Is the import (new_node) flow enabled automatically (or, without admin intervention)?

@shtripat
Member Author

@sankarshanmukhopadhyay the expansion of the cluster in tendrl would be automatic, once the node-agent gets installed on the new node by tendrl-ansible.

@julienlim
Member

Ack @shtripat

Just one thing that I mentioned in person: appropriate log messages (i.e. eventing / alerting) need to be generated -- specifically, to detect the new node, provide an alert about it, and tell the user what they need to do, i.e. use tendrl-ansible to install the tendrl agents on the node. It should also provide an event for when the node comes under Tendrl management.
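For illustration, a minimal sketch of the two events being asked for, using Python logging as a stand-in for Tendrl's actual eventing / alerting machinery (the logger name and function names are hypothetical):

```python
import logging

log = logging.getLogger("tendrl.expand_cluster")  # hypothetical logger name

def notify_new_peer_detected(cluster_id, peer):
    # Alert the admin that a new peer exists and what action is expected.
    log.warning(
        "New peer %s detected in cluster %s; run tendrl-ansible to install "
        "the tendrl agents on it", peer, cluster_id)

def notify_peer_managed(cluster_id, peer):
    # Follow-up event once the node comes under Tendrl management.
    log.info("Peer %s of cluster %s is now managed by Tendrl", peer, cluster_id)
```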

@r0h4n @nthomas-redhat @mbukatov

@shtripat
Member Author

@julienlim Ack

r0h4n added this to the Milestone 3 (2018) milestone Feb 17, 2018
shtripat pushed a commit to shtripat/bridge_common that referenced this issue Feb 23, 2018
This flow would be invoked from tendrl-node-agent on the additionally detected new peers of the cluster. New nodes would be provisioned with monitoring etc. and imported into the cluster in the tendrl system.

tendrl-bug-id: Tendrl#805
Signed-off-by: Shubhendu <shtripat@redhat.com>
shtripat pushed a commit to shtripat/bridge_common that referenced this issue Feb 27, 2018
tendrl-bug-id: Tendrl#805
Signed-off-by: Shubhendu <shtripat@redhat.com>
@r0h4n
Contributor

r0h4n commented Mar 7, 2018

This is done on the backend as of Milestone 3 (2018), and new nodes will be auto-expanded (after peer probe and after running tendrl-ansible on the new nodes) until the below TODOs are done

TODO

@Tendrl/api @Tendrl/frontend @Tendrl/ux please note

@julienlim
Member

@r0h4n @Tendrl/api @Tendrl/frontend @Tendrl/ux

Design updates have been posted at https://redhat.invisionapp.com/share/8QCOEVEY9. See #849 (comment) for further details.

shtripat pushed a commit to shtripat/node_agent that referenced this issue Mar 16, 2018
The peers list in the tendrl central store would take some time to reflect
the change, and initially the number of peers would be equal to the number
of managed nodes of the cluster, so using this condition would cause issues.
Rather, we should check the number of managed nodes of the cluster against
the actual peers list from the underlying cluster and, once they are the
same, raise the clearing alert.

tendrl-bug-id: Tendrl/commons#805
Signed-off-by: Shubhendu <shtripat@redhat.com>
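A small sketch of the corrected condition described in the commit message above; the helper is hypothetical, and the real node_agent code reads these counts from the central store and the underlying gluster cluster:

```python
# Hypothetical helper shown only to illustrate the corrected condition.

def should_raise_clearing_alert(managed_node_count, actual_gluster_peer_count):
    """Raise the clearing alert only when every peer of the underlying
    cluster is managed by tendrl, rather than trusting the (lagging)
    peer list in the central store."""
    return managed_node_count == actual_gluster_peer_count


# Example: 3 of 4 peers managed -> no clearing alert yet.
print(should_raise_clearing_alert(3, 4))   # False
print(should_raise_clearing_alert(4, 4))   # True
```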
@r0h4n
Contributor

r0h4n commented Mar 16, 2018

The new mockups are implemented.

Please comment if there is any mismatch.

r0h4n closed this as completed Mar 16, 2018