Problem with automatic expansion of cluster #849

Closed · r0h4n opened this issue Mar 6, 2018 · 12 comments

r0h4n commented Mar 6, 2018

As Milestone 3 (2018) is being dev tested, some workflow issues have come up around automatic expansion of a cluster. Here is the current workflow.

Steps for the user to auto-expand a cluster:

  1. Create new storage nodes for expansion
  2. Install Gluster and peer probe the nodes into the existing cluster
  3. Run tendrl-ansible on new nodes and install tendrl-node-agent
  4. Existing tendrl-node-agents on the old nodes discover the new tendrl-node-agents and expand the cluster automatically (a sketch of this discovery step follows the list).
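
A minimal sketch of the discovery check behind step 4, assuming the agent can shell out to the gluster CLI; the real agents parse gluster get-state output and read the managed-node list from the central store (etcd), and the helper names here are hypothetical:

```python
import subprocess

def list_peer_hostnames():
    """Hostnames reported by `gluster pool list` (header row skipped)."""
    out = subprocess.check_output(["gluster", "pool", "list"], text=True)
    hosts = set()
    for line in out.strip().splitlines()[1:]:  # skip "UUID Hostname State"
        fields = line.split()
        if len(fields) >= 2:
            hosts.add(fields[1])
    return hosts

def find_unmanaged_peers(managed_hostnames):
    """Peers gluster knows about that have no tendrl-node-agent yet."""
    return list_peer_hostnames() - set(managed_hostnames)

if __name__ == "__main__":
    # In practice the managed set would come from Tendrl's central store.
    managed = {"node1.example.com", "node2.example.com"}
    for host in sorted(find_unmanaged_peers(managed)):
        print("detected peer without tendrl-node-agent:", host)
```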

Problems:

  • In case the user does steps 1 and 2 but does not run tendrl-ansible, this leads to an awkward situation: gluster get-state and the existing cluster (gluster-integration) know about the new nodes and their volumes (but not their bricks, since bricks are node-local data), yet Tendrl can't do much because tendrl-ansible has not been run on the new nodes and there are no tendrl-node-agents on them yet.

  • The user can choose to keep the new nodes peer-probed but unmanaged by Tendrl, i.e. not make them part of the Tendrl-managed cluster. Do we want to provide such freedom? If yes, who/what triggers the cluster expansion then?

Solution:

  • After steps 1, 2 and 3, if we agree to let users decide whether they want auto-expansion, then on the UI host list we need to provide an expand/import button for each such unmanaged new node listed there.

  • If we want to provide the auto-expand feature, then our documentation must clearly point out that tendrl-ansible must be run on a node as soon as the node is peer-probed.

@Tendrl/admins @shtripat @julienlim @Tendrl/qe Thoughts?

mbukatov added a commit to mbukatov/tendrl-ansible that referenced this issue Mar 6, 2018
Expand cluster section added based on suggestion from fbalak.

The procedure described in this commit is based on ongoing specification
for cluster expansion:

Tendrl/specifications#254

That said, the expand procedure is not fully finished at the time of
writing this commit, see also:

Tendrl/commons#849

shtripat commented Mar 6, 2018

As discussed in the arch call today [1], I am in favor of an option in the UI to show the newly available peers and ask for acknowledgment from the user to accept them and expand the cluster. @julienlim please add details for UX as and when ready with recommendations.

[1] https://github.com/Tendrl/documentation/wiki/Architecture-Meeting-Notes#6-march-2018


julienlim commented Mar 6, 2018

@r0h4n @shtripat @mbukatov @nthomas-redhat @gnehapk @Tendrl/qe @mcarrano @jjkabrown1

I've summarized below my thoughts on the Expand Cluster end-to-end workflow, including the impact to the UI (we'll be making UX design updates accordingly):

  1. User installs glusterfs on the fresh node and does a peer probe to add the node to the existing gluster cluster (and anything else related to expansion from gluster perspective).
  2. Tendrl generates an alert informing the user of the new node(s) detected and directing them to use tendrl-ansible to install the tendrl agents on the node(s).
  3. Using tendrl-ansible, the tendrl-node-agent gets installed on the storage node. Once Tendrl detects that tendrl-node-agents are on the new nodes, the alert created in step 2 should be cleared.
  4. In Tendrl UI, whenever there are unmanaged hosts that have tendrl-node-agent running on them, a single button will be shown to import the new nodes to the existing cluster.
    • The Button is a global action that would be performed on 1 or more unmanaged nodes within a cluster under Tendrl monitoring.
    • The Button can be triggered from the Cluster List and the Host List for the given cluster. Once the button is clicked, the user is shown the “new” nodes that will be imported, and the related task appears in the Task List, where the user can see details while the task is in progress.
    • During the import/expansion, the cluster should show an intermediate or pending state. Disable Profiling and Unmanage actions should not be permitted during this time.
    • The task should generate an event in the UI when it completes, indicating whether it completed or failed.
  5. If the user has not clicked the aforementioned button, or while an import is underway, the following views are impacted (see the sketch after this list):
    • Cluster List — cluster status shows pending, with a message “Expansion in progress. View Details”
    • Hosts List — needs a column to indicate whether the host is managed; the Bricks column will not have an indicator for the unmanaged host (should be N/A or pending updates)
    • Volumes List — the Bricks column will not be complete until import has completed and probably needs to reflect that it’s N/A or pending updates
    • Volume > Brick Details — this information will be either incomplete or non-existent, and we may need to show a message (inline notification) to indicate data may be out-of-sync or pending updates
    • Any data field that is unavailable should say N/A
  6. Once expansion completes successfully, all actions that were disabled need to be restored, all list views and the bricks view should operate normally (i.e. as after a successful import), and statuses should be updated accordingly. An event should indicate that the cluster expansion was successful (can be part of the task).
  7. If expansion fails, an alert needs to be triggered and shown so the user can take action. The user needs to be able to view details and have the ability to Unmanage (the Unmanage button needs to be enabled again). This alert clears when the cluster is successfully imported.
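
To make the gating above concrete, here is a minimal sketch, with hypothetical names, of which global actions stay enabled and how list cells are annotated while an expansion is pending or in progress:

```python
# Status values under which import/expansion gating applies (steps 4-5);
# the exact backend values are discussed further down in this thread.
EXPANSION_STATES = {"expand_pending", "expanding"}

def allowed_actions(cluster_state):
    """Disable Profiling and Unmanage during import/expansion (step 4)."""
    actions = {"view_details": True, "disable_profiling": True, "unmanage": True}
    if cluster_state in EXPANSION_STATES:
        actions["disable_profiling"] = False
        actions["unmanage"] = False
    return actions

def brick_cell(host_is_managed, value):
    """Hosts/Volumes list cells show N/A for unmanaged hosts (step 5)."""
    return str(value) if host_is_managed else "N/A (pending updates)"

assert allowed_actions("expanding")["unmanage"] is False
assert brick_cell(False, 12) == "N/A (pending updates)"
assert brick_cell(True, 12) == "12"
```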

Related issues:

Thoughts?


gnehapk commented Mar 7, 2018

@mbukatov @julienlim

  • Is the view-cluster-detail action (clicking the cluster name to open the cluster detail pages) also allowed for a cluster while it is in the expanding state, given that the data for the expanding cluster will have inconsistent values? As per the mockup, the cluster name is hyperlinked.
  • Where will the task detail of expand cluster be shown: the global task list view or the cluster-specific task list view?


gnehapk commented Mar 7, 2018

@shtripat @r0h4n @nthomas-redhat

  • The Import status and button are shown when the is_managed property is set to no in the API response.
  • The Dashboard button is shown when the is_managed property is set to yes in the API response.
  • The UI reads a cluster's status from the globaldetails.status property, which can be healthy or unhealthy.

Please let me know if you need any other info.
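
A minimal sketch of these UI-side checks, assuming a parsed JSON cluster object from the Tendrl API; the property names follow the comment above, while the helper functions are hypothetical:

```python
def show_import_button(cluster):
    # Import status/button only for clusters not yet managed by Tendrl.
    return cluster.get("is_managed") == "no"

def show_dashboard_button(cluster):
    # Dashboard button only for clusters already managed by Tendrl.
    return cluster.get("is_managed") == "yes"

def cluster_health(cluster):
    # globaldetails.status is either "healthy" or "unhealthy".
    return cluster.get("globaldetails", {}).get("status", "unknown")

cluster = {"is_managed": "no", "globaldetails": {"status": "healthy"}}
assert show_import_button(cluster) and not show_dashboard_button(cluster)
assert cluster_health(cluster) == "healthy"
```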


shtripat commented Mar 7, 2018

From the backend perspective, the cluster would contain a field state, which would be set to expand_pending if there are new nodes in the cluster that are not yet managed. So in the UI, if cluster.state == "expand_pending", enable the Expand Cluster action. Once expansion is done, state would be set to an empty string and cluster.current_job would contain the completed expand-cluster job.
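
A minimal sketch of that field contract (status values as stated above; the helper names and the job status string are hypothetical):

```python
def enable_expand_action(cluster):
    # Backend sets state to "expand_pending" while unmanaged peers exist.
    return cluster.get("state") == "expand_pending"

def expansion_finished(cluster):
    # After expansion, state is "" and current_job holds the completed job;
    # the "finished" status value here is an assumption for illustration.
    job = cluster.get("current_job", {})
    return cluster.get("state") == "" and job.get("status") == "finished"
```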

@julienlim regarding alerts and their clearing I am in agreement.


mcarrano commented Mar 7, 2018

New mockups have been added on InVision to reflect the UI approach outlined by @julienlim above. See https://redhat.invisionapp.com/share/8QCOEVEY9

This includes updates to the Clusters view to allow triggering the expansion, and changes to the Hosts view to include unmanaged hosts in the list as well as a global action to trigger cluster expansion from that view. We also added an alert to the Volume (Brick) details view to warn the user that brick details may be incomplete while expansion is in progress.

Note that triggering the expansion should display a modal to confirm before initiating the task. I don't have that in these wireframes but can follow up with details. Let me know if you have any questions.


gnehapk commented Mar 8, 2018

@r0h4n @julienlim @nthomas-redhat @shtripat Will read-only users be allowed to click the Expand action?


r0h4n commented Mar 8, 2018

@gnehapk read-only users should not be allowed to click the Expand action (a minimal sketch of such gating follows).
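
A sketch of that gating, assuming the UI knows the logged-in user's role; the role names are illustrative:

```python
def can_trigger_expand(user_role, cluster_state):
    # Only non-read-only users may expand, and only when expansion is pending.
    return user_role != "read_only" and cluster_state == "expand_pending"

assert not can_trigger_expand("read_only", "expand_pending")
assert can_trigger_expand("admin", "expand_pending")
```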


gnehapk commented Mar 8, 2018

@julienlim @mcarrano If the host is in the importing state [1], should the user be allowed to see the host detail views? IMO we shouldn't allow this, as the data will be inconsistent and some of the entities might not be available at that moment.

  1. https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/283620800


shtripat commented Mar 8, 2018

To summarize, from the dev (backend + UI) perspective the flow could look as below:

  1. User imports an existing cluster in Tendrl.
  2. On the new nodes, glusterfs gets installed and these nodes are peer-probed into the existing gluster cluster. At this point Tendrl understands that there is a difference in the detected cluster (because of new peers in the list) and raises an alert saying so, asking the user to run tendrl-ansible for the new nodes. At this stage Cluster.status remains empty, as expansion is not actually pending yet because tendrl-ansible still needs to be run.
  3. Once the user runs tendrl-ansible for the new nodes, the alert created in step 2 gets cleared and the Cluster.status value is set to expand_pending. Based on this flag in the cluster object, the UI should enable the Expand Cluster button.
  4. If the user selects the Expand Cluster option, the value of Cluster.status changes to expanding, and Cluster.current_job holds the job detail for ExpandClusterWithDetectedPeers.
  5. If the job succeeds, Cluster.status goes back to empty and Cluster.current_job reflects the completed status of the job.
  6. If the job in step 4 fails, Tendrl should raise an alert saying the expand job for the new nodes failed. This should suggest that the user un-manage and then import the cluster afresh (at the moment this would be the way). This process of unmanage + import would clear the alert reporting the failure of the expansion job.

Hope this makes it clearer for the UI what would be available from the backend and how the different cluster status values should be handled when showing buttons for cluster expansion. A small state-machine sketch of this flow follows.
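
A minimal sketch of the status flow above as a tiny state machine; the status values follow the comment, while the event names and the post-failure status are illustrative assumptions:

```python
# (status, event) -> next status; "" means no expansion pending/in progress.
TRANSITIONS = {
    ("", "tendrl_ansible_run_on_new_peers"): "expand_pending",  # step 3
    ("expand_pending", "user_confirms_expand"): "expanding",    # step 4
    ("expanding", "job_succeeds"): "",                          # step 5
    # Assumption: step 6 doesn't state the status after a failed job; the
    # suggested recovery is unmanage + fresh import, which clears the alert.
    ("expanding", "job_fails"): "expand_pending",
}

def next_status(status, event):
    return TRANSITIONS.get((status, event), status)

s = next_status("", "tendrl_ansible_run_on_new_peers")  # -> "expand_pending"
s = next_status(s, "user_confirms_expand")              # -> "expanding"
assert next_status(s, "job_succeeds") == ""
```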

@gnehapk ^^^


mcarrano commented Mar 8, 2018

@gnehapk I've made two updates to the wireframe deck. I've added a confirmation dialog to be displayed after the user clicks Expand and before the task is initiated. You can see that here: https://redhat.invisionapp.com/share/8QCOEVEY9#/283795644_Clusters-Confirm_Expansion. The same dialog should appear whether Expand is triggered from the Clusters or Hosts view.

I also saw your question above about Host details for unmanaged hosts. You are right, we should not be able to see details for these hosts until import is complete. I have updated the wireframes to remove details links from these entries.

Let me know if you have any questions.


r0h4n commented Mar 16, 2018

The new mockups and the backend are implemented on the master branches.

Please comment if there is any mismatch.

@r0h4n r0h4n closed this as completed Mar 16, 2018
@r0h4n r0h4n added this to the Milestone 4 (2018) milestone Mar 22, 2018