Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added specification for expansion of cluster in tendrl post detection #254

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

shtripat
Copy link
Member

@shtripat shtripat commented Jan 22, 2018

Signed-off-by: Shubhendu shtripat@redhat.com
tendrl-bug-id: #253

@shtripat
Copy link
Member Author

@r0h4n @nthomas-redhat @julienlim @mbukatov plz check and ack.
@mbukatov I have left place holders for adding tendrl-ansible details. Olz check and comment accordingly so that same can be added.

@shtripat shtripat changed the title Added specification of detection and expansion if cluster in tendrl Added specification of detection and expansion of cluster in tendrl Jan 22, 2018
@julienlim
Copy link
Member

julienlim commented Jan 22, 2018

@r0h4n
Copy link
Contributor

r0h4n commented Jan 29, 2018

Please separate the detection and expand spec

@julienlim
Copy link
Member

julienlim commented Jan 29, 2018

@shtripat @mbukatov @r0h4n @nthomas-redhat

In reviewing 50fca31, here are some things to consider:

(1) what happens if tendrl-ansible fails to get the tendrl agent(s) -- i.e. node, collectd, gluster-integration -- properly installed and configured? Can we add some mention about how this is tackled, even if it's a small mention that this is all handled at the command line. [Basically a spec should account for error-handling to some degree.]

(2) can one assume all this gets logged somewhere in some tendrl log file(s) -- whether successful or failed so one can troubleshoot?

(3) The "Testing" section in the document I assume refers to the "Acceptance Criteria" should include some mention that the cluster's host list in the Tendrl UI as well as Grafana will reflect the additional nodes/hosts as well as related dashboard(s).

(4) Can we also mention if that this can be detection for expansion of 1 or more nodes and that is not limited to only 1 at a time, as the testing (or Acceptance Criteria) will need to allow for doing n number of nodes that get added to the Gluster trusted storage pool, i.e. work for 1 as well as work for multiple nodes.

(5) Acceptance Criteria should also mention if the Inventory file is the one updated or not (as I couldn't tell if we were going with the Inventory file or with something in-memory, the alternate option), but that could be because the implementation is not yet determined.

There were some minor typos but that can be easily fixed when the spec gets updated.

@shtripat
Copy link
Member Author

@r0h4n regarding Please separate the detection and expand spec, I would start a separate specification for detection of new nodes in the gluster cluster.

@julienlim

(1) what happens if tendrl-ansible fails to get the tendrl agent(s) -- i.e. node, collectd, gluster
integration -- properly installed and configured? Can we add some mention about how this is
tackled, even if it's a small mention that this is all handled at the command line. [Basically a spec
should account for error-handling to some degree.]

If tendrl-node-agent not found, ideally tendrl-ansible being a command line tool should report with detailed error. I would suggest a separate specification under tendrl-ansible for this. @r0h4n ??
The installation of collectd is as dependency with tendrl-node-agent. The installation of tendrl-gluster-integration is part of import cluster flow. So when import cluster would get executed for the new node in cluster, it would be installed. If there are errors around installation of tendrl-gluster-integration, that would be logged as part of flow. I can add some details like this in specification.

(2) can one assume all this gets logged somewhere in some tendrl log file(s) -- whether
successful or failed so one can troubleshoot?

Anything related to tendrl-ansible should ideally with taken care as part of separate specification under tendrl-anisble. The details while import cluster flow would be updated in this specification.

(3) The "Testing" section in the document I assume refers to the "Acceptance Criteria" should
include some mention that the cluster's host list in the Tendrl UI as well as Grafana will reflect the >> additional nodes/hosts as well as related dashboard(s).

Ack. I would add this acceptance criteria

(4) Can we also mention if that this can be detection for expansion of 1 or more nodes and that is >> not limited to only 1 at a time, as the testing (or Acceptance Criteria) will need to allow for doing n >> number of nodes that get added to the Gluster trusted storage pool, i.e. work for 1 as well as work >> for multiple nodes.

Ack. I will add this acceptance criteria.

(5) Acceptance Criteria should also mention if the Inventory file is the one updated or not (as I
couldn't tell if we were going with the Inventory file or with something in-memory, the alternate
option), but that could be because the implementation is not yet determined.

Ack. At the moment we would go ahead with inventory file and manual execution of tendrl-ansible from command line. So its like admin first modifies the inventory file for tendrl-ansible with entries for additional nodes and then runs it. This installs tendrl-node-agent on the new nodes and configures them to report to the existing tendrl server. Once this step gets over, automatic detection of nodes in tendrl server and then running import cluster flow for these additional nodes would be taken care by tendrl-server's node-agent service. I will add an acceptance criteria around inventory file of tendrl-ansible.

@julienlim
Copy link
Member

@shtripat Ack your comments. I'll assume you'll make the appropriate updates to the spec.

With regards to the following statement:
"...So its like admin first modifies the inventory file for tendrl-ansible with entries for additional nodes and then runs it...."

I know if user created his/her own inventory file, probably user has to update it. However, if user is using the default /etc/ansible/hosts, should we consider adding the new entries directly there and save user the effort. This is probably a nice-to-have since we don't know yet how often users are creating his/her own inventory file or using the default Ansible hosts file.

@shtripat
Copy link
Member Author

@julienlim agree. At this moment we expect user to manually change the inventory file and run tendrl-ansible.

@shtripat
Copy link
Member Author

shtripat commented Feb 5, 2018

@julienlim @r0h4n added another issue #257 for specifying the detection part of new nodes in gluster cluster. @r0h4n as discussed please take care of this part of specification under a separate spec.

@shtripat
Copy link
Member Author

shtripat commented Feb 5, 2018

Updated the PR with more expected behavior details.

@r0h4n r0h4n added this to the Milestone 3 (2018) milestone Feb 17, 2018
@shtripat shtripat requested a review from a team as a code owner February 20, 2018 14:36
@shtripat shtripat changed the title Added specification of detection and expansion of cluster in tendrl Added specification for expansion of cluster in tendrl post detection Feb 20, 2018
Shubhendu added 2 commits February 20, 2018 20:26
tendrl-bug-id: Tendrl#253
Signed-off-by: Shubhendu <shtripat@redhat.com>
tendrl-bug-id: Tendrl#253
Signed-off-by: Shubhendu <shtripat@redhat.com>
@a2batic
Copy link
Member

a2batic commented Feb 28, 2018

As per suggestions from @shtripat @r0h4n, the expansion of cluster will take place when cluster is in managed state and can be indicated as follows:

  • When expansion is in progress, "Expanding Cluster" can be shown as message and other operations like import,Dashboard, unmanage can be blocked
  • When expansion fails for some reason, it can be shown similar to import or unmanage fail (View Details link).

@julienlim @mcarrano @nthomas-redhat , what are you thoughts? Please update the design for the same.

@julienlim
Copy link
Member

julienlim commented Feb 28, 2018

@a2batic @shtripat @nthomas-redhat @r0h4n @Tendrl/qe

  • If expansion fails, it needs to be logged and alert shown (and cleared when no longer an issue).
  • During expansion, how does user know about it or why certain operations are blocked?
  • Specifically, which operations are blocked, e.g. Import, Unmanage, Enable/Disable Volume Profiling?
  • Should Dashboard should be blocked? Does expansion impact existing collectd operations?
  • We need to indicate something's going on with the cluster (preventing aforementioned operations), and not allow for those actions to occur.
  • During expansion, is there some kind of Task (along with Task Details) that is created that is accessible to the user?
  • During cluster expansion, does cluster status become stale/unknown/other or is it still being refreshed?

Thoughts?

@a2batic - We'll look into making design updates.

@shtripat
Copy link
Member Author

shtripat commented Mar 1, 2018

@julienlim @a2batic @gnehapk During expansion for the cluster, we should show the message as Expanding with a View details link to show the job details.

Also the kebab menu options like Enable/Disable profiling and Unmanage should be disabled for the cluster.

In case there is a failure in expansion, there would be event logged from backend. Also the View Details link should remain and link to task failed details. In case of success the job link vanishes and message says ready to use as usual for the cluster.

Thoughts??

@shtripat
Copy link
Member Author

shtripat commented Mar 1, 2018

Also while expansion of cluster going on the clutsrer-id should not be clickable and preferably a different icon for cluster status required. @julienlim ?

@ltrilety
Copy link

ltrilety commented Mar 1, 2018

If I read the specification correctly tendrl notice a new node automatically and via some notification requires to run tendrl-ansible, right?
Has anyone thought about the opposite scenario, when a node is removed from gluster?

@shtripat
Copy link
Member Author

shtripat commented Mar 1, 2018

Lubos, so this spec talks about only expansion scenario and shrink part is not in the scope of this.

@mcarrano
Copy link

mcarrano commented Mar 1, 2018

@a2batic @julienlim @gnehapk @shtripat We've added a new wireframe to reflect what this should look like in the UI while the expansion is taking place. This approach can apply to expansion or any similar operation. See https://redhat.invisionapp.com/share/8QCOEVEY9#/282443505_Clusters-Update_In_Progress

In summary:
*The status icon should change to 'pficon-in-progress'

  • The message field should display "Expansion in progress" with a link to view task details.
  • Further actions (other than launching the dashboard) are disabled.

Let me know if you have any questions.


== References:

* <TBD> link to tendrl-ansible issue to track the changes for this support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add a short description of this use case into README file during work on Tendrl/tendrl-ansible#82

@gnehapk
Copy link
Member

gnehapk commented Mar 6, 2018

@mbukatov @julienlim Is the view detail action(click on cluster name to view cluster detail) also allowed for expanding cluster? As per the mockup, the cluster name is hyperlinked.

mbukatov added a commit to mbukatov/tendrl-ansible that referenced this pull request Mar 6, 2018
Expand cluster section added based on suggestion from fbalak.

The procedure described in this commit is based on ongoing specification
for cluster expansion:

Tendrl/specifications#254

That said, the expand procedure is not fully finished at the time of
writing this commit, see also:

Tendrl/commons#849
@julienlim
Copy link
Member

julienlim commented Mar 7, 2018

@gnehapk @a2batic @shtripat @nthomas-redhat @r0h4n @Tendrl/qe @mcarrano

Design updates have been posted at https://redhat.invisionapp.com/share/8QCOEVEY9. See Tendrl/commons#849 (comment) for further details.

@gnehapk During expansion, details are still viewable. For views where there may be out-of-synch data, we show an inline notification -- see https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/283595310 example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants