Launcher should not allow users to continue if environment(s) have reached resource quotas #3346
Comments
@mceledonia to clarify the intent here, should the Launcher block creation if either Stage or Run environments have reached resource quota? |
@andrewazores I was always under the assumption that this would be for both Stage and Run being full, but I think for this stage of implementation and for the sake of simplicity we should block for either. If in the future we do have the granular control to allow deployment into one stage or the other in this wizard I think we can change that but for now this seems to be the way to go. |
Please see my comments here: #3344 (comment) I do not feel this should be a p0. |
+1 to @qodfathr's assessment that this is a SEV2, but not a P0. Removing label. |
@invincibleJai @arunkumars08 I guess this issue can be solved from client side. We just need to add a check for the available resources and used resources. We can use this API end point to get the metadata |
@sunilk747 agreed, it should be handled at the client end. It needs some development effort, and the flow still needs to be decided, i.e. when to show the quota screen https://redhat.invisionapp.com/share/CTGJ6F7V4K9#/screens . To me it's more of a task than an issue, and we need to have it in the sprint plan. |
@sunilk747 you can use this service: https://github.com/fabric8-ui/fabric8-ui/blob/3683fb70017dd16a96951d259a08cf30698e09ea/src/app/space/create/deployments/services/deployment-api.service.ts This provides access to the backend WIT endpoints developed for the Deployments page. This way you don't need to reimplement proxy handling etc. since it is done by the backend. |
In particular you can get CPU and Memory usage vs quota by calling
|
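A minimal sketch of the client-side check discussed above, assuming the backend returns used and quota values for CPU and memory per environment. The `ResourceUsage` and `EnvironmentUsage` shapes and both function names are hypothetical and are not taken from `deployment-api.service.ts`. It blocks when either Stage or Run is full, as agreed earlier in the thread.

```typescript
// Hypothetical shapes: the real payload from the Deployments backend may differ.
interface ResourceUsage {
  used: number;
  quota: number;
}

interface EnvironmentUsage {
  name: string; // e.g. "stage" or "run"
  cpucores: ResourceUsage;
  memory: ResourceUsage;
}

// An environment counts as full when any tracked resource has reached its quota.
function isAtQuota(env: EnvironmentUsage): boolean {
  return [env.cpucores, env.memory].some(r => r.used >= r.quota);
}

// Block the Launcher when either environment is full.
function shouldBlockLauncher(envs: EnvironmentUsage[]): boolean {
  return envs.some(isAtQuota);
}
```

Only CPU and memory are covered here; other quota dimensions (disk, routes, services) would need their own data source.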
@catrobson @bdellasc
Please correct me if I am wrong. |
@jyasveer I believe this would work. For this use case, instead of seeing the actual "Create Space" overlay, showing this makes sense to me. |
I want to make sure we all understand and agree to the implications of this. In the free tier, very quickly, all users will essentially be at quota. So we agree that a good experience is that a very disruptive overlay is presented to them every single time they try to create a new app, even though there is nothing wrong with creating a new app -- the user will not be stuck in any way if they create a new app. The only thing that will "go wrong" is that the deployment may fail (but may not -- this is highly situationally dependent). They will be able to edit the app in Che, create work items, etc. Moreover, it is entirely possible that if the user is not at quota, but is close to quota, the app deployment will fail anyway, and we are not warning them in this case. Lastly, there are several ways a deployment may fail -- CPU, mem, disk, routes, services, etc. Are we committing to (1) ensuring we are checking all of those, (2) keeping pace with new artefacts with limits in the future (e.g. stateful sets), and (3) keeping pace with changes to those limits (e.g. combining Run and Stage into one pool of resources rather than two independent pools)? I remain greatly concerned that we are trying to solve the wrong problem, and doing so in the wrong way. If we pursue this change as described, then I would like to see those in support of the change actually use the product, with that change's feature flag turned on, to do real work with the service for a few weeks and see if they still feel the same way about the proposal. We have to balance what is logical/analytical against the emotional response to the feel of the service. I believe this proposal is far too analytical and does not sufficiently consider the desired emotional experience. |
I agree @qodfathr - I think the blocking of creation is being put into place because we do not have the right information, actions, and capabilities built into the system today to gracefully handle how the system acts, and what the user can do about it, in the case that they go over quota. This is something that I think will be really important for us to focus on improving as soon as possible. I'm not convinced that as of today, not blocking users will result in a better experience than blocking them, only because recovery from an over quota experience seems very difficult to handle - the user can get into too much trouble without clear recovery paths. I'd like to talk about this as a team as soon as possible so we can consider what the least amount of changes required would be in order for the information, user recovery, and resource management to be good enough that we no longer need to block users in an attempt to protect. |
Is this a case where if the deployment fails we could warn them and suggest they try again later? If this was paired with some kind of alert on the system delivery side could we then clear the issue for them from the back? |
@catrobson can you please help me know if UX team is still working on the UXD for this ? |
My impression: reading last comments from @catrobson and @qodfathr , it sounds like we (UXD, Development and PM) need to look at the information, user recovery and resource management topics. Based on that conversation, we would try to figure out how to add or improve those so we don’t have to “block” them when maxed out on current resource usage. “Blocking” them feels like an interim solution, when we really need to improve those other supporting pieces. Since Summit was this week, I don’t think this is something UXD has taken up yet, as a number of key stakeholders have been focused on Summit. If this is the path we agree needs to be taken, we’ll need to lay some groundwork to create stories/issues in order to begin fielding the wider topics mentioned above. We might need to stick with a simple “block” solution (+ some basic mechanisms to get unblocked) for now because the issues above might take some time to work through. If we run with the interim “block” solution now, keep in mind that we need to address those other areas as a more permanent solution. I’m not sure if @catrobson and @qodfathr would be alright with this approach and I don't want to speak for them. So, bottom line is - we do intend on fielding the wider issues with the rest of the team, but haven’t yet because of Summit this week. |
I think that as a general rule, if we block a user from performing a task, we have to also provide them with an easy path to get un-blocked. Blocking a user from performing a task without providing a path forward/around the block is a dead-end situation. Thx! |
@ldimaggi I agree with you. |
If it is critical to have something implemented ASAP, it might make sense to arrive at a very basic solution where we block the user, but give them that path to get unblocked, so we don't end up with the dead end that @ldimaggi mentioned. It may not be an ideal flow, but would serve the purpose for now. Then, we can take a wider look at a better, more permanent solution for the longer term? I think these are our options going forward:
I suspect the second option is the least painful for our users...something that is potentially clunky, but doesn't dead-end them right now. |
My understanding of this is that it was a temporary, first-step of a multi step implementation to solve the problem. I'm assuming (let me know if I'm wrong) that the dead end is remedied with a direct link to a page that allows the user to scale down, delete, or both... However there are a couple issues with this which need to be solved which is why we are implementing only the first increment (blocking the user and using language to direct them to a solution)
So we need to figure out what the best next step is for the user. Because we don't have the ability to upgrade the account we need to offer a solution that exists within the current resource limitations.
Those are only a few ideas off the top of my head, it's Friday and almost lunch time, but all that to say that I understood this solution as an incremental first step towards a better solution, with the assumption that scaling up/down or deleting was the only path forward for the user. It sounds like that assumption is not correct, as we can defer the blocking of the user to later, when a deployment fails (if it does). It sounds like there needs to be more discussion around those two options and the possible solutions that branch from them. I agree with @qodfathr that we need to consider the balance between emotional and analytical responses, and with @catrobson that this needs more discussion with the larger team. |
I would say it would be good if we give users
That way, openshiftio.io is not blocking its users, and it also warns users that their deployment may fail and displays the action points to counter that. |
Being out of quota (which has something like 5 dimensions -- disk, CPU, RAM, services, routes, others?) should be something known to users at all times, in a non-intrusive way. Perhaps a warning icon in the topnav, which, when clicked, gives details plus links to how to remediate. But even as a "short term" solution, I wouldn't want us to proceed until all quota dimensions are known and documented, and we've got the APIs to properly articulate them. Moreover, we cannot hard-code the quota limits into the front-end logic -- e.g., don't query to find out there are 5 routes in use and assume that the user cannot create a 6th. The API must make this determination in conjunction with the backend. Assume the answer is potentially different for every single user. (i.e. a short-term solution is not an excuse to add to technical debt.) Being out of quota may impact users in many ways, but completing the Launcher is not one of those ways, so I still argue that solving it in the Launcher is the wrong approach. Rather, if the user knows that they are out of quota before they even start the Launcher, I feel this is a better overall experience. An unobtrusive, non-blocking topnav warning is one way to achieve that goal. (Read: the user sees the warning icon and elects to do something about it before clicking The primary impact of quota maximization is that app deployment fails. So, as a longer-term goal, IMHO, this is where we should look to improve the experience. e.g., rather than failing the pipeline, put it in a warning state requiring input (much like the Promotion step we have today). The two options could be "try again" or "cancel." The idea being that if the deployment fails due to some quota limitation (or really for any reason), the user is given the option to go correct the cause of the deployment failure and then try the deployment again (without having to rerun the whole pipeline). Taken further, imagine if the user could redirect the deployment elsewhere -- e.g. "oh, yeah, my free Stage env is full or too small to ever run that app, but let's deploy it to OSO Pro instead." A quick edit to the pipeline (perhaps via a GUI) and a click of "Try Again" and the user is back on track. Related to all of this, pipelines that are in any "input required" state probably should also have some sort of non-blocking notification to the user (e.g. an icon in the topnav). Clicking would give details and a link to the awaiting pipeline. This helps to address the primary impact of failed deployments (assuming that the pipeline does not fail but rather goes into the remediation state as described above). Moreover, this would help address a concern whereby a pipeline is not starting in Space |
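As a rough illustration of the non-blocking topnav warning described above, the sketch below assumes the backend reports, per user, whether each quota dimension is exhausted, so the front end never compares counts against hard-coded limits. `QuotaStatus`, `QuotaWarning`, and `buildTopnavWarning` are hypothetical names, not an existing API.

```typescript
// Hypothetical: the backend decides per user whether each dimension is exhausted.
type QuotaDimension = "cpu" | "memory" | "disk" | "routes" | "services";

interface QuotaStatus {
  dimension: QuotaDimension;
  exhausted: boolean; // determined by the backend, never computed in the client
  remediationUrl?: string; // optional link to where the user can free resources
}

interface QuotaWarning {
  show: boolean;
  details: string[];
}

// Turn backend-reported statuses into data for a non-blocking topnav icon.
function buildTopnavWarning(statuses: QuotaStatus[]): QuotaWarning {
  const exhausted = statuses.filter(s => s.exhausted);
  return {
    show: exhausted.length > 0,
    details: exhausted.map(s =>
      s.remediationUrl
        ? `${s.dimension}: at quota, see ${s.remediationUrl}`
        : `${s.dimension}: at quota`
    )
  };
}
```

Because the client only renders what the backend reports, limits can differ per user or change over time without a front-end change.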
+1 to Todd's description of handling a deployment that would fail due to a lack of resources. Instead of blocking a user, I think it's better to inform the user, and provide the user with a path forward. Also +1 to Todd's recommendation of our having a clear definition of all the dimensions of a user being out of quota (we've been focusing on a subset of these in our discussions) AND to our providing easy to understand quota information to our users. |
@pradeepto Please see Todd's comments above. As part of a long-term approach, instead of failing the pipeline when a resource quota limit is exceeded in OpenShift, can we give users a warning message at the pipeline build level, with the option to go and correct the cause of the deployment failure? Can you analyse the feasibility of this with the build team as a future process improvement? |
@krishnapaparaju As discussed, we need an endpoint in the Jenkins proxy layer that calls the OpenShift API to fetch the quota usage details and shows the user two options, "try again" or "cancel", in the pipeline screen before it starts the deployment in OpenShift. |
@andrewazores @animuk @jyasveer Is there anything |
@piyush1594 As discussed, if you already have that endpoint in the Jenkins proxy layer, then use it to prompt the user with "try again" and/or "cancel" in the pipeline screen before it starts the deployment in OpenShift. |
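The "try again" / "cancel" prompt can be pictured as a small state machine. This is only a sketch: the state names and the `quotaAvailable` flag are assumptions, with the flag standing in for whatever the Jenkins proxy endpoint would report after querying OpenShift.

```typescript
type PipelineState = "deploying" | "awaiting_input" | "cancelled";
type UserChoice = "try_again" | "cancel";

// Before deploying, pause for input instead of failing when quota is insufficient.
// quotaAvailable is assumed to come from a backend quota check.
function beforeDeploy(quotaAvailable: boolean): PipelineState {
  return quotaAvailable ? "deploying" : "awaiting_input";
}

// Handle the user's answer; only pipelines that are awaiting input react to it.
function onUserChoice(
  state: PipelineState,
  choice: UserChoice,
  quotaAvailable: boolean
): PipelineState {
  if (state !== "awaiting_input") {
    return state;
  }
  if (choice === "cancel") {
    return "cancelled";
  }
  // "try again" re-runs the quota check rather than the whole pipeline.
  return beforeDeploy(quotaAvailable);
}
```

With this shape, a user who frees resources and clicks "try again" moves straight to deployment, while one who has not freed anything stays in the input-required state.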
@GeorgeActon @animuk just wanted to understand how much has been done and what is still needed to make this feature reality. /cc @slemeur |
@bartoszmajsak this has been assigned to build team before start of J train. Can you ask them? |
Parent: #3344
Task
As seen in design link in parent issue, the Launcher should display an overlay and block the user from continuing to create a new application if the user has insufficient available resources for the application to be deployed.