Implement recovery for Kubernetes/OpenShift infrastructures #5919
Comments
Depends on #7785
The current status of this issue is as follows. It does not appear to be possible to restore all running workspaces at Tomcat boot using only the Kubernetes/OpenShift client and inspecting the objects created on the cluster, which is how recovery is implemented in the Docker infrastructure.

Another proposed approach was lazy recovery, where each workspace is recovered only when a client requests it. In that case a request for the workspace list (GET /api/workspace) would trigger several requests to the K8s/OS cluster and increase response time, so it was decided not to implement it.

Therefore the meta-information of running workspaces needs to be persisted somewhere (for example, in a database) so that they can be recovered later.

The scope of this issue was also extended: the Kubernetes/OpenShift infrastructure must be made ready for Rolling Update (the issue description is updated). In this case, recovery should be implemented in the following way:
Since two Che Server instances may be running at the same time, reworking the infrastructure alone is not enough, because the Workspace API has its own local cache. The Workspace API should therefore be reworked to use either a local cache or a persistent/distributed one, depending on configuration. More details about the Workspace API and Kubernetes/OpenShift changes will be described soon.
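To illustrate the local vs. distributed choice described above, here is a minimal sketch, not the actual Che code. All names (`WorkspaceStatusCache`, `MapBackedStatusCache`, `StatusCaches`, the `"local"`/`"distributed"` mode strings) are hypothetical, and a plain shared `Map` stands in for a real distributed or persistent store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status enum, used only for this illustration.
enum WorkspaceStatus { STARTING, RUNNING, STOPPING, STOPPED }

// Hypothetical abstraction over the status cache used by the Workspace API.
interface WorkspaceStatusCache {
  void put(String workspaceId, WorkspaceStatus status);
  Optional<WorkspaceStatus> get(String workspaceId);
  void remove(String workspaceId);
}

// Single implementation; what makes it "local" or "shared" is the map it wraps.
final class MapBackedStatusCache implements WorkspaceStatusCache {
  private final Map<String, WorkspaceStatus> statuses;

  MapBackedStatusCache(Map<String, WorkspaceStatus> statuses) {
    this.statuses = statuses;
  }

  @Override
  public void put(String workspaceId, WorkspaceStatus status) {
    statuses.put(workspaceId, status);
  }

  @Override
  public Optional<WorkspaceStatus> get(String workspaceId) {
    return Optional.ofNullable(statuses.get(workspaceId));
  }

  @Override
  public void remove(String workspaceId) {
    statuses.remove(workspaceId);
  }
}

final class StatusCaches {
  private StatusCaches() {}

  // Assumption: the mode comes from server configuration. sharedStore stands in
  // for a distributed/persistent map that is visible to every Che Server instance.
  static WorkspaceStatusCache create(String mode, Map<String, WorkspaceStatus> sharedStore) {
    switch (mode) {
      case "local":
        // Private to this server instance; invisible to other instances.
        return new MapBackedStatusCache(new ConcurrentHashMap<>());
      case "distributed":
        return new MapBackedStatusCache(sharedStore);
      default:
        throw new IllegalArgumentException("Unknown cache mode: " + mode);
    }
  }
}
```

With the distributed mode, a status written by the old Che Server instance is visible to the updated one; with the local mode it is not, which is the problem described above.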
During a Rolling Update there will, for some period of time, be two instances of Che Server.

Kubernetes/OpenShift infrastructure changes

It is proposed to implement OpenShift recovery in the following way:
In this manner, the OpenShift infrastructures will be synchronized between the old Che Server Pod and the updated one.

One more thing must be covered properly: server readiness probes. It should not cause any issues if two Che Servers both run server checks on RUNNING runtimes. However, only one Che Server should perform the initial server checks on STARTING runtimes. The other Che Server should launch its own server checks only once the runtimes become RUNNING.

Workspace API changes

The Workspace API also needs to be patched a bit. The workspace status cache in WorkspaceRuntimes must be synchronized between Che Server instances. A distributed cache without persistence looks sufficient, because the infrastructure will recover all persisted runtimes after Che Server starts. Also, so that users are not forced to reload the page, JSON RPC subscribers need to be synchronized between instances (and maybe persisted).

During shutdown, Che Server should have enough time to finish all workspace-related operations, such as STARTING or STOPPING workspaces. The feature that stops all workspaces before Che Server stops (Workspace service termination) should be disabled.

Some aspects of Rolling Update and OpenShift runtime recovery may be missing, but I hope this information shows the plan for how OpenShift recovery is going to be implemented.
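The readiness-probe rule above (any server may check RUNNING runtimes, but only one may run the initial checks on a STARTING runtime) could be sketched as an atomic claim on a shared map. This is an illustration under assumptions, not the Che implementation: `StartingProbeClaims`, `RuntimeStatus`, and the server IDs are hypothetical, and a `ConcurrentMap` stands in for whatever shared store both Che Server instances would use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical runtime status enum for this sketch.
enum RuntimeStatus { STARTING, RUNNING, STOPPING, STOPPED }

final class StartingProbeClaims {
  // runtimeId -> serverId that owns the initial server checks.
  // Assumption: this map is shared between Che Server instances.
  private final ConcurrentMap<String, String> owners;
  private final String serverId;

  StartingProbeClaims(ConcurrentMap<String, String> sharedOwners, String serverId) {
    this.owners = sharedOwners;
    this.serverId = serverId;
  }

  // Decides whether this server should run server checks for the runtime.
  boolean shouldProbe(String runtimeId, RuntimeStatus status) {
    switch (status) {
      case RUNNING:
        // Concurrent checks on RUNNING runtimes are considered harmless.
        return true;
      case STARTING:
        // putIfAbsent is atomic: exactly one server wins the claim.
        String existing = owners.putIfAbsent(runtimeId, serverId);
        return existing == null || existing.equals(serverId);
      default:
        return false;
    }
  }

  // Releases the claim, but only if this server holds it.
  void release(String runtimeId) {
    owners.remove(runtimeId, serverId);
  }
}
```

A server that loses the claim simply skips the STARTING runtime and starts its own checks once the status it observes becomes RUNNING.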
Created one more separate task that must be done before the Kubernetes/OpenShift recovery functionality can be used: adapting WorkspaceServiceTermination, #9317.
Kubernetes/OpenShift workspaces are considered stopped when the workspace master is restarted.
Recovery needs to be implemented for Kubernetes/OpenShift workspaces, so that workspaces are still considered running after the master restarts.
Recovery should be adapted to Rolling Update of the workspace master. So, the recovery workflow should look like: