First draft of SPI specification #5035

Closed
skabashnyuk opened this issue May 10, 2017 · 15 comments

Comments

@skabashnyuk
Contributor

The goal of this issue is to provide a document describing our current vision for the SPI changes: the motivation for what we want to change, how the components interconnect, the results of prototyping, and what remains to be done. This is not a final specification; the document is meant to be a base for future issues and conversations.

@skabashnyuk skabashnyuk added kind/task Internal things, technical debt, and to-do tasks to be performed. sprint/next status/open-for-dev An issue has had its specification reviewed and confirmed. Waiting for an engineer to take it. team/platform labels May 10, 2017
@gorkem
Contributor

gorkem commented May 10, 2017

@skabashnyuk we should have @l0rd as a reviewer on this.

@skabashnyuk
Contributor Author

ok

@garagatyi

garagatyi commented May 15, 2017

Motivation:

As stated in this issue, we have several problems with the current workspace infrastructure SPI. It was designed when we had a different vision of the product. That vision then changed several times while we adapted the system to new requirements. We also ran into technical issues that we didn't foresee when we designed it.

Now we struggle to keep using the same SPI without redesigning it, so we decided to revise our requirements and redesign it.

Changes:

The basic idea is to make the SPI less coupled and to clean it up. More responsibilities will shift to the SPI implementation side, so that we neither limit implementations nor try to invent a one-size-fits-all model of infrastructure implementation.

So Master will use a set of SPI implementations and will communicate with them as follows:

  • On workspace environment validation, Master asks the SPI to validate the environment. So Master won’t know which infrastructure and components (docker, k8s, containers, VMs, etc.) are described in the environment recipe.
  • On workspace environment start, Master passes the environment with all needed additional info to the SPI, and the SPI decides how it should perform the environment start.
  • The SPI impl sends events about component statuses to Master, and Master passes them on to interested clients.
  • The SPI impl provides Master with the URL of a channel from which output and other “heavy weight” events can be read. Master provides it to clients, and they may connect to it if they want. It is considered good practice that Master does not receive events from the SPI impl that can influence its load. But it’s up to the impl whether this URL points to Master or to a dedicated component that can handle the load of streaming logs from all environments.
  • Starting agents in an environment is implementation specific. Master will pass the complete configuration of the environment's agents (including install scripts) to the SPI impl. Che will have a default implementation of an Install agent that starts the other agents and notifies interested parties about agent statuses and logs. An SPI impl can use it or replace the agent start process with its own.
  • To stop an environment, Master calls the corresponding SPI impl. So environment stop is completely implementation specific.
  • Project files backup/restore is also implementation specific.
  • The concept of a snapshot should be removed from the API. To use or not use snapshots, particular workspace properties should be set. The SPI impl then uses these properties to start the environment with or without restoring the previous state.
  • We are going to add the ability to read environments from the SPI impl, so that Master can be restarted without stopping environments. An SPI implementation doesn’t have to support this feature, as it is optional behavior.
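
The interaction described above could be sketched roughly as follows. This is a minimal, language-neutral illustration written in Python (Che itself is Java), not the actual SPI. Every name in it (`Infrastructure`, `Environment`, `StatusEvent`, `Master.start_workspace`, the `restoreSnapshot` property, the `logs.example.invalid` URL) is hypothetical and chosen only for illustration. It assumes Master picks an implementation by a recipe type and otherwise treats the recipe as opaque.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; they are not Che's actual API.


@dataclass
class Environment:
    recipe_type: str   # assumed dispatch key, e.g. "docker" or "kubernetes"
    recipe: str        # opaque to Master, only the SPI impl interprets it
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class StatusEvent:
    workspace_id: str
    component: str
    status: str


StatusListener = Callable[[StatusEvent], None]


class Infrastructure(ABC):
    """One SPI implementation (Docker, Kubernetes, VMs, ...)."""

    @abstractmethod
    def validate(self, env: Environment) -> None:
        """Raise ValueError if the recipe is invalid for this infrastructure."""

    @abstractmethod
    def start(self, workspace_id: str, env: Environment,
              on_status: StatusListener) -> str:
        """Start the environment; return the URL of the output channel."""

    @abstractmethod
    def stop(self, workspace_id: str) -> None:
        """Stop the environment in an implementation-specific way."""

    def running_environments(self) -> List[str]:
        """Optional feature: list environments that outlived a Master restart."""
        return []


class Master:
    def __init__(self, infrastructures: Dict[str, Infrastructure]):
        self._infrastructures = infrastructures
        self.status_events: List[StatusEvent] = []  # lightweight events only

    def _infra_for(self, env: Environment) -> Infrastructure:
        try:
            return self._infrastructures[env.recipe_type]
        except KeyError:
            raise ValueError(
                f"no SPI implementation for {env.recipe_type!r}") from None

    def start_workspace(self, workspace_id: str, env: Environment) -> str:
        infra = self._infra_for(env)
        infra.validate(env)  # Master never inspects the recipe itself
        return infra.start(workspace_id, env, self.status_events.append)

    def stop_workspace(self, workspace_id: str, env: Environment) -> None:
        self._infra_for(env).stop(workspace_id)


class FakeInfrastructure(Infrastructure):
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self):
        self.running: Dict[str, bool] = {}  # workspace id -> restored or not

    def validate(self, env):
        if not env.recipe.strip():
            raise ValueError("empty recipe")

    def start(self, workspace_id, env, on_status):
        # A workspace property, not a snapshot API, decides about restoring.
        restore = env.properties.get("restoreSnapshot") == "true"
        self.running[workspace_id] = restore
        on_status(StatusEvent(workspace_id, "environment", "RUNNING"))
        return f"ws://logs.example.invalid/{workspace_id}"

    def stop(self, workspace_id):
        del self.running[workspace_id]

    def running_environments(self):
        return sorted(self.running)
```

With the stand-in above, `Master({"fake": FakeInfrastructure()}).start_workspace("ws1", Environment("fake", "image: x"))` returns the channel URL, while Master itself records only the status event.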

Components interconnection:

Workspace start:
[Image: workspace start diagram, "spi-startws-aftermeeting"]

Current state of POC:

We have started implementing this spec to find out whether it simplifies SPI implementations and to check what disadvantages it has. This should keep us from writing a complete specification that turns out not to satisfy us once its design is finished and implementation begins.

We have a branch that is kept in sync with the master branch. Its current state is deep work-in-progress. Work on the Docker implementation has started, and this impl already starts environments. It produces no events/logs for now.
One of the things we want to do is a k8s implementation. It can help us check whether the SPI specification fits k8s and is not designed specifically for our default Docker implementation.

@l0rd
Contributor

l0rd commented May 15, 2017

Thanks @garagatyi for this explanation. Would it be possible to have more details about the new SPI you have envisioned? A high level class diagram before and after the new SPI may help understand if the k8s/Openshift connector will fit or not in the new model.

@TylerJewell

TylerJewell commented May 15, 2017 via email

@benoitf
Contributor

benoitf commented May 16, 2017

I have some questions related to this SPI and the default implementation:

  • Does it allow multiple containers per environment to mount the workspace user data? (If implementation specific, what about the default implementation?)
  • Are agents dynamically pluggable: can I enable an agent without having to restart a workspace? It says it's "implementation specific", but there will be an implementation for Che. Let's say, for example, I want to enable an LSP dynamically (restart of server is not needed).
  • Can I share some agents between environments? I may need to have only one agent per Che instance (example: one LSP agent providing all completions for all my other workspaces). (If implementation specific, what will the default impl provide?)

@garagatyi

None of these features is targeted by the new SPI.

@benoitf
Contributor

benoitf commented May 16, 2017

So it's pure technical debt? No architecture issues solved?

@slemeur
Contributor

slemeur commented May 16, 2017

Good work @garagatyi on documenting the thinking here.

That would be nice to see, as part of this document, the use cases that are addressed by the new SPI. The ones that are not possible today (or hard to achieve) and the new ones which will be enabled.

From reading the doc, it's understood that the new implementation will be better and cleaner than the current ones (which is excellent), but it is hard to figure out the benefits from use case perspective.

For example, you have some sample use cases that are described by @benoitf in his comment but there are probably other ones, which could be interesting to consider as part of the new SPI:

  • ability to dynamically add new machines, while an environment is already started

@skabashnyuk
Contributor Author

but it is hard to figure out the benefits from use case perspective.

The main benefit is that it will be possible to have a Che SPI implementation based on Kubernetes/OpenShift, etc. using a clean abstraction. For now, frankly speaking, it can be implemented only with tons of hacks and workarounds.

@garagatyi

garagatyi commented May 16, 2017

I closed the issue since its definition of done was a description of our vision and of what we want to achieve with the new SPI.
We can still continue discussing the SPI design in comments or in other meetings.
Apart from that, the described design is not final, so we might change it while we are working on the POC.

That would be nice to see, as part of this document, the use cases that are addressed by the new SPI. The ones that are not possible today (or hard to achieve) and the new ones which will be enabled.

As Sergii said, it's difficult to implement another infrastructure with the current SPI.
For example, the Openshift implementation replaces DockerConnector (which is the Docker API client) and hides the kubernetes/openshift infrastructure details behind it. The reason they did so is that the SPI is too Docker oriented; there are not enough abstractions to allow completely different infrastructure implementations.

Another benefit is that it provides a way to improve scaling of the system when machines/agents produce a lot of logs. Today these logs are proxied through master, which can overload the server and make master's memory consumption grow quickly.
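
To illustrate the scaling point, here is a rough sketch of log lines going around master through a separate output channel. It is a toy in-memory model in Python (Che itself is Java), and all names in it (`OutputChannel`, `MasterStub`, `simulate_start`, the `logs.example.invalid` URL) are made up for illustration; none of them is part of Che. The assumption is that master only stores the channel URL and handles status changes, while the log volume lands on the dedicated channel.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical names throughout; this is not Che's actual API.


class OutputChannel:
    """Dedicated component that buffers log lines per workspace."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self._lines: Dict[str, List[str]] = defaultdict(list)

    def url_for(self, workspace_id: str) -> str:
        return f"{self.base_url}/{workspace_id}"

    def publish(self, workspace_id: str, line: str) -> None:
        self._lines[workspace_id].append(line)

    def read(self, workspace_id: str) -> List[str]:
        return list(self._lines[workspace_id])


class MasterStub:
    """Keeps the channel URL and status changes, never log lines."""

    def __init__(self):
        self.channel_urls: Dict[str, str] = {}
        self.statuses: Dict[str, str] = {}
        self.messages_handled = 0

    def register(self, workspace_id: str, channel_url: str) -> None:
        self.channel_urls[workspace_id] = channel_url
        self.messages_handled += 1

    def on_status(self, workspace_id: str, status: str) -> None:
        self.statuses[workspace_id] = status
        self.messages_handled += 1


def simulate_start(master: MasterStub, channel: OutputChannel,
                   workspace_id: str, log_lines: List[str]) -> None:
    """Model an environment start that emits many log lines."""
    master.register(workspace_id, channel.url_for(workspace_id))
    master.on_status(workspace_id, "STARTING")
    for line in log_lines:
        channel.publish(workspace_id, line)  # bypasses master entirely
    master.on_status(workspace_id, "RUNNING")
```

In this model the number of messages master handles per start stays constant no matter how many log lines the environment produces; clients that want the logs read them from the URL master handed out.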

@gorkem
Contributor

gorkem commented May 16, 2017

I agree the main benefit is to be able to implement Che to run on Openshift/Kubernetes. The Openshift implementation has reached a point where it cannot continue without replacing and replicating DockerContainer.
@garagatyi where should the discussion on this continue? There is a ton of feedback that you can probably receive from the Openshift implementation.

@TylerJewell

TylerJewell commented May 16, 2017 via email

@gorkem
Contributor

gorkem commented May 16, 2017

@TylerJewell Can we expedite #5052 in that case? It appears that we will have to live with the OpenshiftConnector for a while.

@TylerJewell

@gazarenkov

8 participants