First draft of SPI specification #5035

skabashnyuk · 2017-05-10T12:39:08Z

The goal of this issue is to provide a document describing our current vision of the SPI changes: the motivation for what we want to change, how components interconnect, the results of prototyping, and what remains to be done. This is not a final specification; the document is meant to be a base for future issues and conversations.

gorkem · 2017-05-10T17:17:21Z

@skabashnyuk we should have @l0rd as a reviewer on this.

skabashnyuk · 2017-05-10T19:04:39Z

ok

garagatyi · 2017-05-15T14:53:26Z

Motivation:

As stated in this issue, we have several problems with the current workspace infrastructure SPI. It was designed when we had a different vision of the product. That vision then changed several times while we adapted the system to new requirements. We also ran into technical issues that we did not foresee when designing it.

Now we struggle to keep using the same SPI without redesigning it, so we decided to revise our requirements and redesign it.

Changes:

The basic idea is to make the SPI less coupled and to clean it up. More responsibilities will shift to the SPI implementation side, so that we neither limit implementations nor try to invent a one-size-fits-all model of infrastructure implementation.

So Master will use a set of SPI implementations and communicate with them as follows:
On workspace environment validation, Master asks the SPI to validate the environment, so Master does not need to know which infrastructure and components (Docker, k8s, containers, VMs, etc.) are described in the environment recipe.
On workspace environment start, Master passes the environment with all the additional info needed to the SPI, and the SPI decides how to perform the start.
The SPI impl sends events about component statuses to Master, and Master passes them to interested clients.
The SPI impl provides Master with the URL of a channel where output and other “heavyweight” events can be read. Master provides it to clients, and they may connect to it if they want. It is considered good practice that Master does not receive events from the SPI impl that can affect its load. It is up to the impl, however, whether this URL points to Master or to a dedicated component that can handle the load of streaming logs from all environments.
Starting agents in an environment is implementation specific. Master will pass the complete configuration of the environment's agents (including install scripts) to the SPI impl. Che will have a default implementation of an install agent that starts other agents and notifies interested parties about agent statuses and logs. An SPI impl can use it or replace the agent start process with its own.
To stop an environment, Master calls the corresponding SPI impl, so environment stop is completely implementation specific.
Project files backup/restore is also implementation specific.
The concept of a snapshot should be removed from the API. To use or not use snapshots, particular workspace properties should be set; the SPI impl then uses these properties to start the environment with or without restoring the previous state.
We are going to add the ability to read environments from the SPI impl, so that Master can be restarted without stopping environments. An SPI implementation does not have to support this feature, as it is optional behavior.
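
The interactions listed above could be sketched in Java roughly as follows. This is only an illustration under assumptions, not the final API: the names `Infrastructure`, `Environment`, `StatusEvent`, `FakeInfrastructure`, the `restoreState` property and the `logs.example.invalid` URL are all hypothetical and were invented for this sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class SpiSketch {

  /** Hypothetical environment description; Master treats the recipe type as opaque. */
  static final class Environment {
    final String workspaceId;
    final String recipeType;
    final Map<String, String> properties;

    Environment(String workspaceId, String recipeType, Map<String, String> properties) {
      this.workspaceId = workspaceId;
      this.recipeType = recipeType;
      this.properties = properties;
    }
  }

  /** Hypothetical lightweight status event that Master forwards to clients. */
  static final class StatusEvent {
    final String workspaceId;
    final String status;

    StatusEvent(String workspaceId, String status) {
      this.workspaceId = workspaceId;
      this.status = status;
    }

    @Override
    public String toString() { return workspaceId + ":" + status; }
  }

  /** Hypothetical SPI contract; method names are illustrative only. */
  interface Infrastructure {
    /** Throws IllegalArgumentException if the recipe is not understood. */
    void validate(Environment env);

    /** Starts the environment; how it starts is up to the implementation. */
    void start(Environment env, Consumer<StatusEvent> statusListener);

    /** Stops the environment; any backup of project files would happen here. */
    void stop(String workspaceId, Consumer<StatusEvent> statusListener);

    /** URL of the channel that carries logs and other heavyweight events. */
    URI outputChannel(String workspaceId);

    /** Optional feature: ids of environments that are still running. */
    default List<String> recoverRunning() { return Collections.emptyList(); }
  }

  /** Toy in-memory implementation, used only to exercise the contract. */
  static class FakeInfrastructure implements Infrastructure {
    private final Map<String, Environment> running = new ConcurrentHashMap<>();

    public void validate(Environment env) {
      if (!"fake".equals(env.recipeType)) {
        throw new IllegalArgumentException("Unsupported recipe type: " + env.recipeType);
      }
    }

    public void start(Environment env, Consumer<StatusEvent> listener) {
      validate(env);
      listener.accept(new StatusEvent(env.workspaceId, "STARTING"));
      // No snapshot concept in the API: a workspace property drives the restore.
      if (Boolean.parseBoolean(env.properties.getOrDefault("restoreState", "false"))) {
        listener.accept(new StatusEvent(env.workspaceId, "RESTORED"));
      }
      running.put(env.workspaceId, env);
      listener.accept(new StatusEvent(env.workspaceId, "RUNNING"));
    }

    public void stop(String workspaceId, Consumer<StatusEvent> listener) {
      running.remove(workspaceId);
      listener.accept(new StatusEvent(workspaceId, "STOPPED"));
    }

    public URI outputChannel(String workspaceId) {
      // Points at a dedicated (made-up) log component rather than at Master.
      return URI.create("ws://logs.example.invalid/" + workspaceId);
    }

    @Override
    public List<String> recoverRunning() { return new ArrayList<>(running.keySet()); }
  }

  public static void main(String[] args) {
    Infrastructure infra = new FakeInfrastructure();
    Environment env =
        new Environment("ws1", "fake", Collections.singletonMap("restoreState", "true"));
    List<StatusEvent> seen = new ArrayList<>();
    infra.start(env, seen::add);          // Master would forward these to clients
    System.out.println(seen);
    System.out.println(infra.outputChannel("ws1"));
    infra.stop("ws1", seen::add);
    System.out.println(infra.recoverRunning());
  }
}
```

In this sketch Master only ever sees status events and a channel URL; logs never pass through it unless the implementation chooses to point the URL at Master.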

Components interconnection:

Workspace start:

Current state of POC:

We have started implementing this spec to find out whether it simplifies SPI implementation and to check what disadvantages it has. This should keep us from writing a complete specification that turns out not to satisfy us once its design is finished and implementation begins.

We have a branch that is kept in sync with the master branch. Its current state is deep work in progress. Work on the Docker implementation has started, and this impl already starts environments. It produces no events or logs for now.
One of the things we want to do is a k8s implementation. It will help us check whether the SPI specification fits k8s and is not designed specifically for our default Docker implementation.

l0rd · 2017-05-15T16:51:37Z

Thanks @garagatyi for this explanation. Would it be possible to have more details about the new SPI you have envisioned? A high level class diagram before and after the new SPI may help understand if the k8s/Openshift connector will fit or not in the new model.

TylerJewell · 2017-05-15T16:57:57Z

It will be provided in a couple of weeks.

benoitf · 2017-05-16T12:09:51Z

I've some questions related to this SPI and the default implementation:

Does it allow multiple containers per environment to mount the workspace user data? (If implementation specific, what about the default implementation?)
Are agents dynamically pluggable: can I enable an agent without restarting a workspace? It says this is "implementation specific", but there will be an implementation for Che. Say, for example, I want to enable an LSP dynamically, with no server restart needed.
Can I share agents between environments? I may need only one agent per Che instance (for example, one LSP agent providing completions for all my other workspaces). (If implementation specific, what will the default impl provide?)

garagatyi · 2017-05-16T13:42:14Z

None of these features is targeted by the new SPI.

benoitf · 2017-05-16T13:47:03Z

So it's purely technical debt? No architecture issues solved?

slemeur · 2017-05-16T13:49:31Z

Good work @garagatyi on documenting the thinking here.

That would be nice to see, as part of this document, the use cases that are addressed by the new SPI. The ones that are not possible today (or hard to achieve) and the new ones which will be enabled.

From reading the doc, it's understood that the new implementation will be better and cleaner than the current ones (which is excellent), but it is hard to figure out the benefits from use case perspective.

For example, you have some sample use cases that are described by @benoitf in his comment but there are probably other ones, which could be interesting to consider as part of the new SPI:

ability to dynamically add new machines, while an environment is already started

skabashnyuk · 2017-05-16T13:57:06Z

> but it is hard to figure out the benefits from use case perspective.

The main benefit is that it will be possible to have a Che SPI implementation based on Kubernetes, OpenShift, etc. using a clean abstraction. For now, frankly speaking, that can be implemented only with tons of hacks and workarounds.

garagatyi · 2017-05-16T14:10:43Z

I closed the issue since its definition of done was a description of our vision and of what we want to achieve with the new SPI.
We can still continue discussing the SPI design in comments or in other meetings.
Apart from that, the described design is not final, so we might change it while we work on the POC.

> That would be nice to see, as part of this document, the use cases that are addressed by the new SPI. The ones that are not possible today (or hard to achieve) and the new ones which will be enabled.

As Sergii said, it is difficult to implement another infrastructure with the current SPI.
For example, the OpenShift implementation replaces DockerConnector (the Docker API client) and hides the Kubernetes/OpenShift infrastructure details behind it. They did so because the SPI is too Docker oriented: there are not enough abstractions to allow completely different infrastructure implementations.
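
The coupling problem can be pictured with a small Java sketch. None of the types below are Che classes: `ContainerClient`, `PodBackedClient`, `EnvironmentRuntime` and `PodRuntime` are hypothetical names chosen for illustration, and the pod handling is a toy model rather than real Kubernetes client code.

```java
import java.util.HashMap;
import java.util.Map;

public class CouplingSketch {

  /** Hypothetical Docker-shaped client: every call assumes container semantics. */
  interface ContainerClient {
    String createContainer(String image);
    void startContainer(String containerId);
  }

  /** A pod-based backend squeezed behind the Docker-shaped client. */
  static class PodBackedClient implements ContainerClient {
    // The pod has to masquerade behind a made-up container id to satisfy the caller.
    final Map<String, String> podByFakeContainerId = new HashMap<>();

    public String createContainer(String image) {
      String fakeId = "container-" + podByFakeContainerId.size();
      podByFakeContainerId.put(fakeId, "pod-for-" + image);
      return fakeId;
    }

    public void startContainer(String containerId) {
      // In this toy model the pod already runs after creation, so nothing to do.
    }
  }

  /** Hypothetical infrastructure-neutral contract: the caller never sees containers. */
  interface EnvironmentRuntime {
    String start(String recipe);
  }

  static class PodRuntime implements EnvironmentRuntime {
    public String start(String recipe) {
      // The implementation works with its native concept; no translation layer.
      return "pod-for-" + recipe;
    }
  }

  public static void main(String[] args) {
    PodBackedClient hack = new PodBackedClient();
    String id = hack.createContainer("ubuntu");
    hack.startContainer(id);
    System.out.println(id + " -> " + hack.podByFakeContainerId.get(id));
    System.out.println(new PodRuntime().start("ubuntu"));
  }
}
```

The first variant forces the backend to keep a mapping between fake container ids and its own resources, while the second lets each implementation expose only what it actually manages.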

Another benefit is that it provides a way to improve scaling when machines and agents produce a lot of logs. Today these logs are proxied through Master, which can overload the server and make Master's memory consumption grow quickly.

gorkem · 2017-05-16T14:44:12Z

I agree the main benefit is being able to implement Che to run on Openshift/Kubernetes. The Openshift implementation has reached a point where it cannot continue without replacing and replicating DockerContainer.
@garagatyi where should the discussion on this continue? There is a ton of feedback that you can probably get from the Openshift implementation.

TylerJewell · 2017-05-16T14:45:58Z

New specifications are going to be written over the next month. Those specifications will appear on GitHub and the conversation can continue there.

gorkem · 2017-05-16T14:50:25Z

@TylerJewell Can we expedite #5052 in that case? It appears that we will have to live with the OpenshiftConnector for a while.

TylerJewell · 2017-05-16T15:58:12Z

@gazarenkov

skabashnyuk added the kind/task, sprint/next, status/open-for-dev, and team/platform labels May 10, 2017

skabashnyuk assigned garagatyi and voievodin May 10, 2017

gazarenkov mentioned this issue May 10, 2017

Workspace Infrastructure SPI for v6 #4736

Closed

15 tasks

garagatyi closed this as completed May 16, 2017

voievodin mentioned this issue May 22, 2017

Rework GWT, UD clients to use new install agent API for logs streaming/downloading #4102

Closed

skabashnyuk removed the sprint/next label Sep 15, 2017
