This repository has been archived by the owner on May 12, 2021. It is now read-only.

[RFC] Implement a runtime interface v2 for both containerd and cri-o #907

Closed
sboeuf opened this issue Nov 14, 2018 · 13 comments


@sboeuf

sboeuf commented Nov 14, 2018

Problem statement

Recently, containerd defined a new interface called containerd-shim-v2. The main goal of this new interface is to make sure that nothing relies on assumptions about the PIDs of the container processes. This allows much better support for VM based container runtimes such as Kata Containers.
We currently have a PR in flight to support this interface: #572
One thing to notice that will help to understand the problem statement is that containerd-shim-v2 is a gRPC interface that expects the implementation to be defined through a gRPC server.
The introduction of this new API is a real step forward in making VM based container runtimes first class citizens in the container ecosystem, and I think we're all excited about that.

However, I have one concern regarding the long-term maintenance burden across the different solutions out there. By that, I mean that I would expect us to progressively lower the maintenance on the kata-runtime CLI, and eventually deprecate it, since the containerd-shim-v2 interface is a better fit for VM based runtimes.
Unfortunately, this logic does not work unless all the players in the ecosystem rely on such an interface instead of sticking to the runc CLI one. And here I'm thinking about CRI-O, which is the other CRI implementation widely used to run Kata Containers.

That's why we've started a discussion with the CRI-O maintainers (/cc @runcom @mrunalp) in order to fill this gap and get better support for VM based runtimes through CRI-O too.

Difference between CRI-O and containerd

CRI-O maintainers agree with the need for such an interface and they are more than happy to move in this direction sooner rather than later.
There's only one point where they would like something different from what containerd introduced: the gRPC server. Adding a gRPC protocol does bring a clear abstraction layer, but it also increases the memory footprint because of the extra process running one server per sandbox. Another argument mentioned was how complex it becomes to debug issues across the full stack when too many gRPC layers are involved.

Based on this, we don't want to diverge by implementing one interface for CRI-O and another for containerd, otherwise we would get back to the initial problem of maintaining several code bases according to the interface being used.

Proposal

That's why I thought we could try to define a standard interface, something that would not be specific to containerd or CRI-O, but that both could rely on.

From containerd perspective

containerd-shim-v2 interface would be a simple gRPC wrapper on top of this new interface. This wrapper would be specific to containerd-shim-v2, and would implement only the gRPC server logic.

From CRI-O perspective

CRI-O would rely directly on this interface. In order to avoid any client/server model, it would import the interface definition from a standard package, and it would use the Kata Containers implementation as a shared library implementing this interface. This kata_runtime_api_v2.so would be provided through the CRI-O configuration file, and could be different depending on the chosen runtime (relying on RuntimeClass).

More thoughts

This API needs to be defined carefully by making sure it receives the right information/parameters through every function call.
Moreover, I think this new API should assume that it is run by a long-running process. This way the implementation can assume that it will be able to retrieve the pointers it needs without recreating in-memory data from scratch (basically, reading it back from disk).
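To illustrate the long-running process assumption, here is a small Go sketch of an in-memory sandbox table. The `Store` and `Sandbox` types are hypothetical and exist only for this example; the point is that a process that stays alive between calls can hand back the same pointer, whereas a one-shot CLI invocation would have to rebuild that state from disk each time.

```go
package main

import (
	"fmt"
	"sync"
)

// Sandbox is a hypothetical in-memory representation of a running sandbox.
type Sandbox struct {
	ID         string
	Containers []string
}

// Store keeps sandboxes alive for the lifetime of the process.
type Store struct {
	mu        sync.RWMutex
	sandboxes map[string]*Sandbox
}

func NewStore() *Store {
	return &Store{sandboxes: map[string]*Sandbox{}}
}

// Add registers a new sandbox and returns the pointer kept in memory.
func (s *Store) Add(id string) *Sandbox {
	s.mu.Lock()
	defer s.mu.Unlock()
	sb := &Sandbox{ID: id}
	s.sandboxes[id] = sb
	return sb
}

// Get returns the pointer stored earlier; nothing is reloaded or rebuilt.
func (s *Store) Get(id string) (*Sandbox, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sb, ok := s.sandboxes[id]
	return sb, ok
}

func main() {
	store := NewStore()
	created := store.Add("sb1")
	created.Containers = append(created.Containers, "c1")

	// A later API call in the same process sees the same object.
	found, ok := store.Get("sb1")
	fmt.Println(ok, found == created, len(found.Containers))
}
```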

An ASCII diagram to clarify :)

+----------------------+                 +-----------------------+
|                      |                 |                       |
|      containerd      |                 |         cri-o         |
|                      |                 |                       |
+----------+-----------+                 +-----------+-----------+
           |                                         |
           |                                         |
+----------v-----------+                             |
|  containerd-shim-v2  |                             |
|      gRPC server     |                             |
|                      |                             |
+----------+-----------+                             |
           |                                         |
           |                                         |
+----------v-----------------------------------------v-----------+
|                                                                |
|                    Container runtime API v2                    |
|                                                                |
+-------------------------------+--------------------------------+
                                |
                                |
                                |
+-------------------------------v--------------------------------+
|                                                                |
|                       Kata Containers API                      |
|                                                                |
+----------------------------------------------------------------+

Here is a link to a google doc where we started this conversation and how we could achieve this.

Feedback? Questions?

/cc @egernst @sameo @mcastelino @amshinde @grahamwhaley @jodh-intel @bergwolf @lifupan @gnawux @WeiZhang555 @jon @mrunalp @runcom @crosbymichael @Random-Liu

@gnawux
Member

gnawux commented Nov 15, 2018

Thanks @sboeuf for the figure and explanation. I strongly support the work on CRI-O support; however, I have some thoughts on the proposal.

On the new indirect layer

At the very beginning, we implemented the runtime as both a command line tool and a library. As a result, we may have shimv2 (aka #572), which is built on top of the library, which is the "Kata Containers API" in the figure if I understand correctly. Both the command line and shimv2 are client code for the API.

Then my questions (comments) are:

  • Is there much common stuff shared between shimv2 and the planned CRI-O interface, but not with the runtime?
  • If so, could the common stuff become part of the "Kata Containers API" instead of another layer on top of it?
  • Furthermore, I think we should reuse major parts of the common stuff in the command line as well, which could actually minimize the maintenance burden.

On the critiques against gRPC

In general, I don't think gRPC is the best interface for kata, but I don't think it is wrong.

I definitely agree that gRPC introduces some overhead in both latency and memory consumption. And I personally think that the implementation of containerd is a bit too sophisticated for new developers.

However, I think the "spirit of decoupling every part" behind containerd is somehow correct. The gRPC here is introduced to decouple the daemon and runtime implementation, which implies it hides more detail of the runtime, which may be deployed in a distributed fashion or on a heterogeneous platform.

@bergwolf
Member

@sboeuf @gnawux I think the biggest difference between the proposed Container runtime API v2 and the current Kata Containers API is the handling of Kata specific configuration. So instead of creating a new Container runtime API v2, we could add a new package called e.g. kata-config. It would define how Kata Containers integrates the OCI spec with additional Kata specific configuration and creates the proper sandbox and container configs to be used by the Kata runtime APIs. Then, by importing kata-config and virtcontainers, both containerd-shim-v2 and cri-o can translate from their own API interfaces into calls to the Kata Containers APIs.
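As a rough illustration of the kata-config idea, the Go sketch below turns a set of annotations into a sandbox configuration with defaults. The `SandboxConfig` type, the `FromAnnotations` function, the default values, and the annotation keys are all hypothetical and were picked only for this example; they are not the real Kata Containers annotations or virtcontainers types.

```go
package main

import (
	"fmt"
	"strconv"
)

// SandboxConfig is a hypothetical, trimmed-down sandbox configuration.
type SandboxConfig struct {
	ID       string
	VCPUs    int
	MemoryMB int
}

// Hypothetical annotation keys, chosen only for this sketch.
const (
	annVCPUs  = "example.kata/vcpus"
	annMemory = "example.kata/memory_mb"
)

// positiveInt parses a strictly positive integer annotation value.
func positiveInt(key, value string) (int, error) {
	n, err := strconv.Atoi(value)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid value %q for %s", value, key)
	}
	return n, nil
}

// FromAnnotations starts from default values and applies any
// overrides found in the annotations of the container spec.
func FromAnnotations(id string, ann map[string]string) (SandboxConfig, error) {
	cfg := SandboxConfig{ID: id, VCPUs: 1, MemoryMB: 2048}
	if v, ok := ann[annVCPUs]; ok {
		n, err := positiveInt(annVCPUs, v)
		if err != nil {
			return SandboxConfig{}, err
		}
		cfg.VCPUs = n
	}
	if v, ok := ann[annMemory]; ok {
		n, err := positiveInt(annMemory, v)
		if err != nil {
			return SandboxConfig{}, err
		}
		cfg.MemoryMB = n
	}
	return cfg, nil
}

func main() {
	cfg, err := FromAnnotations("sb1", map[string]string{annVCPUs: "4"})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

With a shared package of this kind, each frontend would only need to extract the annotations from its own request format before calling the common translation function.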

WDYT?

@sboeuf
Author

sboeuf commented Nov 16, 2018

Ok so today I had a call with CRI-O folks that helped to clarify things a lot!
When I said that they didn't want to rely on gRPC, what they actually meant was that they didn't want to introduce it for the default runc code path, but they are open to using it from a v2 implementation for Kata.

The part that I misunderstood was that we were talking about a Go interface inside CRI-O itself, and we would provide a Kata Containers implementation for that. They're actually not opposed to hosting the implementation directly in the CRI-O repository, as it would simplify the testing (instead of having a shared library for that).

Since this implementation would be specific to Kata v2, we could implement the gRPC client from there, which would simplify things a lot.

This means that we could directly reuse the containerd-shim-v2 interface as a first step. And I can imagine that further down the road, we could try to push containerd-shim-v2 interface as a standard/generic one.
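A hedged Go sketch of how such a Go interface inside CRI-O could dispatch between the existing CLI path and a shim v2 client. The `RuntimeHandler` interface, the `ociCLI` and `shimV2` types, the `Registry` map, and the socket path are hypothetical names invented for this example; the two implementations only return a description of what they would do rather than executing a binary or issuing a real RPC.

```go
package main

import "fmt"

// RuntimeHandler is a hypothetical version of the Go interface CRI-O would own.
type RuntimeHandler interface {
	CreateContainer(id string) (string, error)
}

// ociCLI models the existing path: conmon plus a runc-compatible CLI binary.
type ociCLI struct {
	binary string
}

func (o ociCLI) CreateContainer(id string) (string, error) {
	return fmt.Sprintf("exec %s create %s (via conmon)", o.binary, id), nil
}

// shimV2 models the new path: a client talking to a containerd-shim-v2 server.
type shimV2 struct {
	address string
}

func (s shimV2) CreateContainer(id string) (string, error) {
	return fmt.Sprintf("rpc Create(%s) to shim at %s", id, s.address), nil
}

// Registry maps a runtime handler name (for example one selected
// through RuntimeClass) to the implementation that serves it.
type Registry map[string]RuntimeHandler

func (r Registry) Create(handler, id string) (string, error) {
	h, ok := r[handler]
	if !ok {
		return "", fmt.Errorf("unknown runtime handler %q", handler)
	}
	return h.CreateContainer(id)
}

func main() {
	reg := Registry{
		"runc": ociCLI{binary: "runc"},
		"kata": shimV2{address: "/run/example/kata-shim.sock"}, // hypothetical path
	}
	for _, name := range []string{"runc", "kata"} {
		action, err := reg.Create(name, "c1")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "->", action)
	}
}
```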

@gnawux

However, I think the "spirit of decoupling every part" behind containerd is somehow correct. The gRPC here is introduced to decouple the daemon and runtime implementation, which implies it hides more detail of the runtime, which may be deployed in a distributed fashion or on a heterogeneous platform.

I totally agree here. And if we think about things like live upgrade or high availability (handling crash and restart), it makes even more sense to use the decoupling/abstraction that gRPC gives us.

Updated diagram


 +------------------------------------------------------------+
 |                                                            |
 |                          CRI-O                             |
 |                                                            |
 |  +-------------------------+  +-------------------------+  |
 |  |                         |  |                         |  |
 |  |           v1            |  |          v2             |  |
 |  |                         |  |                         |  |
 |  |                         |  | +---------------------+ |  |
 |  | +---------+ +---------+ |  | |                     | |  |
 |  | |         | |         | |  | |   Go interface v2   | |  |
 |  | |  conmon | | conmon  | |  | |                     | |  |
 |  | |         | |         | |  | +---------------------+ |  |
 |  | |    +    | |    +    | |  | +---------+ +---------+ |  |
 |  | |         | |         | |  | | conmon  | |  kata   | |  |
 |  | |  runc   | |  kata   | |  | |   +     | |  gRPC   | |  |   +------------------+
 |  | |         | |         | |  | |  runc   | |  client | |  |   |                  |
 |  | +---------+ +---------+ |  | +---------+ +---------+ |  |   |    containerd    |
 |  +-------------------------+  +-------------------------+  |   |                  |
 +------------------------------------------------------------+   +------------------+
          |            |                 |          |                        |
          |            |                 |          |                        |
          |            |                 |          |                        |
          |            |                 |          |                        |
          |            |                 |          +----------+             |
          |            |                 |                     |             |
          |            |                 |                     |             |
          |            |                 |                     |             |
    +-----v----+ +-----v-----------+ +---v---+           +-----v-------------v-------------+
    |   runc   | |    kata-runtime | | runc  |           |  containerd-shim-v2 API         |
    |          | |                 | |       |           | Kata implementation gRPC server |
    +----------+ +-----------------+ +-------+           |                                 |
                          |                              +---------------------------------+
                          |                                                      |
                          +-------------------+                                  |
                                              |                                  |
                                              |                                  |
                                        +-----v----------------------------------v---------+
                                        |                                                  |
                                        |                    Kata API                      |
                                        |                                                  |
                                        +--------------------------------------------------+


/cc @mcastelino @mrunalp @runcom

@Random-Liu

One thing to notice that will help to understand the problem statement is that containerd-shim-v2 is a gRPC interface that expects the implementation to be defined through a gRPC server.

Just to make it clear: the interface is a gRPC interface, but the server is a ttrpc server, which is more lightweight.

The gRPC here is introduced to decouple the daemon and runtime implementation, which implies it hides more detail of the runtime, which may be deployed in a distributed fashion or on a heterogeneous platform.

Agree. Thanks for pointing this out. :)

This means that we could directly reuse the containerd-shim-v2 interface as a first step. And I can imagine that further down the road, we could try to push containerd-shim-v2 interface as a standard/generic one.

Sounds good.

@sboeuf
Author

sboeuf commented Nov 16, 2018

Thanks @Random-Liu for the comments :)

@gnawux @bergwolf does the updated proposal make more sense?

Basically, if we agree on this, we could get started with the definition/implementation of the new interface in CRI-O, and it should work out of the box with the pending implementation of kata-containerd-shim-v2 #572

@WeiZhang555
Member

The updated proposal looks good, and we're now more confident that kata-containerd-shim-v2 #572 is on the right track.
Just a reminder: we still need to keep kata OCI spec compliant and runc cmdline compliant while we are doing the exciting gRPC interface work :-)

@sameo

sameo commented Nov 19, 2018

Just a reminder: we still need to keep kata OCI spec compliant and runc cmdline compliant while we are doing the exciting gRPC interface work :-)

Yes, we're not going to get rid of the runc CLI compliance for now.
@sboeuf @Random-Liu Do we know if/when we'll have to stop supporting the runc CLI to interact with Docker itself? By then will there be any reasons left to support the CLI at all?

Overall I think the direction this goes into makes perfect sense, but in the long run it would be quite beneficial to see a standardized RPC interface for container runtimes, proposed by/to the OCI. Something runc, CRI-O, containerd and all others could converge into.

One final nit: On your v1 diagram, there should only be one runc + conmon box. The Kata specific bits of CRI-O are somewhere above the final runc+conmon layer. This will also highlight the fact that we're effectively adding one additional abstraction layer (the CRI-O Go interface for runtimes) with this new design.

@sboeuf
Author

sboeuf commented Nov 19, 2018

@sameo

Do we know if/when we'll have to stop supporting the runc CLI to interact with Docker itself? By then will there be any reasons left to support the CLI at all?

Well I think latest Docker depends on latest containerd, so the support should be there. That being said, I hope Docker does not make any weird assumptions based on PIDs even when using containerd v2. I think @Random-Liu and @crosbymichael have more input on this.

The only concern I have about removing support of the CLI at some point would be for some of our customers that might run with older versions of Docker, which would not rely on latest containerd.

Overall I think the direction this goes into makes perfect sense, but in the long run it would be quite beneficial to see a standardized RPC interface for container runtimes, proposed by/to the OCI. Something runc, CRI-O, containerd and all others could converge into.

Agreed! I really think that once we can show that we converged both CRI-O and containerd to this interface, we can push for this interface to become a standard one.

One final nit: On your v1 diagram, there should only be one runc + conmon box. The Kata specific bits of CRI-O are somewhere above the final runc+conmon layer. This will also highlight the fact that we're effectively adding one additional abstraction layer (the CRI-O Go interface for runtimes) with this new design.

Oh yes that's right, the Go interface has to cover everything, which would simplify this diagram :)

@gnawux
Member

gnawux commented Nov 20, 2018

@sboeuf Many thanks for the clarification. It makes much more sense then.

@WeiZhang555
Member

Overall I think the direction this goes into makes perfect sense, but in the long run it would be quite beneficial to see a standardized RPC interface for container runtimes, proposed by/to the OCI. Something runc, CRI-O, containerd and all others could converge into.

@sameo

I can't agree more! There was once a PR trying to standardize the runc CLI, but it failed (see "Add Runtime CLI Spec"). I was really looking forward to it, but sadly it seems that making a CLI spec is too hard.

Instead, a gRPC interface could benefit non-runc OCI runtimes a lot and would be easier to standardize.

Well I think latest Docker depends on latest containerd, so the support should be there. That being said, I hope Docker does not make any weird assumptions based on PIDs even when using containerd v2. I think @Random-Liu and @crosbymichael have more input on this.

@sboeuf
As far as I know, there are not many assumptions in Docker based on PIDs. Something like docker top or namespace sharing (/proc//ns/net) could use the PID, but it does not seem that important...

@cyphar

cyphar commented Dec 8, 2018

I don't want to be a killjoy, but I personally really don't like the idea of using GRPC for everything.

My primary issue with it is that it requires a long-running daemon for management, something which really shouldn't be necessary (IMHO) just to have sane calling conventions. I already am not a huge fan that conmon is necessary (but at least it's only tied to individual container lifetimes) but it feels quite odd to have a GRPC server which will (at the end of the day) probably just shell out some commands.

But that's just me, and as someone who followed and commented on all of the OCI runtime-spec CLI discussions, I'm well aware of how much of a battle that was. But if we're going to have a GRPC interface I would at least request that we standardise this (by which I mean "put it in OCI or similar", not "standardise within the k8s source tree") so it's actually generally usable outside of k8s.

@sboeuf
Author

sboeuf commented Dec 10, 2018

@cyphar

My primary issue with it is that it requires a long-running daemon for management, something which really shouldn't be necessary (IMHO) just to have sane calling conventions. I already am not a huge fan that conmon is necessary (but at least it's only tied to individual container lifetimes) but it feels quite odd to have a GRPC server which will (at the end of the day) probably just shell out some commands.

I understand the concerns here, but that's also what allows for a proper abstraction layer, so that daemons such as CRI-O and containerd can be restarted without tearing down running pods and containers.

But that's just me, and as someone who followed and commented on all of the OCI runtime-spec CLI discussions, I'm well aware of how much of a battle that was. But if we're going to have a GRPC interface I would at least request that we standardise this (by which I mean "put it in OCI or similar", not "standardise within the k8s source tree") so it's actually generally usable outside of k8s.

I couldn't agree more on this. Here is one of my previous comments on this thread:

I can imagine that further down the road, we could try to push containerd-shim-v2 interface as a standard/generic one.

where @Random-Liu agreed it would be a good thing to have. Also, we've talked about this offline with @mrunalp, and we agreed that a neutral place for this interface would be more appropriate than the containerd repo.

@rneilson

rneilson commented Apr 3, 2019

Forgive me -- I'm an utter, total outsider here (catching up on 2 1/2 years of convos and spec revisions), but...

Please, don't make everything use gRPC. A CLI API is a known quantity; file descriptors and argv are well-known constructs, fairly portable across languages (unix sockets passing new pty fds possibly excepted). But gRPC is a very particular thing, tied de facto if not de jure to Go[1]. I realize most of the container ecosystem is written in Go, but anything in the OCI spec reliant on Go (or at least Go-first) semantics is a problem[2] for runtimes or engines or orchestrators written in any other language. It's already tricky enough dealing with things like PR_SET_CHILD_SUBREAPER[3] for the start/create split.

How about a nice generic websocket interface instead?

[1] Or at least Protocol Buffers, the serialization format, plus HTTP 2, the network protocol, plus specific state handling, all of which are relatively standard in Go, but are more difficult (without library upon library) in other languages.
[2] Not necessarily a problem of possibility, but definitely a problem of affordances. It's fundamentally more of a barrier using something as specific as gRPC in, say, JavaScript or Python or Elixir, at least not without a ton of extra dependencies.
[3] In Python, Ruby, and JavaScript (well, Node.js anyway), it requires FFIs which are, quite frankly, sketchy escape hatches for going off the ranch. Erlang/Elixir don't even have a way to do that at all.


9 participants