Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

initial shim implementation #1

Merged
merged 6 commits into from
Dec 5, 2017
Merged

Conversation

bergwolf
Copy link
Member

@bergwolf bergwolf commented Nov 21, 2017

This is based on runv's current Hyperstart interface and can be migrated to the new gRPC APIs once they are settled.

Currently the agent address parameter must be an unix socket. To support vsock endpoints, we need a new gRPC dialer for vsock connections, which should be added in runtime gRPC API helpers instead.

TODOs:

  • log engine
  • move to official kata runtime gRPC API
  • vsock endpoints
  • tests and travis CI
  • README

@sameo sameo requested review from amshinde and sboeuf November 21, 2017 18:44
main.go Outdated
proxyWinsize bool
)

flag.BoolVar(&debug, "debug", false, "enable debug logging")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using logrus would allow us to pass a string to --log instead of a boolean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with glog for now and migrate to logrus once the decision is made.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss tomorrow about that.

main.go Outdated

flag.BoolVar(&debug, "debug", false, "enable debug logging")
flag.StringVar(&logDir, "logdir", "", "the directory for the logging")
flag.StringVar(&root, "root", "", "root directory for storage of container state")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we want to store some state here to handle the shim crashes? Ideally we should be able to restart a shim and get the container state from the agent and have a stateless shim.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What states do you have in mind? IMO the shim should be stateless. The way to restart is to start it again with the same parameters. But I can add states file if necessary.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf Yes, the shim should be stateless. But passing a -root option for root directory for storage of container state seems to indicate the opposite. Since this option is not used in the code, I'm not sure what the intention here is and I want to make sure the shim is indeed stateless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sameo I agree. I'll drop it. It was needed by runv shim but not anymore here.

main.go Outdated
}

// wait to be waken up if starting a new pod
if process == "init" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to define this "init string as a constant in the code, to be more explicit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So could you describe who's supposed to wake us up here, and why do we need to wait?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime. With it, we can launch shim and create guest in parallel.

main.go Outdated
flag.BoolVar(&proxyExitCode, "proxy-exit-code", true, "proxy exit code of the process")
flag.BoolVar(&proxyStdio, "proxy-stdio", true, "proxy stdio of the process")
flag.BoolVar(&proxySignal, "proxy-signal", true, "proxy signals to the process")
flag.BoolVar(&proxyWinsize, "proxy-winsize", true, "proxy tty size to the process")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any use cases where we wouldn't want to do any of these 3: stdio, signal and tty size?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not one I can think of. These are just in case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list of parameters seems pretty complex. We should make this simpler. We really need to know about the container/process IDs, and the socket URL. But the rest has to be done in every case, we don't need to enable/disable such things with parameters.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf I propose we remove the options that are not used for now and simplify the shim CLI. It's easier to add new options when needed than to remove one and potentiall break backward compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it won't hurt anything if we already have them configurable.

Moreover, I am thinking about split proxy-stdio to in and output, because sometimes we don't want to input anything, for example, logs only.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having options just in case makes both the code base and the CLI/UI more complex. Simplicity should be an important goal for us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll remove unnecessary options. We can add them back when needed in future, but not in the first versions.

shim.go Outdated
}()
}

func (s *Shim) forwardAllSignals() chan os.Signal {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's SIGKILL handled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet...

shim.go Outdated
func NewShim(container, process, addr string) (*Shim, error) {
if agent, err := NewShimAgent(addr); err != nil {
return nil, err
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent is defined in if statement.

agent.go Outdated
func NewHyperAgent(addr string) (shimAgent, error) {
if h, err := libhyperstart.NewGrpcBasedHyperstart(addr); err != nil {
return nil, err
} else {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: No need for the else here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed as h variable is defined in the if statement.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf Well not really, you can simplify like this:

h, err := libhyperstart.NewGrpcBasedHyperstart(addr);
if err != nil {
		return nil, err
}

return h, nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf It's just another version. I wouldn't call it simpler though :)

agent.go Outdated

// A new shim agent implementation should have the shimAgent interface
// Currently it is a subset of libhyperstart.Hyperstart interface but it
// allows for new agent and protocols.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of this requirement. I believe the only requirement for an agent should be the gRPC interface. Is that what you meant by This is based on runv's current Hyperstart interface and can be migrated to the new gRPC APIs once they are settled. ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more like a work around as we have not had an working agent (or mock) for the new gRPC APIs. But we need the shim could be tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sameo The shimAgent interface will also be removed once we can import the new gRPC APIs. The idea is not to import runv in shim.go for now. In future, agent.go will not import runv either.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who's responsible for moving gRPC API to kata ? I think we should not merge anything as long as we rely on some runv based packages (even if the content is basically the same.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf kata-containers/runtime#2 is working on this goal.

agent.go Outdated

// A new shim agent implementation should have the shimAgent interface
// Currently it is a subset of libhyperstart.Hyperstart interface but it
// allows for new agent and protocols.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sameo The shimAgent interface will also be removed once we can import the new gRPC APIs. The idea is not to import runv in shim.go for now. In future, agent.go will not import runv either.

agent.go Outdated
func NewHyperAgent(addr string) (shimAgent, error) {
if h, err := libhyperstart.NewGrpcBasedHyperstart(addr); err != nil {
return nil, err
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed as h variable is defined in the if statement.

main.go Outdated
proxyWinsize bool
)

flag.BoolVar(&debug, "debug", false, "enable debug logging")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with glog for now and migrate to logrus once the decision is made.

main.go Outdated

flag.BoolVar(&debug, "debug", false, "enable debug logging")
flag.StringVar(&logDir, "logdir", "", "the directory for the logging")
flag.StringVar(&root, "root", "", "root directory for storage of container state")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What states do you have in mind? IMO the shim should be stateless. The way to restart is to start it again with the same parameters. But I can add states file if necessary.

main.go Outdated
flag.BoolVar(&proxyExitCode, "proxy-exit-code", true, "proxy exit code of the process")
flag.BoolVar(&proxyStdio, "proxy-stdio", true, "proxy stdio of the process")
flag.BoolVar(&proxySignal, "proxy-signal", true, "proxy signals to the process")
flag.BoolVar(&proxyWinsize, "proxy-winsize", true, "proxy tty size to the process")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not one I can think of. These are just in case.

main.go Outdated
}

// wait to be waken up if starting a new pod
if process == "init" {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime. With it, we can launch shim and create guest in parallel.

shim.go Outdated
func NewShim(container, process, addr string) (*Shim, error) {
if agent, err := NewShimAgent(addr); err != nil {
return nil, err
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agent is defined in if statement.

shim.go Outdated
}()
}

func (s *Shim) forwardAllSignals() chan os.Signal {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet...

@sboeuf
Copy link

sboeuf commented Nov 22, 2017

glog vs logrus is one thing we want to clarify, but there is also govendor vs dep we need to discuss about.

@gnawux
Copy link
Member

gnawux commented Nov 22, 2017

@sboeuf we won't mind to change to golang official dep tool whenever it graduates from experimental stage.

@sameo
Copy link

sameo commented Nov 22, 2017

@gnawux dep is no longer experimental. From https://github.com/golang/dep:

dep is safe for production use. That means two things:

* Any valid metadata file (Gopkg.toml and Gopkg.lock) will be readable and considered valid by any future version of dep.
* The CLI UI is mostly stable. dep init and dep ensure are mostly set; dep status is likely to change a fair bit, and dep prune is going to be absorbed into dep ensure.

@gnawux
Copy link
Member

gnawux commented Nov 22, 2017

@sameo I agree to change to dep.

I notice the message in README of golang/dep as well. I think it is better to use the future official tool, though the readme said is still in experimental.

@bergwolf
Copy link
Member Author

updated with following changes:

  1. remove unnecessary arguments
  2. constify "init" process name
  3. replace govendor with golang official dep

@laijs
Copy link
Contributor

laijs commented Nov 23, 2017

Since the agent-api pr was merged, it is better to switch to it.

... and I'm shocked that the dependence is so large for such a small program.

shim.go Outdated

type Shim struct {
container string
process string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Presumably container is the containerID? If so, I think it would be clearer to name it with the ID suffix.
  • Is process the PID? If so, again, why not specify it as pid int?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processes' numeric pid are not seen on the host, we don't want to expose them, right?
{containerID, processID} is used to refer the init/exec process in the container.
Both containerID and processID are strings.

(Linux processes spawned by init/exec process which are supposed to be invisible in the host can't not be referenced.)

shim.go Outdated
}

func (s *Shim) forwardAllSignals() chan os.Signal {
sigc := make(chan os.Signal, 2048)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 2048? Could this be a top-level const maybe for clarity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shim.go Outdated
return sigc
}

func (s *Shim) resizeTty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that it's longer, but resizeTerminal() looks a little "easier on the eye" to me? (Same comment to the other areas of the code that mention Tty). That would also be in line with the naming of term.RestoreTerminal().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry but the name tty is in sync with agent gRPC definition TtyWinResize

main.go Outdated
"sync"
"syscall"

"github.com/golang/glog"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to think about switching to logrus for all components.

main.go Outdated
"github.com/moby/moby/pkg/term"
)

const INIT_PROCESS = "init"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a public symbol so maybe initProcess instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to make this global. Also use of underscores is discouraged in golang.

pipe.go Outdated

type inPipe struct {
s shimAgent
c, p string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer if all these members had longer more meaningful names.

shim.go Outdated

go func() {
for sig := range sigc {
if sig == syscall.SIGCHLD || sig == syscall.SIGPIPE || sig == syscall.SIGWINCH {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider a few more signals here like SIGBUS, SIGSEGV and SIGABRT. These should be ignored as well as they may indicate runtime exceptions with the shim itself.

main.go Outdated
@@ -20,27 +20,19 @@ func main() {
var (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the commit message. Also can this be squashed with the previous commit?

@bergwolf bergwolf force-pushed the shim branch 3 times, most recently from 8b9619c to cc8b402 Compare December 1, 2017 02:53
@bergwolf
Copy link
Member Author

bergwolf commented Dec 1, 2017

updated to address @jodh-intel's and @amshinde's comments, change to use logrus and connect to agent gRPC server with vsock supported.

}
return 0, err
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would factorise the *out code and define:

type outPipe struct {
	ctx         context.Context
	agent       *shimAgent
	containerId string
	pid         uint32
        bool        errPipe
}

And then:

func (p *outPipe) Read(data []byte) (n int, err error) {
        var readFn func(ctx context.Context, in *ReadStreamRequest, opts ...grpc1.CallOption) (*ReadStreamResponse, error)

         if p.errPipe {
               readFn = p.agent.ReadStderr
         } else {
               readFn = p.agent.ReadStdout
         }

	resp, err := readFn(p.ctx, &pb.ReadStreamRequest{
		ContainerId: p.containerId,
		PID:         p.pid,
		Len:         uint32(len(data))})
	if err == nil {
		copy(data, resp.Data)
		return len(resp.Data), nil
	}

	// check if it is &grpc.rpcError{code:0xb, desc:"EOF"} and return io.EOF instead
	if grpc.Code(err) == codes.OutOfRange && grpc.ErrorDesc(err) == "EOF" {
		return 0, io.EOF
	}
	return 0, err
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sameo Good point! I'm keeping the pipe struct in sync but using a helper function for both outPipe and errPipe Read() functions. Please see if the new code looks good to you.

main.go Outdated
os.Exit(exitFailure)
}

// wait to be waken up if starting a new sandbox
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to insist on that one, but could you please describe why we need to wait from the shim when being a sandbox shim. And also document who's supposed to send us SIGUSR1, and at which point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it because with current StartContainer() and ExecProcess() APIs, runtime can no longer start shim and create sandbox in parallel.

@sameo
Copy link

sameo commented Dec 4, 2017

This looks good. I only have a couple of minor comments, besides the missing points from the TODO list (most important one being unit tests, imho).

@bergwolf
Copy link
Member Author

bergwolf commented Dec 5, 2017

UT success depends on agent pr kata-containers/agent#36. Will trigger travis once that one is merged.

@sameo
Copy link

sameo commented Dec 5, 2017

LGTM

Approved with PullApprove

Gopkg.toml Outdated


[[constraint]]
branch = "master"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf You need to revendor to have the tests pass.
Also, could you please lock that one to a specific version (The one that include the dialer fixes)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Makefile Outdated
@@ -0,0 +1,8 @@
all:
go build -o shim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name the binary kata-shim (to match main.go:shimName)? A binary called shim is too generic imho. This will require an update to .gitignore too.

I've raised a PR to do the same for the agent: kata-containers/agent#39.

main.go Outdated
// winsize
s, err := term.SetRawTerminal(os.Stdin.Fd())
if err != nil {
shimLog.Errorf("failed to set raw terminal: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as:

shimLog.WithError(err).Error("failed to set raw terminal")

Same comment for other .Errorf() calls.

main.go Outdated
// wait until exit
exitcode, err := shim.wait()
if err != nil {
shimLog.Errorf("failed waiting for process %d: %s", pid, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to start using structured logging from the outset, as we've done for the kata agent, so this could become:

shimLog.WithError(err).WithField("pid", pid).Error("failed to wait for shim process")

Same comment for the other logging calls which could use the logrus .WithField() or .WithFields(logrus.Fields{ ... }) calls.

shim.go Outdated
_, err2 := s.agent.CloseStdin(s.ctx, &pb.CloseStdinRequest{
ContainerId: s.containerId,
PID: s.pid})
shimLog.Infof("copy stdin err1 %s err2 %s", err1, err2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to log these errors every time? It would be more conventional to do something like:

if err1 != nil {
    shimLog.WithError(err1).Warn("failed to copy stdin to pipe")

(and the same sort of thing for err2).

Same comment applies to the log calls below.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
@bergwolf
Copy link
Member Author

bergwolf commented Dec 5, 2017

updated to address @jodh-intel's comments.

.gitignore Outdated
@@ -0,0 +1 @@
shim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should be kata-shim now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, fixed. Thanks!

@jodh-intel
Copy link
Contributor

jodh-intel commented Dec 5, 2017

Aside from the nit...

lgtm.

Approved with PullApprove

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
@laijs
Copy link
Contributor

laijs commented Dec 5, 2017

LGTM

Approved with PullApprove

@laijs laijs merged commit 58ac96f into kata-containers:master Dec 5, 2017
jcvenegas added a commit to jcvenegas/shim-1 that referenced this pull request Apr 25, 2018
do no merge - only to check move to go 1.10 works

note: make issue number

Fixes: kata-containers#1

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
@jcvenegas jcvenegas mentioned this pull request Apr 25, 2018
jcvenegas added a commit to jcvenegas/shim-1 that referenced this pull request Apr 25, 2018
do no merge - only to check move to go 1.10 works

note: make issue number

Fixes: kata-containers#1

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
jcvenegas added a commit to jcvenegas/shim-1 that referenced this pull request Apr 25, 2018
do no merge - only to check move to go 1.10 works

note: make issue number

Fixes: kata-containers#1

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
jcvenegas added a commit to jcvenegas/shim-1 that referenced this pull request Apr 25, 2018
do no merge - only to check move to go 1.10 works.

note: make issue number

Fixes: kata-containers#1

Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
sboeuf pushed a commit to sboeuf/shim that referenced this pull request Dec 8, 2018
…r-enabling

HACK HACK HACK: get firecracker working
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants