Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward agent containerd events to shim and monitor task exits via event subscriptions #118

Merged
merged 3 commits into from
Mar 28, 2019

Conversation

sipsma
Copy link
Contributor

@sipsma sipsma commented Feb 28, 2019

Fixes #87

More details in full commit messages, but overall goal of the change is to forward all containerd events from the agent to the shim on the host and then use that to monitor task exits, at which time we can gracefully shut everything down.

Sending this out to get feedback on the implementation right away, but I won't merge until I've also written an integration test (which is likely going to entail addressing #114) and added unit tests for the new EventExchange code. That being said, the testing done so far has been manual. I verified that both killing a task (via ctr kill) and the task exiting on its own result in a task exit event being forwarded to and logged by containerd. The shim and VM exit as expected in both cases too.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sipsma
Copy link
Contributor Author

sipsma commented Feb 28, 2019

I forgot to run the linter locally before pushing, will fix that before merging

exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
agent/main.go Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
xibz
xibz previously requested changes Feb 28, 2019
Copy link
Contributor

@xibz xibz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also get some tests to ensure this works properly?

exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
exchangeBridge/exchange.go Outdated Show resolved Hide resolved
@sipsma
Copy link
Contributor Author

sipsma commented Feb 28, 2019

Can we also get some tests to ensure this works properly?

Yes, mentioned this in the PR description but I don't plan on merging this until I've gotten an integration test up and running, which may take a few days. Just getting any feedback on the implementation out of the way first.

@sipsma sipsma force-pushed the master branch 2 times, most recently from 21e193f to 4b99eef Compare March 1, 2019 00:50
eventbridge/eventbridge.go Outdated Show resolved Hide resolved
eventbridge/eventbridge.go Outdated Show resolved Hide resolved
@mxpv
Copy link
Contributor

mxpv commented Mar 5, 2019

Ref #24

Copy link
Contributor

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see tests (as you mentioned in the overview), but other than that the code looks pretty reasonable to me.

@sipsma
Copy link
Contributor Author

sipsma commented Mar 15, 2019

Updated w/ unit tests. I also have an e2e test written in my fork of the repo, will update this PR with it once I have merged the prerequisite #134

@sipsma
Copy link
Contributor Author

sipsma commented Mar 20, 2019

I updated w/ the e2e test. It's a pretty sizable change on top of the original so it's a separate commit.

The travis build is exiting with non-zero for reasons unknown to me at this time. When I run "make lint" on my own from the same commit it exits with 0.

@sipsma
Copy link
Contributor Author

sipsma commented Mar 21, 2019

The travis build was failing due to it only pulling 50 commits of history, which broke our git validation command.

Sam fixed this, I will pick up that fix when I squash/rebase before merging. make lint passes locally though for these changes.

.buildkite/pipeline.yml Outdated Show resolved Hide resolved
.buildkite/pipeline.yml Outdated Show resolved Hide resolved
go.mod Show resolved Hide resolved
On a high-level, this commit:
1. Adds an EventExchange (existing containerd concept) to both the agent
   and shim. This is used by both for all event publishes
2. Creates a new concept, eventbridge.Getter, which is used to forward
   all events published on the agent's exchange to the shim's exchange.

The eventbridge.Getter is implemented as a TTRPC-based client/server
that provides just one API, GetEvent. When called from the client,
GetEvent will block until an event has been published on the server's
exchange, at which time that event will be returned. A separate
function, eventbridge.Attach, does that actual work of calling GetEvent
in a loop and forwarding the events to the local exchange.

This long-polling model is used because TTRPC does not support
streaming at this time.

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
agent/main.go Outdated
)

const (
defaultPort = 10789
defaultPort = 10789
defaultNamespace = "default"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defined in namespaces.Default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed while squashing commits

This replaces the previous implementation that called GetStatus every
second with a more general approach of subscribing to TaskExit events,
which are published by the agent and then forwarded to the shim's
event exchange via the exchange bridge.

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
This is the first e2e test that requires being run in isolation, as
being run on a host (or in a container) with multiple
firecracker-containerd instances would result in interference.

This thus adds a mechanism that allows buildkite to run each such
isolated test in its own docker container. The mechanism chosen here
is to name tests that require isolation with "_Isolated" at the end
and add a test util similar to the existing one for snapshotter root
tests that skips a test in an env var is not explicitly provided.

There is a script invoked as part of buildkite that finds all tests
with the "_Isolated" tag in their name and runs each one in its own
docker container.

This mechanism is minimally invasive to the actual test code itself
while not creating massive amounts of complication in our test infra
(such as other alternative methods like using build tags).

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
@sipsma sipsma dismissed xibz’s stale review March 28, 2019 21:31

Addressed previous comments

@sipsma sipsma merged commit 03b790f into firecracker-microvm:master Mar 28, 2019
sipsma added a commit to sipsma/firecracker-containerd that referenced this pull request Apr 1, 2019
Forward agent containerd events to shim and monitor task exits via event subscriptions
xibz pushed a commit to xibz/firecracker-containerd that referenced this pull request Apr 16, 2019
Forward agent containerd events to shim and monitor task exits via event subscriptions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants