Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime state configuration and structs #87

Merged
merged 1 commit into from
Sep 3, 2015

Conversation

crosbymichael
Copy link
Member

This adds runtime state information for oci container's so that it can
be persisted and used by external tools.

Closes #72
Closes #41

Signed-off-by: Michael Crosby crosbymichael@gmail.com

@wking
Copy link
Contributor

wking commented Jul 29, 2015

On Wed, Jul 29, 2015 at 02:10:46PM -0700, Michael Crosby wrote:

This adds runtime state information for oci container's so that it can
be persisted and used by external tools.

Closes #72

I think closing #41 (the state spec) is important, but I'm still not
sold on the directory-binding proposed in #72. Besides comments
supporting storage-agnostic state by me, @LK4D4 [1,2], and
@kunalkushwaha 3 in #72, see the discussion in
opencontainers/runc#159.

Can we restrict this PR to just address the spec content (#41), which
I think we all agree should be addressed in this spec)? Then the more
controversial storage location (#72) won't bog down these changes.

The directory structure for a container is `<root>/<containerID>/state.json`.

* **id** (string) ID is the container's ID.
* **pid** (int) Pid is the main processes id within the container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“processes” → “process's”. And also probably “Pid” → “PID” (at least, that's what kill(1) uses) and “id” → “ID” (that's what chown(1) uses).

@wking
Copy link
Contributor

wking commented Jul 29, 2015

The state content here looks good to me (modulo a few minor copy-edits
which I commented on inline). The only things missing from #41 are:

  • device nodes
  • rlimits
  • sysctls
  • mounts

Folks might also be interested in things like the TTY associated with
the main process. But I'd expect all of that can be extracted
canonically from the kernel itself (once you know the main PID), so we
don't have to pass it along in the state JSON. On the other hand, you
can also get the cgroups and namespaces from the main PID by looking
through /sys, so I'm not sure where we draw the line between “we'll
bundle that in the state JSON” and “look it up yourself”.

@mrunalp
Copy link
Contributor

mrunalp commented Jul 29, 2015

@wking We are being conservative in what we add to the state right now.

@wking
Copy link
Contributor

wking commented Jul 29, 2015

On Wed, Jul 29, 2015 at 03:59:11PM -0700, Mrunal Patel wrote:

We are being conservative in what we add to the state right now.

That makes sense, but then why not leave cgroups and namespaces out of
the spec? Interested hooks can always use ‘ps -o cgroup $PID’, ‘ps -o
ipcns $PID’, etc. to figure those things out given the main process's
PID.

@wking
Copy link
Contributor

wking commented Jul 29, 2015

On Wed, Jul 29, 2015 at 04:11:49PM -0700, W. Trevor King wrote:

… but then why not leave cgroups and namespaces out of the spec?

And the external file descriptors, since ‘lsof -p $PID’ should be able
to give you those.

On Linux based systems the state information is stored in `/run/oci`.
The directory structure for a container is `<root>/<containerID>/state.json`.

* **id** (string) ID is the container's ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is a bit under specified; are there any restrictions on the format of the string? what is the uniqueness domain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about all lowercase alphabets and digits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Aug 05, 2015 at 09:19:02AM -0700, Mrunal Patel wrote:

+* id (string) ID is the container's ID.

How about all lowercase alphabets and digits?

That sounds good to me, although we might want to allow hyphens for
folks who want to use UUIDs. And currently runC defaults to the
bundle's directory name, so we'd need to figure out how we wanted to
map other characters into the valid character set.

Do we want to add requirements for the uniqueness domain? Unique to
the host seems like a minimum, but we can require universal uniqueness
(e.g. “container IDs must be UUIDs 1”) or leave that up to the
implementation. I'd rather leave it up to the implementation, since
as long as UUIDs are valid container identifiers, runtime callers
that want universal uniqueness are free to use them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a comment that the reason we have an ID is because the hooks only get an open fd to this json document so they won't know the unique ID which may be useful for removal, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to note that containerID MUST match the containerID subdirectory field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Aug 26, 2015 at 10:12:52AM -0700, Brandon Philips wrote:

+* id (string) ID is the container's ID.

We need to add a comment that the reason we have an ID is because
the hooks only get an open fd to this json document so they won't
know the unique ID which may be useful for removal, etc.

I wasn't clear on how it would be useful for removal. Everyone got on
board when someone mentioned network teardown, but the connection
between that and the container ID wasn't clear to me. Not that we
have to be completely specific when motivating this entry. I'm just
pointing out that if we want to explain why we need the container ID
in a post-stop hook, we might want to provide more detail than we had
in today's meeting.

@crosbymichael
Copy link
Member Author

any other comments, questions, concerns?

@wking
Copy link
Contributor

wking commented Aug 4, 2015

On Tue, Aug 04, 2015 at 11:23:02AM -0700, Michael Crosby wrote:

any other comments, questions, concerns?

If that's “can we merge this now?”, I'd suggest at least addressing
the typos first [1,2,3]. The other comments that still apply to 4
all look reasonable and easy to address as well.

I still think the issue of “how much more than the main PID do we
need?” is open 5, since we're currently professing conservatism 6
but still seem to be supplying some PID-calculable data [7,8]. We can
continue that discussion in a follow-up issue if we want, but it's
hard to pull information back out of a spec that's intended to be
infinitely forward compatible 9.

@crosbymichael
Copy link
Member Author

If that's “can we merge this now?”, I'd suggest at least addressing the typos first [1,2,3].

@wking Yes, I know a little bit about contributing to open source, I like to get additional feedback before I start addressing the comments so I don't have to update, push, feedback, update, push, feedback multiple times.


* **id** (string) ID is the container's ID.
* **pid** (int) Pid is the main processes id within the container.
* **root** (string) Root is the path to the container's root filesystem specified in the configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace “$ENTRY ($TYPE) $ENTRY is the …” with “$ENTRY ($TYPE) The …”.

@wking
Copy link
Contributor

wking commented Aug 4, 2015

On Tue, Aug 04, 2015 at 12:40:41PM -0700, Michael Crosby wrote:

Yes, I know a little bit about contributing to open source, I like
to get additional feedback before I start addressing the comments so
I don't have to update, push, feedback, update, push, feedback
multiple times.

That makes sense, although rebase/push is reasonably cheap when
there's just a single commit.

While I'm summarizing issues that could be addressed in a reroll 1,
I think the only one left out was my suggestion to drop the storage
phrasing (or at least pull it out into a separate PR) 2.

@crosbymichael
Copy link
Member Author

I updated this PR addressing comments and rebasing.

@wking
Copy link
Contributor

wking commented Aug 5, 2015

On Wed, Aug 05, 2015 at 01:09:49PM -0700, Michael Crosby wrote:

I updated this PR addressing comments and rebasing.

Do we have a reason for not moving LinuxState from spec_linux.go to
state_linux.go 1?

@crosbymichael
Copy link
Member Author

@wking because I don't want alot of files

@wking
Copy link
Contributor

wking commented Aug 5, 2015

On Wed, Aug 05, 2015 at 01:23:35PM -0700, W. Trevor King wrote:

Do we have a reason for not moving LinuxState from spec_linux.go to
state_linux.go 1?

And I didn't mention this explicitly on the first pass, but I'd move
LinuxStateDirectory from spec_linux.go to state_linux.go as well (if
we don't drop the storage stuff or spin it off to a separate PR 1).

@wking
Copy link
Contributor

wking commented Aug 5, 2015

On Wed, Aug 05, 2015 at 01:09:49PM -0700, Michael Crosby wrote:

I updated this PR addressing comments and rebasing.

Also no longer visible after the rebase 1:

Maybe replace “$ENTRY ($TYPE) $ENTRY is the …” with “$ENTRY ($TYPE) The …”.

@wking
Copy link
Contributor

wking commented Aug 5, 2015

On Wed, Aug 05, 2015 at 01:27:58PM -0700, Michael Crosby wrote:

@wking because I don't want alot of files

Then why not squash state.go into spec.go? I think “we don't separate
the spec from the config” makes sense, and “we do separate the spec
from the config” makes sense, but I'd rather avoid “we sometimes
separate the spec from the config”.

@@ -78,3 +78,13 @@ type Hook struct {
Args []string `json:"args"`
Env []string `json:"env"`
}

// State holds information about the runtime state of the container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduce the Spec entry with:

Spec is the base configuration for the container. It specifies platform independent configuration.

Since the State type is the root of a new JSON structure, independent of Spec, we may want to echo that here with something like:

State is the base type for the container's runtime state. It specifies platform independent state.

Or something like that to help folks avoid confusing the spec types and the state types.

@crosbymichael
Copy link
Member Author

Yes, i moved state into the spec.go file now


* **namespaces** Paths to the Linux namespaces setup for the container.
* **cgroups** Paths to the container's cgroups.
* **externalFds** Paths to the container's open file descriptors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the utility of this? Why can't it be derived from the pid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Aug 05, 2015 at 03:38:33PM -0700, Brandon Philips wrote:

+Linux systems add some platform specific information to the state.
+
+* namespaces Paths to the Linux namespaces setup for the container.
+* cgroups Paths to the container's cgroups.
+* externalFds Paths to the container's open file descriptors.

what is the utility of this? Why can't it be derived from the pid?

+1 to dropping these (previous discussion in [1,2,3,4]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avagin @boucher Can you comment on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philips @crosbymichael @avagin
For checkpoint restore, we need to pass the external pathnames to CRIU via --ext-mount-map.

Docker containers, for example, currently have two sets of external paths: (1) statically named external bind mounts /etc/hostname, /etc/hosts, and /etc/resolv.conf which are in config.json as HostnamePath, HostsPath, and ResolvConfPath (2) dynamically named external volumes (which may or may not exist for a container) and are listed under Volumes.

In short, for checkpoint restore via CRIU, the runtime state configuration has to have all external pathnames.

This adds runtime state information for oci container's so that it can
be persisted and used by external tools.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
@crosbymichael
Copy link
Member Author

Updated

@wking
Copy link
Contributor

wking commented Sep 2, 2015

On Wed, Sep 02, 2015 at 11:16:05AM -0700, Michael Crosby wrote:

Updated

With a79d565→180df9d, LinuxState and ExternalFds have been punted.
But I think we still need resolution on:

  • whether we need to record the path to the bundle root [1,2], and if
    so, what it might be used for.
  • whether we need the /containers/ namespacing for
    /run/oci/containers/<containerID>/state.json, and if so, under
    what conditions would an application chose something other than
    /containers/ 3.
  • whether it makes sense to put State in config.go (part of the Spec
    tree) and LinuxStateDirectory in runtime_config_linux.go (part of
    the RuntimeSpec tree). I'd still prefer a separate state.go and
    state_linux.go, but if you want to avoid adding more files, I'd
    shift State to runtime_config.go 4.

All the other changes I'm interested in are wording changes that don't
impact the semantics, so I'm happy to PR them separately after this
lands.

@mrunalp
Copy link
Contributor

mrunalp commented Sep 3, 2015

LGTM ping @vbatts @philips

@vbatts
Copy link
Member

vbatts commented Sep 3, 2015

LGTM

vbatts added a commit that referenced this pull request Sep 3, 2015
Add runtime state configuration and structs
@vbatts vbatts merged commit 3b330ad into opencontainers:master Sep 3, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 3, 2015
Michael was adding these to existing Go files to avoid having a lot of
files [1].  But 7232e4b (specs: introduce the concept of a
runtime.json, 2015-07-30, opencontainers#88) split the runtime-configuration into
two file sets (one rooted in config.go for config.json and another
rooted in runtime_config.go for runtime.json).  In the face of a
two-set split for a single task (feeding the runtime while it
manipulates the container lifecycle), it seems odd to push the type
definition for a completely different task (sharing
container/application state with other tools) into an existing file
set.

[1]: opencontainers#87 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 3, 2015
The version field was added while 180df9d (Add runtime state
configuration and structs, 2015-07-29, opencontainers#87) was in-flight [1], and it
missed getting documented in the example.

[1]: opencontainers#87 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 3, 2015
Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want.  Jonathan asked for clarification as part of opencontainers#87, but
didn't suggest a particular approach [1].  When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'.  But that could get ugly with
container IDs that contain '../' etc.  And perhaps there are some
filesystems out there that cannot represent ASCII characters
(actually, I'm not even sure what charset our JSON is in ;).  I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.

[1]: opencontainers#87 (comment)
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26

Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 3, 2015
Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want.  Jonathan asked for clarification as part of opencontainers#87, but
didn't suggest a particular approach [1].  When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'.  But that could get ugly with
container IDs that contain '../' etc.  And perhaps there are some
filesystems out there that cannot represent non-ASCII characters
(actually, I'm not even sure what charset our JSON is in ;).  I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.

[1]: opencontainers#87 (comment)
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26

Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 14, 2015
Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want.  Jonathan asked for clarification as part of opencontainers#87, but
didn't suggest a particular approach [1].  When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'.  But that could get ugly with
container IDs that contain '../' etc.  And perhaps there are some
filesystems out there that cannot represent non-ASCII characters
(actually, I'm not even sure what charset our JSON is in ;).  I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.

There was some confusion on the list about what "ASCII letters" meant
[3], so I've explicitly listed the allowed character ranges.  Here's a
Python 3 script that shows the associated Unicode logic:

  import unicodedata

  # http://www.unicode.org/reports/tr44/tr44-4.html#GC_Values_Table
  category = {
    'Ll': 'lowercase letter',
    'Lu': 'uppercase letter',
    'Nd': 'decimal number',
    'Pd': 'dash punctuation',
  }

  for i in range(1<<7):
      char = chr(i)
      abbr = unicodedata.category(char)
      if abbr[0] in ['L', 'N'] or abbr == 'Pd':
          cat = category[abbr]
          print('{:02x} {} {}'.format(i, char, cat))

[1]: opencontainers#87 (comment)
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/P9gZBYhiqDE/-ptpOcQ5FwAJ
     Message-Id: <7ec9cff6-c1a6-4beb-82de-16eb412bf2f8@opencontainers.org>

Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
@crosbymichael crosbymichael deleted the state branch October 5, 2015 18:50
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 5, 2015
The version field was added while 180df9d (Add runtime state
configuration and structs, 2015-07-29, opencontainers#87) was in-flight [1], and it
missed getting documented in the example.

[1]: opencontainers#87 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Dec 1, 2015
The version field was added while 180df9d (Add runtime state
configuration and structs, 2015-07-29, opencontainers#87) was in-flight [1], and it
missed getting documented in the example.

[1]: opencontainers#87 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/oci-command-line-api that referenced this pull request Dec 17, 2015
This gives us an easy way to share state JSON (because sometimes the
PID is insufficient, e.g. on Linux you may need externalFds for
efficient checkpointing [1,2]) [3].

Possible alternatives for transmitting state information, and why I
feel this approach is superior:

* Writing a state file from a pre-start hook [4].  This is pretty
  close to the --state option, but the --state option allows callers
  to recover the JSON via a named pipe like /dev/fd/3.  That sort of
  direct connection would be trickier to setup with a hook-based
  approach.  Pre-start and post-stop hooks may still be the best
  solution for (de)registering a container with a monitoring service,
  but that's not quite the same application.

* Writing a state file to a global directory [5].  Atomic changes,
  change-monitoring, garbage collection, and ownership/access issues
  on a global directory are all hard to get right (or have ambiguous
  values of "right") [6].

* Requiring runtimes to maintain an internal registry of containers
  they launch [7].  This gives runtimes more flexibility than having a
  single, global directory.  But ownership/access issues are still
  difficult (if one unprivileged user registers a container, can
  another unprivileged user see that entry?  What elevated permissions
  would you need to see that entry?  To remove that entry?).  And the
  easiest way to get atomic changes and garbage-collection is by
  registering with a daemon, while not requiring a daemon is currently
  the #1 feature listed on [8].

In the event that any of those arguments seem leaky, callers that
prefer a different approach can easily use hooks (without setting
--state) or write wrappers that use a named pipe approach like
(--state /dev/fd/3) to collect the JSON and then write it to their
preferred registry.  So the --state approach seems easy for the
runtime to implement reliably, and also compatible with any of the
suggested alternatives.  The converse is not true; requiring a write
to a global or per-runtime registry is not compatible with use-cases
that prefer the anonymity of not writing the state at all (which is
possible just by leaving off the --state option).

[1]: opencontainers/runtime-spec#87 (comment)
[2]: https://groups.google.com/a/opencontainers.org/d/msg/dev/z25xQsF3pHA/ixyeTrxyFwAJ
     Subject: Re: Drop /run/opencontainer entirely?
     Date: Fri, 4 Sep 2015 21:30:24 -0700
     Message-ID: <20150905043024.GB25638@odin.tremily.us>
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/q6TYqVZOcX8/W1RVyCXCCQAJ
     Subject: Re: removal of /run/opencontainer/containers, add --state?
     Date: Tue, 8 Dec 2015 15:49:57 -0800
     Message-ID: <20151208234957.GK2767@odin.tremily.us>
[4]: https://groups.google.com/a/opencontainers.org/d/msg/dev/q6TYqVZOcX8/GQs0zkRHBwAJ
     Subject: Re: removal of /run/opencontainer/containers
     Date: Mon, 30 Nov 2015 13:55:40 -0800
     Message-ID: <20151130215540.GF23595@odin.tremily.us>
[5]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime.md#state
[6]: https://groups.google.com/a/opencontainers.org/d/msg/dev/q6TYqVZOcX8/JRm4auylBQAJ
     Subject: removal of /run/opencontainer/containers
     Date: Wed, 25 Nov 2015 14:29:35 +0000
     Message-ID: <CAD2oYtNipt3i_C6=J4Bc-jwauo5YAvKXUqTROnPNP3vZ9+C5Vw@mail.gmail.com>
[7]: https://groups.google.com/a/opencontainers.org/d/msg/dev/q6TYqVZOcX8/wHYucS-rBQAJ
     Subject: Re: removal of /run/opencontainer/containers
     Date: Wed, 25 Nov 2015 08:06:10 -0800
     Message-ID: <CANEZBD7VCWj6w5dFAEbULrL6WsY=FSRhVsWFreYXUOHwsUJkwA@mail.gmail.com>
[8]: http://runc.io/
     Embeddable
     Containers are started as a child process of runC and can be
     embedded into various other systems without having to run a
     Docker daemon.

Signed-off-by: W. Trevor King <wking@tremily.us>
Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016
The version field was added while 180df9d (Add runtime state
configuration and structs, 2015-07-29, opencontainers#87) was in-flight [1], and it
missed getting documented in the example.

[1]: opencontainers#87 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants