Split create and start #384

Merged
merged 1 commit into from
May 26, 2016

duglin
Contributor

@duglin duglin commented Apr 13, 2016

This PR does the following:

  • splits start into create and start. Does not define run - that's an impl choice.

Lifecycle flow with this PR:

  • create() is invoked to set up the container's NSs, cgroups, etc., but the user process is not yet started. The PID NS is created but paused waiting for the start() operation to be invoked. Control is returned to the caller.
  • At this point any additional processing (e.g. pre-start hooks) can be invoked by the management/orchestration tooling.
  • start() is invoked. The config.json's process property is used to start the user-defined process that is to be executed in the PID NS. Whether control is returned to the caller or not will depend on whether a --detach type of option was specified - so this is runtime dependent.
  • The user-defined process will eventually stop - either by exiting or via some external action (like the kill() operation being invoked).
  • All of the container resources (NS, cgroups, etc.) created for this container are deleted. Note that any resources used by the container but not created during the create() phase are not deleted as they are owned by some other container.
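
The flow above can be read as a small state machine. The following is only a toy sketch: the status names and the `Container` class are hypothetical (the PR text does not define them), and no real namespaces or cgroups are touched.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for illustration only.
    CREATED = "created"   # resources exist, user process not started
    RUNNING = "running"   # user process is executing
    STOPPED = "stopped"   # user process has exited or was killed
    DELETED = "deleted"   # resources made by create() are released


class Container:
    """Toy model of the create/start/kill/delete flow."""

    def __init__(self, container_id):
        self.id = container_id
        self.status = None

    def _move(self, allowed, new):
        if self.status not in allowed:
            raise RuntimeError(f"cannot go from {self.status} to {new}")
        self.status = new

    def create(self):
        self._move({None}, Status.CREATED)

    def start(self):
        self._move({Status.CREATED}, Status.RUNNING)

    def kill(self):
        # Allowed before start(): the user process may never run at all.
        self._move({Status.CREATED, Status.RUNNING}, Status.STOPPED)

    def delete(self):
        self._move({Status.STOPPED}, Status.DELETED)
```

Calling `start()` on a container that was never created raises `RuntimeError` in this sketch, which mirrors the ordering the list describes.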

Motivating use cases:

  • to simplify the flows/interaction patterns by removing hooks, but still allow for the same functionality
  • allow for the container's resources to be created prior to the user-defined process being defined/run
  • allow a single orchestrator process to set up the container before the user process starts. Avoid needing to persist potentially private information to disk to make it available to args/env vars of hooks since now the hooks can be within the orchestrator, thus have access to its in-memory state.
  • allow for a cleaner "sandbox" model where one container (and its resources) is created for the purpose of then being used by other (subordinate) containers.

Signed-off-by: Doug Davis dug@us.ibm.com

@@ -118,12 +125,7 @@ Example:
"cwd": "...",
}
```
-This specification does not manadate the name of this JSON file.
+This specification does not mandate the name of this JSON file.
Contributor

This was fixed in #353, so you probably want to rebase onto master.

@mrunalp
Contributor

mrunalp commented Apr 13, 2016

We need an action that replaces the start today, something like run as discussed in the call.

@duglin
Contributor Author

duglin commented Apr 14, 2016

PR has been updated to include a “run” operation - defined as create() followed by start()

-Doug

On Apr 13, 2016, at 7:09 PM, Mrunal Patel notifications@github.com wrote:

We need an action that replaces the start today, something like run as discussed in the call.




This operation MUST generate an error if it is not provided the container ID.
This operation MUST invoke the `create` operation, and if there are no errors, followed by the `start` operation.
Contributor

Maybe "MUST invoke the create operation and then immediately start the process named in config.json. The runtime is not required to pause between creating the container and executing the requested process." (To clarify the intent that run need not actually perform the socket/cgroup/other-implementation steps implied by create/start)

Contributor Author

I'd like to avoid duplicating what start() says because then we need to keep them in-sync.
Let's try to reword it to address your concern....

Contributor

Fair point. My main concern is that create explicitly says "MUST NOT execute the users process at this time", so I think it may be worth explicitly weakening that language in run. Maybe something like "... MUST invoke the create operation, followed by the start operation. The implementation MAY immediately execute start without any pause after create completes. For example, if the implementation used a temporary process to implement create, it need not do this in the implementation of 'run'."?
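
A minimal sketch of the proposed `run` wording, assuming the caller supplies `create` and `start` as plain callables (hypothetical stand-ins for the runtime's operations): an error from `create` prevents `start`, and nothing forces a pause between the two.

```python
def run(create, start, container_id):
    """Compose 'run' from caller-supplied create and start callables."""
    if not container_id:
        raise ValueError("container ID is required")
    create(container_id)  # any exception raised here prevents start
    start(container_id)   # no pause is required between the two steps
```

Because `run` only delegates, the requirements for `start` stay in one place and do not need to be kept in sync.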

Contributor Author

I like that - thanks - updated
-Doug

On Apr 14, 2016, at 9:59 AM, Julian Friedman notifications@github.com wrote:

In runtime.md #384 (comment):

This operation MUST generate an error if it is not provided the container ID.
+This operation MUST invoke the create operation, and if there are no errors, followed by the start operation.
Maybe "MUST invoke the create operation and then immediately start the process named in config.json. The runtime is not required to pause between creating the container and executing the requested process." (To clarify the intent that run need not actually perform the socket/cgroup/other-implementation steps implied by create/start)



Using the data in [`config.json`](config.md), that is in the root of the bundle's directory, this operation MUST create a new container.
This includes creating, or entering, the namespaces specified in the [`config.json`](config.md), resource limits, etc and configuring the appropriate capabilities for the container.
If the `config.json` specifies that a PID namespace is to be created then one MUST be created, but the user-specified code within that namespace MUST NOT be created at this time.
In some implementations this means that a temporary process is created in the PID namespace but it pauses until the `start` operation is invoked before replacing the process with the user-specified code.
Contributor

How can the process be paused/unpaused (freeze / spin lock)? By specifying that the PID ns must be created, I think that it's very hard for an implementation to do this in a sane way.

I think it's clearer to look at create as a step to prepare the necessary environment for an application to start successfully (vs. sandbox, which is the full isolated environment for the application). For me, resources means cgroups, uid/gid, network, device, filesystems, etc., but doesn't include the PID namespace.

Contributor

On Thu, Apr 14, 2016 at 03:26:20AM -0700, Daniel, Dao Quang Minh wrote:

+In some implementations this means that a temporary process is
created in the PID namespace but it pauses until the start
operation is invoked before replacing the process with the
user-specified code.

How can the process be paused/unpaused (freeze / spin lock)? By
specifying that the PID ns must be created, I think that it's very hard
for an implementation to do this in a sane way.

The container process is probably blocking on a Unix socket, because
it needs to receive the ‘process’ config from the ‘start’ call (which
may have been adjusted since the ‘create’ call).

It's not clear to me from the current wording whether the host-process
from ‘create’ or the ‘start’ process provides the standard streams for
the container process. I'd guess the container process would inherit
and use the ‘create’ process's standard streams until a ‘start’ call,
and then dup over to the ‘start’ call's standard streams before
executing the user-specified ‘process’ received from ‘start’. That
means you'd need a Unix socket to transfer the file descriptors (and
potentially also a pseudoterminal master/slave depending on
‘process.terminal’).
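
A rough sketch of the blocking idea, under loud assumptions: a thread stands in for the container process, a `socketpair` stands in for the runtime's Unix socket, and the JSON payload and the `ok` acknowledgement are invented for illustration. It only shows that the idler can sit in `recv` between ‘create’ and ‘start’.

```python
import json
import socket
import threading


def idler(sock, result):
    """Stand-in for the container process: block until 'start' sends config."""
    data = sock.recv(65536)            # blocks here between create and start
    result["process"] = json.loads(data)
    sock.sendall(b"ok")                # a real idler would exec the process


def demo():
    host_side, container_side = socket.socketpair()
    result = {}
    worker = threading.Thread(target=idler, args=(container_side, result))
    worker.start()                     # 'create' returns with the idler waiting
    process = {"args": ["sleep", "1"], "cwd": "/"}
    host_side.sendall(json.dumps(process).encode())  # the 'start' step
    ack = host_side.recv(16)
    worker.join()
    host_side.close()
    container_side.close()
    return result["process"], ack
```

Since the ‘process’ config travels at ‘start’ time, it can differ from whatever was on disk when ‘create’ ran.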

I think it's clearer to look at create as a step to prepare the
necessary environment for an application to start successfully (vs.
sandbox, which is the full isolated environment for the
application). For me, resources means cgroups, uid/gid, network,
device, filesystems, etc., but doesn't include the PID namespace.

That's also true for @vishh [1], but as @julz points out, not true in
general [2,3]. And you need a ‘create’-time container process if you
need the PID before calling ‘start’ (use-case (e) in [4]).

 Subject: Re: Splitting Start into create() and run()
 Date: Thu, 24 Mar 2016 15:04:14 -0700
 Message-ID: <20160324220414.GC23066@odin.tremily.us>

@wking
Contributor

wking commented Apr 14, 2016

I've left some comments inline, but my main concern at the moment is
specifying standard-stream, ‘process.terminal’, and exit-code handling
(see [1]). I'd like to have that cleared up (probably rolling
runtime-linux.md into runtime.md, editing, and attaching it to a
particular operation). The main issue is whether the host-side
process from ‘create’ or the process from ‘start’ is going to supply
the standard streams and other file descriptors for the container
process. Whichever process supplies those file descriptors should
also be the one waiting on and returning the container process' exit
code.

My personal preference would be to have:

  • The runtime-specified container-process code not use its standard
    streams at all.
  • The host-side process from ‘create’ not wait on the container
    process's exit code.
  • The ‘start’ process supply standard streams and other file
    descriptors for the container process via a Unix socket.
  • The ‘start’ process wait on the container process, and exit with its
    exit code.

That lets us use the ‘create’ return to signal when a container
process is up and ready for a ‘start’ invocation (vs. polling the
state registry?).
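
For the "supply standard streams via a Unix socket" preference, here is a minimal sketch of descriptor passing. It assumes Python 3.9+ on a Unix host (`socket.send_fds` and `socket.recv_fds`), uses a pipe as a stand-in for the caller's stdout, and runs both ends in one process purely for illustration.

```python
import os
import socket


def pass_stdout_fd():
    """Send the write end of a pipe over a Unix socket, then write through it."""
    start_side, container_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    # 'start' side: hand over the descriptor that should become stdout.
    socket.send_fds(start_side, [b"stdout"], [write_end])
    os.close(write_end)
    # Container side: receive a duplicate of that descriptor.
    msg, fds, _flags, _addr = socket.recv_fds(container_side, 1024, 1)
    os.write(fds[0], b"hello from the container\n")
    os.close(fds[0])
    data = os.read(read_end, 1024)
    os.close(read_end)
    start_side.close()
    container_side.close()
    return msg, data
```

The receiver gets its own descriptor number for the same open file, so the sender can close its copy right after sending.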

‘process.terminal’ is currently not all that clear to me [2], so I'm
fine if that is still unclear after this PR lands ;).

 Subject: More detail for process.terminal?
 Date: Thu, 14 Jan 2016 14:44:08 -0800
 Message-ID: <20160114224408.GM6362@odin.tremily.us>

6. Additional actions such as pausing the container, resuming the container or signaling the container MAY be performed using the runtime interface.
1. OCI compliant runtime's `create` command is invoked with a reference to the location of the bundle and a unique identifier.
How these references are passed to the runtime is an implementation detail.
2. The container's runtime environment (namespaces, mounts, etc.) MUST be created according to the configuration in [`config.json`](config.md).
Contributor

nit: It's worth detailing what the sandbox or environment MUST contain.

Contributor

On Thu, Apr 14, 2016 at 01:46:32PM -0700, Vish Kannan wrote:

+2. The container's runtime environment (namespaces, mounts, etc.)
MUST be created according to the configuration in
config.json.

nit: It's worth detailing what the sandbox or environment
MUST contain.

My preference here is “everything that's not in ‘process’”, while also
moving rlimits back out of ‘process’ [1].

Contributor Author

Would it be invalid to create a container that just entered the namespaces from another container and never created its own? If that's ok, does it actually "contain" anything? Just not sure what text I could add here.

I guess this is like @julz's comment: https://github.com/opencontainers/runtime-spec/pull/384/files/5813d96a74b5e753851fdad11ce4731ac7be0576#r60785563

Contributor

On Fri, Apr 22, 2016 at 12:12:53PM -0700, Doug Davis wrote:

Would it be invalid to create a container that just entered the
namespaces from another container and never created its own?

I hope not, because it would break ‘exec’ (see #391 for emulating
‘exec’ using ‘start’).

If that's ok, does it actually "contain" anything? Just not sure
what text I could add here.

I'm fine if we still want to refer to it as a “container”, even though
it's the degenerate case where that process doesn't own any of the
sandbox walls. But see also “does this really need a container ID”
thoughts in #391.

Contributor Author

@wking can you elaborate on how it would break exec()? I would think that it would be transitive and trying to exec a container that just used another's NSs would be the equivalent of exec'ing that "used" container.

Contributor

On Fri, Apr 22, 2016 at 12:31:27PM -0700, Doug Davis:
“@wking can you elaborate on how it would break exec()?”

Spun off to [1].

Contributor Author

Given the above, @vishh what would you like for us to say that a sandbox (which I don't think is defined yet) "contains" ? Seems to me it could be, sort of, empty and just use NSs from other containers.

@vishh
Contributor

vishh commented Apr 14, 2016

@duglin Completed a review pass. Thanks for pushing this forward!!

@wking
Contributor

wking commented Apr 23, 2016

On Thu, Apr 14, 2016 at 11:11:06AM -0700, W. Trevor King:
“my main concern at the moment is specifying standard-stream,
‘process.terminal’, and exit-code handling”.

The fact that waitpid(2) and friends only work for child processes
hadn't sunk in. That means that the host process from ‘create’ has to
stick around and return the container exit code if we want to return
that exit code to a caller who is waiting on a direct child.
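
The child-only restriction is easy to see in a small sketch (POSIX only; the helper names are made up for this illustration). A parent can collect the exit code of a process it forked, while `waitpid` on a process that is not its child fails with `ECHILD`, which Python surfaces as `ChildProcessError`.

```python
import os


def child_exit_code(code):
    """Fork a child that exits with `code` and collect it with waitpid."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)                 # child: exit immediately
    _, status = os.waitpid(pid, 0)     # works: pid is our direct child
    return os.WEXITSTATUS(status)


def can_wait_on(pid):
    """Return False when the kernel refuses because pid is not our child."""
    try:
        os.waitpid(pid, os.WNOHANG)
        return True
    except ChildProcessError:          # ECHILD
        return False
```

A ‘start’ process that did not fork the container process is in the position of `can_wait_on` returning `False`, which is why the ‘create’ host process would have to stay around.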

This means that we need a different way than “the initial ‘create’
process has exited” to notify interested parties that the container
process is start-able (e.g., so you can run the stuff you used to put
in pre-start hooks). The notification issue has come up before in the
context of exec [1], with @julz and @crosbymichael talking about event
file descriptors and such. This may be enough of a reason to keep
pre-start hooks around (although it would probably be good to rename
them post-create commands, since they run after the ‘create’ process
finishes the setup, but possibly well before a ‘start’ command is
called, if ever).

If keeping the ‘create’ host process around is unacceptable, the
caller would have to get the container PID from the ‘create’ call and
wait on the container process directly. It may be possible to do that
robustly, but it sounds like a lot of work.

I don't see any technical roadblocks for either ‘create’ or ‘start’
supplying the standard streams and terminal handling, but I'd expect
that they should stay with the process handling the exit code for
simplicity.

[1]: From #opencontainers on 2015-09-15 (before we had online logs).

 11:23 < julz> btw just a random thing, I wish there were a way we
    could have the runtime tell me when it has successfully
   launched the process

 and through the next five minutes or so.

@duglin
Contributor Author

duglin commented Apr 25, 2016

Another round of edits has been made.

@wking
Contributor

wking commented Apr 25, 2016

On Mon, Apr 25, 2016 at 07:15:50AM -0700, Doug Davis wrote:

Another round of edits has been made.

With 6ae7c16, the lifecycle section looks more like an overview of
how the various operations (create, start, …) fit together. Instead
of splitting requirements for those operations between the
per-operation sections and the lifecycle section, I'd rather collect
all the requirements in the per-operation section. The remaining
lifecycle section would be an informative introduction, linking out to
the per-operation sections for details. Something like:

  1. The runtime's create command is invoked to create the
    container.

    The following MAY happen in any order:

    • The runtime's state operation MAY be used to
      fetch the container's state JSON.
    • The runtime's start command MAY be invoked to run the
      user-specified code in the container process.
    • The runtime's stop operation MAY be used to stop the
      container process.
    • Additional actions such as pausing the container, resuming the
      container or signaling the container MAY be performed.
    • The container MAY error out, exit, or crash.

    After the container process exits...

  2. Runtime's delete command MAY be invoked with the identifier of
    the container.

Although I'm still a fan of not requiring a delete command, and just
having the ‘create’ host process handle the cleanup automatically.

wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 11, 2016
Since be59415 (Split create and start, 2016-04-01, opencontainers#384), it's
possible for a container process to never execute user-specified code
(e.g. you can call 'create', 'kill', 'delete' without calling
'start').  For folks who expect to do that, there's no reason to
define process.args.

The only other process property required for all platforms is 'cwd',
but the runtime's idler code isn't specified in sufficient detail for
the configuration author to have an opinion about what its working
directory should be.

On Linux and Solaris, 'user' is also required for 'uid' and 'gid'.  My
preferred approach here is to make those optional and define defaults
[1,2]:

  If unset, the runtime will not attempt to manipulate the user ID
  (e.g. not calling setuid(2) or similar).

But the maintainer consensus is that they want those to be explicitly
required properties [3,4,5].  With the current spec, one option could
be to make process optional (with the idler's working directory
unspecified) for OSes besides Linux and Solaris, but the main reason
that Windows doesn't have a user property is that we don't know how to
specify it [6].  I expect all platforms will have some sort of
required user field, which means they'll all have to define 'process'.

While I'm indenting the sub-properties, also wrap them to one line per
sentence (style.md).

[1]: opencontainers#417 (comment)
[2]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/DWdystx5X3A
     Subject: Exposing platform defaults
     Date: Thu, 14 Jan 2016 15:36:26 -0800
     Message-ID: <20160114233625.GN6362@odin.tremily.us>
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-05-04-17.00.log.html#l-44
[4]: opencontainers#417 (comment)
[5]: opencontainers#417 (comment)
[6]: opencontainers#96 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 11, 2016
Since be59415 (Split create and start, 2016-04-01, opencontainers#384), it's
possible for a container process to never execute user-specified code
(e.g. you can call 'create', 'kill', 'delete' without calling
'start').  For folks who expect to do that, there's no reason to
define process.args.

The only other process property required for all platforms is 'cwd',
but the runtime's idler code isn't specified in sufficient detail for
the configuration author to have an opinion about what its working
directory should be.

On Linux and Solaris, 'user' is also required for 'uid' and 'gid'.  My
preferred approach here is to make those optional and define defaults
[1,2]:

  If unset, the runtime will not attempt to manipulate the user ID
  (e.g. not calling setuid(2) or similar).

But the maintainer consensus is that they want those to be explicitly
required properties [3,4,5].  With the current spec, one option could
be to make process optional (with the idler's working directory
unspecified) for OSes besides Linux and Solaris.  On Windows, username
is optional, but it's not clear how intentional that was [6].

[1]: opencontainers#417 (comment)
[2]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/DWdystx5X3A
     Subject: Exposing platform defaults
     Date: Thu, 14 Jan 2016 15:36:26 -0800
     Message-ID: <20160114233625.GN6362@odin.tremily.us>
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-05-04-17.00.log.html#l-44
[4]: opencontainers#417 (comment)
[5]: opencontainers#417 (comment)
[6]: opencontainers#618

Signed-off-by: W. Trevor King <wking@tremily.us>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 28, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].
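
As an illustration of what a receiving process does with LISTEN_FDS, here is a small sketch. It assumes the systemd convention that passed descriptors start at 3 (`SD_LISTEN_FDS_START` in sd_listen_fds(3)); the helper name is hypothetical and LISTEN_PID is deliberately ignored, matching the choice described above.

```python
SD_LISTEN_FDS_START = 3  # first passed descriptor, per sd_listen_fds(3)


def listen_fds(environ):
    """Return the descriptor numbers announced through LISTEN_FDS."""
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []                      # malformed value: treat as none passed
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

With `LISTEN_FDS=2` the process would expect descriptors 3 and 4 to be open.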

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7159 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.
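
A sketch of the RFC 4627 section 3 hint, for illustration only: because the state output is an object, its first two characters are ASCII, so the pattern of null bytes in the first four octets identifies the Unicode encoding. The function name is hypothetical.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes from the null-byte pattern."""
    head = data[:4]
    if len(head) < 4:
        return "utf-8"                 # too short to hold a wide encoding
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx
```

A caller could then decode with `data.decode(detect_json_encoding(data))` before parsing the JSON.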

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).
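
A caller-side sketch of the soft/hard stop pair on a POSIX host: send TERM, wait for a grace period, then fall back to KILL. The grace period and the helper names are assumptions for illustration, and a Python sleeper stands in for the container process.

```python
import signal
import subprocess
import sys


def stop(proc, grace_seconds=5.0):
    """Send SIGTERM, then SIGKILL if the process outlives the grace period."""
    proc.send_signal(signal.SIGTERM)       # soft stop
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                         # hard stop (SIGKILL on POSIX)
        return proc.wait()


def demo():
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = subprocess.Popen(sleeper)
    return stop(proc)
```

A process that installs a SIGTERM handler gets the grace period to clean up; one that ignores TERM is still stopped by KILL.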

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") does not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targeted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.
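
From the caller's side, the stdin guarantee means something like the following sketch is safe. The helper name is hypothetical, and the Python interpreter is used as a stand-in command since no particular runtime binary is assumed.

```python
import subprocess
import sys


def call_without_stdin(argv):
    """Run a command with stdin attached to the null device."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,   # the command must not need to read stdin
        capture_output=True,
        text=True,
        check=False,
    )


def demo():
    return call_without_stdin([sys.executable, "-c", "print('state')"])
```

A generic caller can use the same wrapper for every command without worrying about one of them blocking on input.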

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.
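
To make these choices concrete, here is a toy request/response exchange over a SOCK_SEQPACKET socket pair. Only the string 'type' field and the 'terminal' message type come from the text above; the `response` type name, the `error` field, and the JSON framing are hypothetical. It also assumes Linux, since AF_UNIX SOCK_SEQPACKET is not available on every platform.

```python
import json
import socket


def handle(request):
    """Server side: answer one request; unknown types get an error response."""
    if request.get("type") == "terminal":
        return {"type": "response"}
    return {"type": "response", "error": "unsupported type"}


def round_trip(message):
    """Send one message and return the server's response."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        client.send(json.dumps(message).encode())   # one send is one message
        request = json.loads(server.recv(65536))
        server.send(json.dumps(handle(request)).encode())
        return json.loads(client.recv(65536))
    finally:
        client.close()
        server.close()
```

Because each `send` is delivered as a whole message, neither side needs a length prefix or delimiter, and the error response gives a client a way to bail out on an unsupported type.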

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races
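
A minimal sketch of that freeze, KILL, unfreeze sequence, assuming a
cgroup v1 freezer hierarchy (freezer.state and cgroup.procs files in
the cgroup directory).  The function name and the injectable kill
parameter are made up for the example, and a real implementation
would also wait for the state to settle at FROZEN before signalling.

```python
import os
import signal


def kill_cgroup(cgroup_dir, kill=os.kill):
    """Freeze a cgroup, SIGKILL every member, then thaw it."""
    state = os.path.join(cgroup_dir, "freezer.state")
    with open(state, "w") as f:
        f.write("FROZEN")  # members cannot fork while we walk the list
    try:
        with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
            pids = [int(line) for line in f if line.strip()]
        for pid in pids:
            kill(pid, signal.SIGKILL)
    finally:
        with open(state, "w") as f:
            f.write("THAWED")  # the queued KILLs take effect on thaw
    return pids
```

Freezing first is what closes the race: without it, a process could
fork between reading cgroup.procs and delivering the signals.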

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to set up the file descriptors
  instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.
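
For the long-running 'create' item above, the portable exit-code
collection could look roughly like the following.  It is a sketch
only: a short-lived Python child stands in for the container process,
and the exit status 7 is arbitrary.

```python
import os
import sys

# Spawn a child that exits with status 7 (a stand-in for a container process).
pid = os.spawnv(
    os.P_NOWAIT,
    sys.executable,
    [sys.executable, "-c", "raise SystemExit(7)"],
)

# waitid() only reports on our own children, which is why an early-exit
# 'create' cannot use it to collect the container process's exit code.
info = os.waitid(os.P_PID, pid, os.WEXITED)
exit_code = info.si_status if info.si_code == os.CLD_EXITED else None
```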

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 28, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to set up the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].
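
For illustration, the receiving side of that convention might be
sketched as below.  The start value of 3 is the one documented for
sd_listen_fds; the function name is made up, and LISTEN_PID is
deliberately not checked, mirroring the choice described above.

```python
SD_LISTEN_FDS_START = 3  # first inherited descriptor, per sd_listen_fds(3)


def inherited_fds(environ):
    """Return the descriptor numbers announced by LISTEN_FDS."""
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


fds = inherited_fds({"LISTEN_FDS": "2"})
```

With LISTEN_FDS=2 the passed descriptors are 3 and 4, so the next
free slot is LISTEN_FDS + 3.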

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7158 [rfc7158-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7158-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.
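
A caller that wants to apply the RFC 4627 hints could do something
like the following.  This is a sketch of the null-byte table from
that RFC's section 3, not something the spec requires; the function
name and the sample state object are invented for the example.

```python
import json


def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes from their null-byte pattern.

    Relies on the first two characters being ASCII, which holds for an
    object such as the state output.
    """
    head = data[:4]
    if len(head) < 4:
        return "utf-8"
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")


# Stand-in for state output written by a runtime in a UTF-16 locale.
raw = '{"id": "example"}'.encode("utf-16-le")
state = json.loads(raw.decode(detect_json_encoding(raw)))
```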

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).
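
The soft/hard stop pairing can be illustrated on a plain POSIX child
process.  This sketch signals the process directly rather than going
through a runtime's 'kill' command, and the helper name and the
five-second grace period are arbitrary choices for the example.

```python
import signal
import subprocess
import sys


def stop(proc, grace_seconds=5.0):
    """Send SIGTERM, then SIGKILL if the process outlives the grace period."""
    proc.send_signal(signal.SIGTERM)  # soft stop
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop: SIGKILL on POSIX
        return proc.wait()


# Stand-in for a container process that has no SIGTERM handler installed.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_status = stop(child)
```

A process that installs a SIGTERM handler gets the grace period to
clean up; one that ignores the signal is removed by the KILL.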

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") does
not apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.
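
As a sketch of what a long-option-only parser might look like, using
the 'create' options named earlier in this commit message.  The
parser layout is an assumption made for illustration and 'funC' is
just a placeholder runtime name.

```python
import argparse


def build_parser():
    """Build a parser that accepts only the spelled-out long options."""
    # allow_abbrev=False keeps "--bund" from silently matching "--bundle".
    parser = argparse.ArgumentParser(prog="funC", allow_abbrev=False)
    commands = parser.add_subparsers(dest="command")
    create = commands.add_parser("create", allow_abbrev=False)
    create.add_argument("--bundle", metavar="PATH")
    create.add_argument("--pid-file", metavar="PATH")
    create.add_argument("id", metavar="ID")
    return parser


args = build_parser().parse_args(
    ["create", "--bundle", "/tmp/bundle", "--pid-file", "/tmp/pid", "c1"]
)
```

A runtime remains free to add short aliases of its own on top of
this, since the spec leaves the one-character namespace unassigned.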

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targeted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to set up the file descriptors
  instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 6, 2017
proc(5) describes the following state entries in /proc/[pid]/stat [1]
(for modern kernels):

* R Running
* S Sleeping in an interruptible wait
* D Waiting in uninterruptible disk sleep
* Z Zombie
* T Stopped (on a signal)
* t Tracing stop
* X Dead

and ps(1) has a bit more context [2] (for modern kernels):

* D uninterruptible sleep (usually IO)
* R running or runnable (on run queue)
* S interruptible sleep (waiting for an event to complete)
* T stopped by job control signal
* t stopped by debugger during the tracing
* X dead (should never be seen)
* Z defunct ("zombie") process, terminated but not reaped by its
  parent

So I expect "stopped" to mean "process still exists but is paused,
e.g. by SIGSTOP".  And I expect "exited" to mean "process has finished
and is either a zombie or dead".
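
As a rough illustration of the distinction above, the state letter can
be read from a /proc/[pid]/stat line and bucketed into the two
meanings proposed here.  This is a minimal sketch: the function names
and the "other" bucket are made up for this example, and grouping 't'
(tracing stop) with "stopped" is an assumption rather than something
the spec states.

```python
# Hypothetical helpers; the names are illustrative, not from runtime-spec.

def stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/[pid]/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def classify(state: str) -> str:
    """Bucket a proc(5) state letter into the wording used above."""
    if state in ("T", "t"):
        # Process still exists but is paused (e.g. by SIGSTOP).
        # Treating 't' the same way is an assumption of this sketch.
        return "stopped"
    if state in ("Z", "X"):
        # Process has finished and is either a zombie or dead.
        return "exited"
    # R, S, D: the process has neither paused nor finished.
    return "other"
```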

After this commit, 'git grep -i stop' only turns up the "stopped"
state (which I've left alone for backwards compat), some poststop-hook
stuff, a reference in principles.md, a "stoppage" in LICENSE, and some
ChangeLog entries.

Also replace "container's process" with "container process" to match
usage in the rest of the repository.  After this commit:

  $ git grep -i "container process" | wc -l
  20
  $ git grep -i "container's process" | wc -l
  1

Also reword status entries to avoid "running", which is less precise
in our spec (e.g. it also includes "sleeping", "waiting", ...).

Also remove a "them" leftover from a partial plural -> singular
reroll of be59415 (Split create and start, 2016-04-01, opencontainers#384).

[1]: http://man7.org/linux/man-pages/man5/proc.5.html
[2]: http://man7.org/linux/man-pages/man1/ps.1.html

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 11, 2017
Since be59415 (Split create and start, 2016-04-01, opencontainers#384), it's
possible for a container process to never execute user-specified code
(e.g. you can call 'create', 'kill', 'delete' without calling
'start').  For folks who expect to do that, there's no reason to
define process.args.

The only other process property required for all platforms is 'cwd',
but the runtime's idler code isn't specified in sufficient detail for
the configuration author to have an opinion about what its working
directory should be.

On Linux and Solaris, 'user' is also required for 'uid' and 'gid'.  My
preferred approach here is to make those optional and define defaults
[1,2]:

  If unset, the runtime will not attempt to manipulate the user ID
  (e.g. not calling setuid(2) or similar).

But the maintainer consensus is that they want those to be explicitly
required properties [3,4,5].  With the current spec, one option could
be to make process optional (with the idler's working directory
unspecified) for OSes besides Linux and Solaris.  On Windows, username
is optional, but it's not clear how intentional that was [6].

[1]: opencontainers#417 (comment)
[2]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/DWdystx5X3A
     Subject: Exposing platform defaults
     Date: Thu, 14 Jan 2016 15:36:26 -0800
     Message-ID: <20160114233625.GN6362@odin.tremily.us>
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-05-04-17.00.log.html#l-44
[4]: opencontainers#417 (comment)
[5]: opencontainers#417 (comment)
[6]: opencontainers#618

Signed-off-by: W. Trevor King <wking@tremily.us>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 1, 2017
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to set up the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier; see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].
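
For readers unfamiliar with the convention, here is a minimal sketch
of how a process on the receiving end might interpret LISTEN_FDS.  It
assumes the sd_listen_fds convention that passed descriptors start at
file descriptor 3, and it ignores LISTEN_PID for the reasons given
above.  The function name is hypothetical.

```python
# Passed descriptors start right after stdin/stdout/stderr
# (SD_LISTEN_FDS_START in sd_listen_fds(3)).
SD_LISTEN_FDS_START = 3


def inherited_fds(environ) -> list:
    """Return the file descriptors announced by LISTEN_FDS.

    Hypothetical helper: LISTEN_PID is deliberately not checked,
    mirroring the choice described in this commit message.
    """
    count = int(environ.get("LISTEN_FDS", "0"))
    if count < 0:
        raise ValueError("LISTEN_FDS must not be negative")
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A caller would typically pass os.environ; with LISTEN_FDS=2 the
descriptors are 3 and 4.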

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7158 [rfc7158-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7158-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.
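
The RFC 4627 hint relies on the first two characters of a JSON text
being ASCII, so the pattern of null octets in the first four bytes
identifies the encoding.  A minimal sketch of a caller applying that
hint to state output follows.  The function names are hypothetical,
and the fallback to UTF-8 for inputs shorter than four bytes is an
assumption of this sketch.

```python
import json

# Null-octet patterns from RFC 4627 section 3 (True means a 0x00 octet).
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from its first octets."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short to tell, fall back
    pattern = tuple(octet == 0 for octet in data[:4])
    return _NULL_PATTERNS.get(pattern, "utf-8")  # xx xx xx xx is UTF-8


def load_state(data: bytes) -> dict:
    """Decode state output using the detected encoding."""
    return json.loads(data.decode(detect_json_encoding(data)))
```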

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).
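
To make the soft/hard distinction concrete, here is a minimal sketch
of a caller-side policy on a POSIX system: try TERM first, and fall
back to KILL if the process has not exited within a grace period.  It
uses plain signals on a child process as a stand-in for the runtime's
'kill' command; the helper names and the grace period are assumptions
of this sketch.

```python
import signal
import subprocess
import sys


def stop(proc: subprocess.Popen, grace: float = 1.0) -> str:
    """Soft-stop a process, escalating to a hard stop after `grace` seconds."""
    proc.send_signal(signal.SIGTERM)  # soft stop: the process may clean up
    try:
        proc.wait(timeout=grace)
        return "soft"
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop: SIGKILL cannot be caught or ignored
        proc.wait()
        return "hard"


def spawn(ignore_term: bool) -> subprocess.Popen:
    """Start a demo child that optionally ignores SIGTERM."""
    code = (
        "import signal, time\n"
        + ("signal.signal(signal.SIGTERM, signal.SIG_IGN)\n" if ignore_term else "")
        + "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child has set its signal disposition
    proc.stdout.close()
    return proc
```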

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") does
not apply to cases where the caller has included an unspecified option
or command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targeted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.
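
From the caller's side, the stdin guarantee might be used as in the
following sketch, which runs a command with a null stdin and collects
its exit code and output.  The helper name is hypothetical, and the
argv passed to it stands in for whatever runtime command is being
invoked.

```python
import subprocess


def call_with_null_stdin(argv):
    """Run a command with stdin attached to the null device.

    A command that never reads stdin is unaffected; one that does read
    sees an immediate end-of-file instead of blocking.
    """
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # let the caller inspect the exit code itself
    )
```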

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.
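
The choices above can be sketched as a request/response exchange over
a SOCK_SEQPACKET Unix socket.  Only the string 'type' field and the
'terminal' message type come from this commit message; the JSON
framing, the 'success' and 'error' response types, and the 'message'
field are assumptions made for illustration, and SOCK_SEQPACKET Unix
sockets are assumed to be available (as on Linux).

```python
import json
import socket

SUPPORTED_TYPES = {"terminal"}


def encode(message: dict) -> bytes:
    """Serialize one message; each message travels as a single packet."""
    return json.dumps(message).encode("utf-8")


def decode(packet: bytes) -> dict:
    return json.loads(packet.decode("utf-8"))


def handle(request: dict) -> dict:
    """Server side: answer every request so the client can bail out."""
    if request.get("type") in SUPPORTED_TYPES:
        return {"type": "success"}  # hypothetical response type
    return {
        "type": "error",  # hypothetical response type
        "message": "unsupported message type: {!r}".format(request.get("type")),
    }


def round_trip(request: dict) -> dict:
    """Send one request over a SOCK_SEQPACKET pair and return the response."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        client.send(encode(request))
        # SOCK_SEQPACKET preserves message boundaries, so one recv()
        # returns exactly one request.
        server.send(encode(handle(decode(server.recv(65536)))))
        return decode(client.recv(65536))
```

Because unknown types produce an error response rather than silence,
an old server can reject a newer client's message without breaking
the connection.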

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to set up the file descriptors
  instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 3, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split

Signed-off-by: W. Trevor King <wking@tremily.us>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 8, 2017
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].
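
As a rough illustration of the caller side, here is a minimal sketch of
building the environment for LISTEN_FDS.  It assumes the sd_listen_fds
convention that passed descriptors start at 3; the helper names
(listen_fds_env, passed_fds) are hypothetical and not part of any
runtime's API, and dropping LISTEN_PID is this sketch's choice to mirror
the discussion above.

```python
import os

SD_LISTEN_FDS_START = 3  # sd_listen_fds(3): passed descriptors start at fd 3


def listen_fds_env(count, base_env=None):
    """Return a copy of base_env with LISTEN_FDS set for `count` descriptors."""
    if count < 0:
        raise ValueError("count must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env["LISTEN_FDS"] = str(count)
    # Assumption of this sketch: LISTEN_PID is not set by the caller.
    env.pop("LISTEN_PID", None)
    return env


def passed_fds(env):
    """Return the descriptor numbers a callee would expect from LISTEN_FDS."""
    count = int(env.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A caller would still have to arrange for the open descriptors to land
on those numbers before exec'ing the runtime; that step is left out
here.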

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7158 [rfc7158-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7158-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.
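
For callers that want to apply the RFC 4627 hints themselves, a minimal
sketch follows.  It uses the null-octet patterns from RFC 4627 section 3,
which depend on the first two characters of the text being ASCII (true
for the state output, since it is an object).  The function name
detect_json_encoding is hypothetical.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON text from its first four octets.

    Follows the null-octet patterns from RFC 4627 section 3:
      00 00 00 xx  UTF-32BE      xx 00 00 00  UTF-32LE
      00 xx 00 xx  UTF-16BE      xx 00 xx 00  UTF-16LE
      xx xx xx xx  UTF-8
    """
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # too short for the pattern; assume UTF-8
    nulls = tuple(octet == 0 for octet in head)
    if nulls == (True, True, True, False):
        return "utf-32-be"
    if nulls == (True, False, True, False):
        return "utf-16-be"
    if nulls == (False, True, True, True):
        return "utf-32-le"
    if nulls == (False, True, False, True):
        return "utf-16-le"
    return "utf-8"
```

The result can be passed straight to bytes.decode() before handing the
text to a JSON parser.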

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") does not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targeted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.
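
To make the 'type' and response choices above concrete, here is a
minimal sketch of a request/response exchange over a SOCK_SEQPACKET
socket pair.  Only the string 'type' field and the 'terminal' message
type come from the text above; the JSON framing, the 'response' type
name, and the 'error' field are assumptions made for illustration, not
the wording of the specification.

```python
import json
import socket

SUPPORTED_TYPES = {"terminal"}  # message types this sketch's server knows


def _response(error=None):
    """Encode a response message; the field layout is hypothetical."""
    message = {"type": "response"}
    if error is not None:
        message["error"] = error
    return json.dumps(message).encode("utf-8")


def handle_request(payload):
    """Build the response for one request message."""
    try:
        msg_type = json.loads(payload.decode("utf-8"))["type"]
    except (ValueError, KeyError, TypeError):
        return _response("malformed request")
    if not isinstance(msg_type, str) or msg_type not in SUPPORTED_TYPES:
        # Unknown types get an error response so the client can bail out.
        return _response("unsupported message type")
    return _response()


def round_trip(request):
    """Send one request over a SOCK_SEQPACKET pair and return the reply."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        client.send(json.dumps(request).encode("utf-8"))
        server.send(handle_request(server.recv(4096)))
        return json.loads(client.recv(4096).decode("utf-8"))
```

Because each send is one message on a sequenced-packet socket, neither
side needs its own length prefix or delimiter, which is the appeal of
SOCK_SEQPACKET over SOCK_STREAM here.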

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races
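
The freeze, KILL, unfreeze sequence from that exchange can be sketched
as follows.  This assumes a cgroup v1 freezer hierarchy (freezer.state
and cgroup.procs files); it is not runC's actual code, and the function
name kill_cgroup and its injectable kill parameter are hypothetical.

```python
import os
import signal


def kill_cgroup(freezer_dir, kill=os.kill):
    """Freeze a cgroup, SIGKILL every process in it, then thaw it.

    Freezing first keeps processes from forking new children while the
    process list is being walked.
    """
    state_path = os.path.join(freezer_dir, "freezer.state")
    procs_path = os.path.join(freezer_dir, "cgroup.procs")
    with open(state_path, "w") as state:
        state.write("FROZEN")
    # A real caller would poll freezer.state until it reads FROZEN,
    # since freezing may not complete immediately.
    try:
        with open(procs_path) as procs:
            pids = [int(line) for line in procs if line.strip()]
        for pid in pids:
            kill(pid, signal.SIGKILL)
    finally:
        # Thaw so the pending SIGKILLs are actually delivered.
        with open(state_path, "w") as state:
            state.write("THAWED")
    return pids
```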

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to set up the file descriptors
  instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.
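
For the long-running 'create' avenue above, this is roughly what
collecting the container process's exit code with the waitid() family
looks like for a direct child.  It is a sketch under the assumption of
a POSIX platform where Python exposes os.waitid; run_and_collect is a
hypothetical helper, not part of any runtime.

```python
import os


def run_and_collect(argv):
    """Fork, exec argv in the child, and reap it with waitid().

    Returns (si_code, si_status); for a normal exit si_code is
    os.CLD_EXITED and si_status is the exit code.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # Parent: block until the child exits, without any ptrace tricks.
    info = os.waitid(os.P_PID, pid, os.WEXITED)
    return info.si_code, info.si_status
```

This only works because the waiting process is the parent of the
container process, which is exactly what the early-exit 'create' gives
up.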

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
Similar to the 'signal' command removed in b922732 (Drop exec, pause,
resume, and signal, 2015-12-02, #3).  The
runtime-spec gained a kill operation as part of
opencontainers/runtime-spec@be594153 (Split create and start,
2016-04-01, opencontainers/runtime-spec#384).  The interface is based
on POSIX [1], util-linux [2], and GNU coreutils [3].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [4], and currently
supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [5].  The docs we're landing here
explicitly allow that sort of substitution, because we need to have
soft/hard stop on those platforms but *can't* use POSIX signals.  They
borrow wording from opencontainers/runtime-spec@35b0e9ee (config:
Clarify MUST for platform.os and .arch, 2016-05-19,
opencontainers/runtime-spec#441) to recommend runtime authors document
the alternative technology so bundle-authors can prepare (e.g. by
installing the equivalent to a SIGTERM signal handler).

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[2]: http://man7.org/linux/man-pages/man1/kill.1.html
[3]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[4]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
     Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
     Date: Thu, 26 May 2016 11:03:29 -0700
     Message-ID: <20160526180329.GL17496@odin.tremily.us>
[5]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
     moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
     2015-10-12, moby/moby#16997)

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
Catch up with opencontainers/runtime-spec@be594153 (Split create and
start, 2016-04-01, opencontainers/runtime-spec#384).

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).

I still like the long-running 'create' API because it makes
collecting the exit code easier.  I've proposed an 'event' operation
[1] which would provide a convenient created trigger.  With 'event' in
place, we don't need the 'create' process exit to serve as that
trigger, and could have a long-running 'create' that collects the
container process exit code using the portable waitid() family.  But
the consensus after the 2016-07-13 meeting was to table that while we
land docs for the runC API [2], and runC has an early-exit create [3].

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

The ptrace idea in this commit is from Mrunal [4].

[1]: opencontainers/runtime-spec#508
     Subject: runtime: Add an 'event' operation for subscribing to pushes
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[3]: opencontainers/runc#827
     Summary: Implement create and start
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 27, 2017
Since be59415 (Split create and start, 2016-04-01, opencontainers#384), it's
possible for a container process to never execute user-specified code
(e.g. you can call 'create', 'kill', 'delete' without calling
'start').  For folks who expect to do that, there's no reason to
define process.args.
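
To make that sequence concrete, here is a toy in-memory model of the
operations.  It is only a sketch: the ToyRuntime class, the state
names, and the simplification that 'kill' stops the container
immediately are assumptions for illustration, not the spec's wording.

```python
class LifecycleError(Exception):
    """Raised when an operation is not valid for the container's state."""


class ToyRuntime:
    """In-memory stand-in for a runtime; no real containers are created."""

    def __init__(self):
        self.states = {}            # container ID -> "created" | "running" | "stopped"
        self.ran_user_code = set()  # IDs whose user-specified process was started

    def create(self, cid):
        if cid in self.states:
            raise LifecycleError(f"{cid} already exists")
        self.states[cid] = "created"

    def start(self, cid):
        if self.states.get(cid) != "created":
            raise LifecycleError(f"{cid} is not in the created state")
        self.states[cid] = "running"
        self.ran_user_code.add(cid)  # only here would process.args be consulted

    def kill(self, cid):
        if self.states.get(cid) not in ("created", "running"):
            raise LifecycleError(f"{cid} has no process to signal")
        self.states[cid] = "stopped"  # simplification: the signal ends the process

    def delete(self, cid):
        if cid not in self.states:
            raise LifecycleError(f"{cid} does not exist")
        if self.states[cid] != "stopped":
            raise LifecycleError(f"{cid} still has a live process")
        del self.states[cid]


if __name__ == "__main__":
    runtime = ToyRuntime()
    runtime.create("c1")
    runtime.kill("c1")    # 'start' is never called
    runtime.delete("c1")
    print("user code ran:", "c1" in runtime.ran_user_code)
```

In this model the 'start' method is the only place where the
user-specified process would matter, which is the case for not
requiring process.args from callers that never reach it.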

The only other process property required for all platforms is 'cwd',
but the runtime's idler code isn't specified in sufficient detail for
the configuration author to have an opinion about what its working
directory should be.

On Linux and Solaris, 'user' is also required for 'uid' and 'gid'.  My
preferred approach here is to make those optional and define defaults
[1,2]:

  If unset, the runtime will not attempt to manipulate the user ID
  (e.g. not calling setuid(2) or similar).

But the maintainer consensus is that they want those to be explicitly
required properties [3,4,5].  With the current spec, one option could
be to make process optional (with the idler's working directory
unspecified) for OSes besides Linux and Solaris.  On Windows, username
is optional, but that was likely accidental [6].

So an unspecified 'process' would leave process.cwd and process.user
unset.  What that means for the implementation-defined container
process between 'create' and 'start' is unclear, but clarifying how
that is handled is a separate issue [7] independent of whether
'process' is optional or not.

[1]: opencontainers#417 (comment)
[2]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/DWdystx5X3A
     Subject: Exposing platform defaults
     Date: Thu, 14 Jan 2016 15:36:26 -0800
     Message-ID: <20160114233625.GN6362@odin.tremily.us>
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-05-04-17.00.log.html#l-44
[4]: opencontainers#417 (comment)
[5]: opencontainers#417 (comment)
[6]: opencontainers#618 (comment)
[7]: opencontainers#700

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 1, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 3, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

* Make runtime.md references to "generate an error" and "log a
  warning" links, so readers have an easier time finding more detail
  on that wording.

Where I reference a section, I'm still using the auto-generated anchor
for that header and not the anchors which were added in 41839d7 (Merge
pull request opencontainers#707 from mrunalp/anchor_tags, 2017-03-03) and similar.
Mrunal suggested that the manually-added anchors were mainly intended
for the validation tooling [4].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2017-03-03.log.html#t2017-03-03T18:02:12

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 16, 2017
Step 3 of the lifecycle from before this commit had two sentences
which both landed in be59415 (Split create and start, 2016-04-01,
opencontainers#384).  I pushed back a bit on the entry then [1,2], but we seem to be
pretty comfortable with the current "keep all lifecycle entries in a
one-layer enumerated list" approach, so I'm leaving that alone in this
commit.  Step 3 isn't really a lifecycle step though, it's more about
clarifying that you can jump around in the lifecycle instead of
hitting all the steps in consecutive order.  In this commit, I've
addressed that in a new paragraph that follows the list.

The other sentence from the old step 3 doesn't need replacing, because
the limits are already covered in more detail in the operation
sections themselves.  For example, the 'delete' operation has:

  Attempting to delete a container that does not exist MUST generate
  an error.  Attempting to delete a container whose process is still
  running MUST generate an error.

I don't see the need to call generic attention to that idea, and
especially do not think that an entry in the lifecycle list is the
right place for such a generic call-out.

[1]: opencontainers#384 (comment)
[2]: opencontainers#384 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking wking mentioned this pull request Apr 4, 2017
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request May 10, 2017
Step 3 of the lifecycle from before this commit had two sentences
which both landed in be59415 (Split create and start, 2016-04-01,
opencontainers#384).  I pushed back a bit on the entry then [1,2], but we seem to be
pretty comfortable with the current "keep all lifecycle entries in a
one-layer enumerated list" approach, so I'm leaving that alone in this
commit.  Step 3 isn't really a lifecycle step though, it's more about
clarifying that you can jump around in the lifecycle instead of
hitting all the steps in consecutive order.  I'd floated a new
paragraph addressing that jumping, but was unable to form a consensus
around wording, and the jumping is already somewhat covered by the
current list entries (e.g. "The container process exits.").  This
commit just drops the old step 3, and Michael will follow up with
wording about jumping [3].

The other sentence from the old step 3 doesn't need replacing, because
the limits are already covered in more detail in the operation
sections themselves.  For example, the 'delete' operation has:

  Attempting to delete a container that does not exist MUST generate
  an error.  Attempting to delete a container whose process is still
  running MUST generate an error.

I don't see the need to call generic attention to that idea, and
especially do not think that an entry in the lifecycle list is the
right place for such a generic call-out.

[1]: opencontainers#384 (comment)
[2]: opencontainers#384 (comment)
[3]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-05-10-21.03.log.html#l-79

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/nmbug-oci that referenced this pull request Jul 26, 2017
With [1], which landed on 2016-05-26 [2], 'kill' is signalling "the
process in the container".  And since [3] (2017-06-04), the only
remaining "processes" references have to do with namespaces, where the
namespace in question is fairly clear.  So I don't think there's much
uncertainty left about what "container processes" means, or any
implication that it's a single set of processes.

[1]: https://github.com/duglin/runtime-spec/blob/be594153b522f52bebd0500cef6fe0f1d77f6a59/runtime.md#kill
[2]: opencontainers/runtime-spec#384 (comment)
[3]: opencontainers/runtime-spec#809 (comment)
wking added a commit to wking/ccon that referenced this pull request Feb 26, 2018
Along the lines of the create/start split we've been discussing for
the OCI [1,2].  Pull some common functionality into a libccon,
although I don't intend to make that more than an internal helper.

Post-error cleanup in ccon-cli is pretty slim, since the process is
just about to die anyway.  I'll probably go back through and add
proper cleanup later.

serve_socket gets too deeply nested for my taste, but it's not quite
to the point where I'd pull out the request handling into a
handle_start_request helper or some such.  Mostly because I'd have to
either set up a new buf/iov/msg in the helper's stack or pass that all
through from serve_socket ;).

A few notes on the tests:

I don't have a stand-alone 'wait' on my system (it's built into most
shells [3]), but I've added an explicit check for it because POSIX
doesn't require it to be built in.

The waiting in the create/start tests is a bit awkward, but here's the
reasoning for the various steps:

* Put the socket in a 'sock' subdirectory so we don't have to mess
  with inotifywait's --exclude (when what we really want is an
  --include).  In most cases, the socket would be the first thing
  created in the test directory, but the process.host test will create
  a pivot-root.* before creating the socket.

* Clear a 'wait' file before launching the inotifywait/start subshell
  and append to it from that subshell so grep has something to look at
  (even if it racily starts looking before the subshell processes the
  inotifywait line).

* Block on a busy-wait grep until inotifywait sets up its watcher.
  This ensures we don't call ccon and create the socket before the
  watcher is looking.

  The zero-second sleep while we wait for inotifywait to set up its
  watcher is busy, but POSIX doesn't require 'sleep' to support
  non-integer times [4].
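
The busy-wait described in the steps above can be sketched in Python.
This is an analogue of the shell grep loop, not the test suite's
actual code: the wait_for_line helper, its timeout parameter, and the
marker text used in the example are all hypothetical.

```python
import time


def wait_for_line(path, needle, timeout=5.0):
    """Busy-wait until some line in the file at path contains needle.

    Returns True once the needle is seen, False if timeout seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as handle:
                if any(needle in line for line in handle):
                    return True
        except FileNotFoundError:
            pass  # the writer has not created the file yet
        time.sleep(0)  # zero-second sleep: yield the CPU, then poll again
    return False


if __name__ == "__main__":
    # Hypothetical usage: block until a watcher subshell has appended its
    # "ready" marker to the 'wait' file, and only then create the socket.
    if wait_for_line("wait", "ready", timeout=1.0):
        print("watcher is ready")
    else:
        print("timed out waiting for the watcher")
```

The timeout is an addition the shell version does not have; it keeps
the sketch from spinning forever if the watcher never reports in.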

All of these issues could be abstracted out into an 'event' command
[5], but they're fine for a proof-of-concept create/start split.

[1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/qWHoKs8Fsrk/k55FQrBzBgAJ
     Subject: Re: Splitting Start into create() and run()
     Date: Thu, 24 Mar 2016 15:04:14 -0700
     Message-ID: <20160324220414.GC23066@odin.tremily.us>
[2]: opencontainers/runtime-spec#384
     Subject: Split create and start
[3]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_06
[4]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sleep.html
[5]: opencontainers/runtime-spec#508
     Subject: runtime: Add an 'event' operation for subscribing to
       pushes