docs: updates for CSI plugin improvements for 1.3.0 #12466

Merged (2 commits, Apr 5, 2022)
39 changes: 24 additions & 15 deletions website/content/docs/internals/plugins/csi.mdx
@@ -55,17 +55,25 @@ that perform both the controller and node roles in the same
instance. Not every plugin provider has or needs a controller; that's
specific to the provider implementation.

-You should always run node plugins as Nomad `system` jobs and use the
-`-ignore-system` flag on the `nomad node drain` command to ensure that the
-node plugins are still running while the node is being drained. Use
-constraints for the node plugin jobs based on the availability of volumes. For
-example, AWS EBS volumes are specific to particular availability zones with a
-region. Controller plugins can be run as `service` jobs.
+Plugins mount and unmount volumes but are not in the data path once
+the volume is mounted for a task. Plugin tasks are needed when tasks
+using their volumes stop, so plugins should be left running on a Nomad
+client until all tasks using their volumes are stopped. The `nomad
+node drain` command handles this automatically by stopping plugin
+tasks last.
+
+Typically, you should run node plugins as Nomad `system` jobs so they
+can mount volumes on any client where they are running. Controller
+plugins can create and attach volumes anywhere they can communicate
+with the storage provider's API, so they can usually be run as
+`service` jobs. You should always run more than one controller plugin
+allocation for high availability.
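
As a hedged illustration of the job types described above, a controller plugin job might be shaped like the sketch below; a node plugin would look the same except for `type = "system"` on the job and `type = "node"` in the `csi_plugin` block. The image and plugin `id` are placeholders, and real plugins need provider-specific arguments and credentials.

```hcl
job "csi-controller" {
  datacenters = ["dc1"]
  type        = "service"

  group "controller" {
    # Run more than one allocation for high availability.
    count = 2

    task "plugin" {
      driver = "docker"

      config {
        # Placeholder image; provider-specific arguments omitted.
        image = "example/csi-driver:v1.0.0"
      }

      csi_plugin {
        id        = "example-plugin"
        type      = "controller"
        mount_dir = "/csi"
      }
    }
  }
}
```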

Nomad exposes a Unix domain socket named `csi.sock` inside each CSI
plugin task, and communicates over the gRPC protocol expected by the
CSI specification. The `mount_dir` field tells Nomad where the plugin
-expects to find the socket file.
+expects to find the socket file. The path to this socket is exposed in
+the container as the `CSI_ENDPOINT` environment variable.
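
A minimal sketch of how these pieces fit together inside a plugin task; the image name, plugin `id`, and `--endpoint` flag are assumptions that vary by driver.

```hcl
task "plugin" {
  driver = "docker"

  config {
    # Placeholder image; real plugins need provider-specific arguments.
    image = "example/csi-driver:v1.0.0"

    # The socket path matches mount_dir below; many drivers can instead
    # read the CSI_ENDPOINT environment variable directly.
    args = ["--endpoint=unix:///csi/csi.sock"]
  }

  csi_plugin {
    id        = "example-plugin"
    type      = "node"
    mount_dir = "/csi" # Nomad creates /csi/csi.sock inside the task
  }
}
```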

### Plugin Lifecycle and State

@@ -94,7 +102,7 @@ server and waits for a response; the allocation's tasks won't start
until the volume has been claimed and is ready.

If the volume's plugin requires a controller, the server will send an
-RPC to the Nomad client where that controller is running. The Nomad
+RPC to any Nomad client where that controller is running. The Nomad
client will forward this request over the controller plugin's gRPC
socket. The controller plugin will make the requested volume available
to the node that needs it.
@@ -110,13 +118,14 @@ client, and the node plugin mounts the volume to a staging area in
the Nomad data directory. Nomad will bind-mount this staged directory
into each task that mounts the volume.

-This cycle is reversed when a task that claims a volume becomes terminal. The
-client will send an "unpublish" RPC to the server, which will send "detach"
-RPCs to the node plugin. The node plugin unmounts the bind-mount from the
-allocation and unmounts the volume from the plugin (if it's not in use by
-another task). The server will then send "unpublish" RPCs to the controller
-plugin (if any), and decrement the claim count for the volume. At this point
-the volume’s claim capacity has been freed up for scheduling.
+This cycle is reversed when a task that claims a volume becomes
+terminal. The client frees the volume locally by making "unpublish"
+RPCs to the node plugin. The node plugin unmounts the bind-mount from
+the allocation and unmounts the volume from the plugin (if it's not in
+use by another task). The client will then send an "unpublish" RPC to
+the server, which will forward it to the controller plugin (if
+any), and decrement the claim count for the volume. At this point the
+volume’s claim capacity has been freed up for scheduling.
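
For reference, a sketch of a job that claims a volume; the volume `source` and image are placeholders. The claim RPC happens when this group is placed, and the unpublish cycle above happens when its allocation stops.

```hcl
job "web" {
  datacenters = ["dc1"]

  group "app" {
    # Claiming the volume at the group level is what the claim and
    # unpublish flow described above operates on.
    volume "data" {
      type            = "csi"
      source          = "example-volume" # placeholder volume ID
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "app" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "sleep"
        args    = ["3600"]
      }

      volume_mount {
        volume      = "data"
        destination = "/srv/data"
      }
    }
  }
}
```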

[csi-spec]: https://github.com/container-storage-interface/spec
[csi-drivers-list]: https://kubernetes-csi.github.io/docs/drivers.html
29 changes: 15 additions & 14 deletions website/content/docs/job-specification/csi_plugin.mdx
@@ -51,28 +51,28 @@ option.

## Recommendations for Deploying CSI Plugins

-CSI plugins run as Nomad jobs but after mounting the volume are not in the
-data path for the volume. Jobs that mount volumes write and read directly to
+CSI plugins run as Nomad tasks, but after mounting the volume are not in the
+data path for the volume. Tasks that mount volumes write and read directly to
the volume via a bind-mount and there is no communication between the job and
the CSI plugin. But when an allocation that mounts a volume stops, Nomad will
need to communicate with the plugin on that allocation's node to unmount the
volume. This has implications on how to deploy CSI plugins:

-* During node drains, jobs that claim volumes must be moved before the `node`
-  or `monolith` plugin for those volumes. You should run `node` or `monolith`
-  plugins as [`system`][system] jobs and use the `-ignore-system` flag on
-  `nomad node drain` to ensure that the plugins are running while the node is
-  being drained.
+* If you are stopping jobs on a node, you must stop tasks that claim
+  volumes before stopping the `node` or `monolith` plugin for those
+  volumes. If you use the `node drain` feature, plugin tasks will
+  automatically be drained last.

-* Only one plugin instance of a given plugin ID and type (controller or node)
-  should be deployed on any given client node. Use a constraint as shown
-  below.
+* Only the most recently-placed allocation for a given plugin ID and
+  type (controller or node) will be used by any given client node. Run
+  `node` plugins as system jobs and distribute `controller` plugins
+  across client nodes using a constraint as shown below.

* Some plugins will create volumes only in the same location as the
-  plugin. For example, the AWS EBS plugin will create and mount volumes only
-  within the same Availability Zone. You should deploy these plugins with a
-  unique-per-AZ `plugin_id` to allow Nomad to place allocations in the correct
-  AZ.
+  plugin. For example, the AWS EBS plugin will create and mount
+  volumes only within the same Availability Zone. You should configure
+  your plugin task as recommended by the plugin's documentation to use
+  the [`topology_request`] field in your volume specification.

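As a hedged illustration of the [`topology_request`] field mentioned in the last bullet, a volume specification along these lines could be used with `nomad volume create`. The IDs, capacity values, and segment key are placeholders; the segment keys a plugin accepts are defined by that plugin.

```hcl
id        = "example-volume"
name      = "example-volume"
type      = "csi"
plugin_id = "example-plugin"

capacity_min = "10GiB"
capacity_max = "20GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

topology_request {
  required {
    # Ask the plugin to create the volume where the matching clients run;
    # the segment key names are plugin-specific.
    topology {
      segments {
        zone = "us-east-1a"
      }
    }
  }
}
```
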
## `csi_plugin` Examples

@@ -124,3 +124,4 @@ job "plugin-efs" {
[csi]: https://github.com/container-storage-interface/spec
[csi_volumes]: /docs/job-specification/volume
[system]: /docs/schedulers#system
+[`topology_request`]: /docs/commands/volume/create#topology_request