Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new gcs hooks, add expected mounts to security policy #1258

Merged
merged 3 commits into from
Mar 22, 2022

Conversation

anmaxvl
Copy link
Contributor

@anmaxvl anmaxvl commented Dec 29, 2021

Introduce a new wait-paths binary, which polls file system
until requested paths are available or a timeout is reached.

Security policy has been updated to have ExpectedMounts entries,
which will be used in conjunction with "wait-paths" hook for
synchronization purposes.

Refactor oci-hook logic into its own internal package and update
existing code to use that package. Copy runc HookName and constants
definitions to break dependency on runc

Introduce ExpectedMounts as part of security policy language and
the logic to enforce the policy, which resolves the expected mounts
in the UVM and adds a wait-paths hook to the spec.

Signed-off-by: Maksim An maksiman@microsoft.com

@anmaxvl anmaxvl marked this pull request as ready for review December 30, 2021 18:50
@anmaxvl anmaxvl requested a review from a team as a code owner December 30, 2021 18:50
@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jan 4, 2022

@SeanTAllen @KenGordon @svolos FYI

@katiewasnothere katiewasnothere self-assigned this Jan 4, 2022
@dcantah dcantah self-assigned this Jan 5, 2022
@anmaxvl anmaxvl force-pushed the sync-hooks branch 2 times, most recently from 263fca3 to b8c51a7 Compare January 12, 2022 21:18
@SeanTAllen
Copy link
Contributor

I checked this out a while ago, but forgot to comment that it looked good and I didnt see anything problematic with the approach.

@anmaxvl anmaxvl force-pushed the sync-hooks branch 3 times, most recently from a5d40c5 to 8f037da Compare February 17, 2022 17:50
@katiewasnothere
Copy link
Contributor

Do we have any tests for the expected mounts work in here?

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Feb 24, 2022

Do we have any tests for the expected mounts work in here?

If you're talking about CRI integration tests, then it's currently blocked on the integrity validation PRs. I am planning on adding the tests once those are merged in as a separate PR. I've done local testing though.

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Mar 3, 2022

@katiewasnothere PTAL when you get a chance.

@anmaxvl anmaxvl force-pushed the sync-hooks branch 2 times, most recently from 25f5700 to 309451b Compare March 14, 2022 23:57
Introduce a new `wait-paths` binary, which polls file system
until requested paths are available or a timeout is reached.

Security policy has been updated to have ExpectedMounts entries,
which will be used in conjunction with "wait-paths" hook for
synchronization purposes.

Refactor oci-hook logic into its own internal package and update
existing code to use that package. Copy runc HookName and constants
definitions to break dependency on runc

Introduce ExpectedMounts as part of security policy language and
the logic to enforce the policy, which resolves the expected mounts
in the UVM and adds a wait-paths hook to the spec.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Per pr feedback, only container paths under sandbox mounts are
supported as wait-paths. The support for other 2 scenarios can be
added as needed.

Add positive and negative CRI tests.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Signed-off-by: Maksim An <maksiman@microsoft.com>
Copy link
Contributor

@katiewasnothere katiewasnothere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@anmaxvl anmaxvl merged commit 51aee6b into microsoft:master Mar 22, 2022
@anmaxvl anmaxvl deleted the sync-hooks branch March 22, 2022 04:45
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" method, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Changin "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit code changes in the affected
places. If "GetContainer" is left unchanged, then handling of containers
in status "Creating" needs to take place and this requires handling cases
when (e.g.) a modification request is sent to a container which isn't yet
running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Changin "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit code changes in the affected
places. If "GetContainer" is left unchanged, then handling of containers
in status "Creating" needs to take place and this requires handling cases
when (e.g.) a modification request is sent to a container which isn't yet
running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 15, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 22, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a CreateRuntime OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because CreateContainer request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "Status" field to Container object, which can be either
    "Running" or "Creating".
  - Remove locking from CreateContainer function
  - Rework "GetContainer" to "GetRunningContainer", which returns
    the container object only when it's in "Running" state, otherwise
    either `gcserr.HrVmcomputeSystemNotFound` or `gcserr.HrVmcomputeInvalidState`
    error returned.
  - Add new "AddContainer(id, container)" function, which updates the
    containers map with new container instances.
  - Rework CreateContainer to initially add new container objects into
    the containers map and set the "Status" to "Creating" at the start
    of the function and set it to "Running" only when the container
    successfully starts.

Reworking "GetContainer" to "GetRunningContainer" seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If "GetContainer" is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit that referenced this pull request Apr 22, 2022
Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
#1258
It injects a `CreateRuntime` OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because `CreateContainer` request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "status" field to Container object and atomic setter/getter,
    which can be either "Created" or "Creating". New `uint32` type alias
    and constants were added to represent the values (`containerCreated`
    and `containerCreating`)
  - Remove locking from `CreateContainer` function
  - Rework `GetContainer` to `GetCreatedContainer`, which returns
    the container object only when it's in `containerCreated` state,
    otherwise either `gcserr.HrVmcomputeSystemNotFound` or
    `gcserr.HrVmcomputeInvalidState` error returned.
  - Add new `AddContainer(id, container)` function, which updates the
    containers map with new container instances.
  - Rework `CreateContainer` to initially add new container objects into
    the containers map and set the "status" to `containerCreating` at the
    start of the function and set it to `containerCreated` only when the
    container is successfully created in runtime.

Reworking `GetContainer` to `GetCreatedContainer` seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If `GetContainer` is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Additionally update synchronization CRI tests to use go routines
to properly reproduce the scenario.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit to anmaxvl/hcsshim that referenced this pull request Apr 30, 2022
Follow up PR to add tests for wait-paths after initial PR microsoft#1258
was merged.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit that referenced this pull request May 4, 2022
Follow up PR to add tests for wait-paths after initial PR #1258
was merged.

Signed-off-by: Maksim An <maksiman@microsoft.com>
anmaxvl added a commit that referenced this pull request Feb 7, 2023
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
…1258)

Introduce a new `wait-paths` binary, which polls file system
until requested paths are available or a timeout is reached.

Security policy has been updated to have `ExpectedMounts` entries,
which will be used in conjunction with "wait-paths" hook for
synchronization purposes.

Refactor oci-hook logic into its own internal package and update
existing code to use that package. Copy runc HookName and constants
definitions to break dependency on runc

Introduce `ExpectedMounts` as part of security policy language and
the logic to enforce the policy, which resolves the expected mounts
in the UVM and adds a wait-paths hook to the spec.

Add positive and negative CRI tests.

Signed-off-by: Maksim An <maksiman@microsoft.com>
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
…t#1355)

Prior to this change, GCS allowed only one CreateContainer operation
at a time. This isn't an issue in general case, however this doesn't
work properly with synchronization via OCI runtime hook.

Synchronization via runtime hook was introduced in:
microsoft#1258
It injects a `CreateRuntime` OCI hook, if security policy provides
wait paths.
This allows container-A to run after container-B, where container-B
writes to an empty directory volume shared between the two containers
to signal that it's done some setup container-A depends on.
In general case, container-A can be started before container-B which
results in a deadlock, because `CreateContainer` request holds a lock
to a map, which keeps track of running containers.

To resolve the issue, the code has been updated to do a more granular
locking when reading/updating the containers map:
  - Add a new "status" field to Container object and atomic setter/getter,
    which can be either "Created" or "Creating". New `uint32` type alias
    and constants were added to represent the values (`containerCreated`
    and `containerCreating`)
  - Remove locking from `CreateContainer` function
  - Rework `GetContainer` to `GetCreatedContainer`, which returns
    the container object only when it's in `containerCreated` state,
    otherwise either `gcserr.HrVmcomputeSystemNotFound` or
    `gcserr.HrVmcomputeInvalidState` error returned.
  - Add new `AddContainer(id, container)` function, which updates the
    containers map with new container instances.
  - Rework `CreateContainer` to initially add new container objects into
    the containers map and set the "status" to `containerCreating` at the
    start of the function and set it to `containerCreated` only when the
    container is successfully created in runtime.

Reworking `GetContainer` to `GetCreatedContainer` seemed to be the least
invasive change, which allows us to limit updates in the affected places.
If `GetContainer` is left unchanged, then handling of containers in status
"Creating" needs to take place and this requires handling cases when (e.g.)
a modification request is sent to a container which isn't yet running.

Additionally update synchronization CRI tests to use go routines
to properly reproduce the scenario.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants