Collaborator

@MaciejKaras MaciejKaras commented Sep 24, 2025

Summary

Based on the HELP-81729 ticket, I investigated whether our workloads align with the restricted Pod Security Standards (PSS) security level. Unfortunately, we cannot easily test enforcement of the rules or guarantee that we meet the restricted profile, mainly because of how our e2e tests are set up. For example, we use Istio, which adds istio-init containers to provide service mesh network capabilities, and the Istio containers do not follow the restricted profile. Our test pod does not meet the PSS requirements either. We also hit other issues when testing enforcement. Resolving them requires more time, and we cannot promise timelines or priorities.

Because of this, I have enabled warn mode for the restricted security level instead of enforce mode. For one complex test, e2e_om_ops_manager_backup_sharded_cluster, I have enforced the restricted level, and only in a single cluster, so that we can monitor our PSS alignment. More about levels and modes can be found here
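For readers unfamiliar with how a mode and a level are selected: Pod Security Admission reads them from namespace labels of the form pod-security.kubernetes.io/<mode>: <level>. The Go sketch below only builds such a label pair. The helper name pssLabel is hypothetical and not taken from this PR, and how the e2e setup actually applies the labels is not shown here.

```go
package main

import "fmt"

// pssLabel is a hypothetical helper (not part of this PR). It returns the
// namespace label key and value that select a Pod Security Standards level
// for one Pod Security Admission mode.
func pssLabel(mode, level string) (string, string, error) {
	switch mode {
	case "enforce", "audit", "warn":
	default:
		return "", "", fmt.Errorf("unknown PSS mode %q", mode)
	}
	switch level {
	case "privileged", "baseline", "restricted":
	default:
		return "", "", fmt.Errorf("unknown PSS level %q", level)
	}
	return "pod-security.kubernetes.io/" + mode, level, nil
}

func main() {
	// Warn mode reports violations without rejecting the pod;
	// enforce mode rejects pods that violate the level.
	for _, mode := range []string{"warn", "enforce"} {
		key, value, err := pssLabel(mode, "restricted")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", key, value)
	}
}
```

With the warn label, a violating pod is still admitted and only produces a warning like the one quoted under Proof of Work.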

Proof of Work

CI passes for all tests running in warn mode, and enforcement passes for the e2e_om_ops_manager_backup_sharded_cluster test.

Example warning from the e2e_om_ops_manager_backup_tls test:

[2025/09/24 14:59:58.691] Warning: would violate PodSecurity "restricted:latest": unrestricted capabilities (container "istio-init" must not include "NET_ADMIN", "NET_RAW" in securityContext.capabilities.add), runAsNonRoot != true (container "istio-init" must not set securityContext.runAsNonRoot=false), runAsUser=0 (container "istio-init" must not set runAsUser=0)

Checklist

  • Have you linked a Jira ticket and/or is the ticket in the title?
  • Have you checked whether your Jira ticket required DOCSP changes?
  • Have you added a changelog file?

github-actions bot commented Sep 24, 2025

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.5.0 Release Notes

New Features

  • Improve automation agent certificate rotation: the agent now restarts automatically when its certificate is renewed, so certificate updates no longer require manual Pod restarts.

Bug Fixes

  • To follow the Pod Security Standards, more secure default pod securityContext settings were added.
    Operator deployment securityContext settings that have changed:

    • allowPrivilegeEscalation: false
    • capabilities.drop: [ ALL ]
    • seccompProfile.type: RuntimeDefault

    Other workloads:

    • capabilities.drop: [ ALL ] - container level
    • seccompProfile.type: RuntimeDefault - pod level

Note: If you require less restrictive securityContext settings, please use template or podTemplate overrides.
Detailed information about overrides can be found in Modify Ops Manager or MongoDB Kubernetes Resource Containers.

SecurityContext: &corev1.SecurityContext{
	ReadOnlyRootFilesystem:   ptr.To(true),
	AllowPrivilegeEscalation: ptr.To(false),
	Capabilities: &corev1.Capabilities{
Contributor

@lsierant lsierant Sep 24, 2025

Is there any chance that adding this default will break a customer's workload (or rather, prevent the operator from deploying or the workload sts from restarting) and require manual intervention? Just thinking about our semver guarantees.

Collaborator Author

I think this is our own deployment, which we manage. Customers who want a managedSecurityContext are allowed to use one, but otherwise we should be able to modify the one we provide.

cc @mircea-cosbuc

Member

These are our defaults. The only problem I see is that some customers will now need to set capabilities explicitly.

Collaborator Author

Maybe this should be considered a security fix? In that case we should be able to overwrite our defaults if they are not secure, even if this forces customers to explicitly specify custom capabilities. What do you think?

Contributor

@lsierant lsierant Oct 1, 2025

Still, do we force customers (who don't care about it) to do any manual fix when upgrading? If yes, we need to bump the major version.

Collaborator Author

@MaciejKaras MaciejKaras Oct 1, 2025

I don't understand what we will break here. We are changing our default SecurityContext for the operator and the other pods it creates. If a customer wants a dedicated SecurityContext or PodSecurityContext, they need to set the MANAGED_SECURITY_CONTEXT env var, and their settings will then completely replace our defaults. If they don't set MANAGED_SECURITY_CONTEXT, every manual change they make to the SecurityContext will be overwritten by our defaults.

Code that handles securityContext settings:

func WithDefaultSecurityContextsModifications() (Modification, container.Modification) {
	managedSecurityContext := envvar.ReadBool(ManagedSecurityContextEnv) // nolint:forbidigo
	configureContainerSecurityContext := container.NOOP()
	configurePodSpecSecurityContext := NOOP()
	if !managedSecurityContext {
		configurePodSpecSecurityContext = WithSecurityContext(DefaultPodSecurityContext())
		configureContainerSecurityContext = container.WithSecurityContext(container.DefaultSecurityContext())
	}
	return configurePodSpecSecurityContext, configureContainerSecurityContext
}

Collaborator Author

@lsierant and I discussed that the stricter Capabilities setting is applied only to the db/om containers, not to the whole Pod. It will not affect other containers that customers may have in the Pod, e.g. security or Istio sidecars. The only change at the Pod level is adding seccompProfile: type: RuntimeDefault. We have two options for it:

  • Move seccompProfile: type: RuntimeDefault to the container level and don't specify it at the pod level. Our containers will have secure seccomp settings, but any sidecar that a customer adds will not have seccomp settings applied.
  • Keep it as is and secure the entire Pod.
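The difference between the two options comes from how Kubernetes resolves the setting: a container-level seccompProfile takes precedence, and a container without one falls back to the pod-level value. The sketch below models only that fallback. The container type is a simplified stand-in for corev1.Container, and the container names are made up for illustration.

```go
package main

import "fmt"

// container is a simplified stand-in for corev1.Container. Only the seccomp
// profile type is modelled; an empty string means "not set".
type container struct {
	name        string
	seccompType string
}

// effectiveSeccomp returns the container-level profile type when it is set
// and the pod-level one otherwise. An empty result means neither level sets
// a profile; what the runtime does then depends on cluster configuration.
func effectiveSeccomp(podLevel string, c container) string {
	if c.seccompType != "" {
		return c.seccompType
	}
	return podLevel
}

func main() {
	// Hypothetical container names, for illustration only.
	db := container{name: "database", seccompType: "RuntimeDefault"}
	sidecar := container{name: "customer-sidecar"}

	// Option 1: profile set only on the operator-managed container.
	fmt.Printf("container level: %s=%q %s=%q\n",
		db.name, effectiveSeccomp("", db),
		sidecar.name, effectiveSeccomp("", sidecar))

	// Option 2: profile set at pod level, containers leave it unset.
	db.seccompType = ""
	fmt.Printf("pod level: %s=%q %s=%q\n",
		db.name, effectiveSeccomp("RuntimeDefault", db),
		sidecar.name, effectiveSeccomp("RuntimeDefault", sidecar))
}
```

In the first case the sidecar ends up with no profile from the pod spec, which is the gap described in the first bullet. In the second case every container inherits RuntimeDefault unless it overrides it.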

@mircea-cosbuc looking for guidance here on how to proceed

Member

I think it's best to set it at pod level. Based on @lsierant's comments, I think this needs clarity on what customers might need to change on upgrade (if anything): outline those scenarios and decide whether it's a breaking change.

Collaborator Author

@MaciejKaras MaciejKaras Oct 1, 2025

I've checked the consequences of using seccompProfile: type: RuntimeDefault: it applies the default profile of whichever container runtime is in use. containerd and Docker, for example, have very similar default seccomp profiles -> https://docs.docker.com/engine/security/seccomp/#significant-syscalls-blocked-by-the-default-profile

Based on what I have found in official Kubernetes docs:

These profiles may differ between runtimes like CRI-O or containerd. They also differ for its used hardware architectures. But generally speaking, those default profiles allow a common amount of syscalls while blocking the more dangerous ones, which are unlikely or unsafe to be used in a containerized application.

Additionally, on Red Hat OpenShift Container Platform, RuntimeDefault is often enforced by default via Security Context Constraints (SCCs).

To summarise, it is unlikely that users of our Operator require more syscall permissions in MongoDB workloads than the RuntimeDefault seccomp profile allows. Nevertheless, I should add a comment to the changelog on how to opt out of the securityContext defaults by using managedSecurityContext.

@lsierant @mircea-cosbuc let me know if that justifies approving the PR. I have already edited the changelog.
