-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/1.7 backport] runtime/v2/runc: handle early exits w/o big locks #8712
[release/1.7 backport] runtime/v2/runc: handle early exits w/o big locks #8712
Conversation
eventSendMu is causing severe lock contention when multiple processes start and exit concurrently. Replace it with a different scheme for maintaining causality w.r.t. start and exit events for a process which does not rely on big locks for synchronization. Keep track of all processes for which a Task(Exec)Start event has been published and have not yet exited in a map, keyed by their PID. Processing exits then is as simple as looking up which process corresponds to the PID. If there are no started processes known with that PID, the PID must either belong to a process which was started by s.Start() and before the s.Start() call has added the process to the map of running processes, or a reparented process which we don't care about. Handle the former case by having each s.Start() call subscribe to exit events before starting the process. It checks if the PID has exited in the time between it starting the process and publishing the TaskStart event, handling the exit if it has. Exit events for reparented processes received when no s.Start() calls are in flight are immediately discarded, and events received during an s.Start() call are discarded when the s.Start() call returns. Co-authored-by: Laura Brehm <laurabrehm@hey.com> Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit 5cd6210) Signed-off-by: Cory Snider <csnider@mirantis.com>
Hi @corhere. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM
The same test is also failing on the current head of the 1.7 branch, so is most likely unrelated to the changes in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ❤️
Yea, it's unrelated |
containerd 1.7.3 Welcome to the v1.7.3 release of containerd! The third patch release for containerd 1.7 contains various fixes and updates. * **RunC: Update runc binary to v1.1.8** ([#8843](containerd/containerd#8843)) * **CRI: Fix additionalGids: it should fallback to imageConfig.User when securityContext.RunAsUser,RunAsUsername are empty** ([#8824](containerd/containerd#8824)) * **CRI: write generated CNI config atomically** ([#8825](containerd/containerd#8825)) * **Port-Forward: Correctly handle known errors** ([#8806](containerd/containerd#8806)) * **Resolve docker.NewResolver race condition** ([#8799](containerd/containerd#8799)) * **Fix net.ipv4.ping_group_range with userns** ([#8786](containerd/containerd#8786)) * **Runtime/V2/RunC: handle early exits w/o big locks** ([#8712](containerd/containerd#8712)) * **SecComp: always allow name_to_handle_at** ([#8753](containerd/containerd#8753)) * **CRI: Windows Pod Stats: Add a check to skip stats for containers that are not running** ([#8654](containerd/containerd#8654)) * **Task: don't `close()` io before `cancel()`** ([#8658](containerd/containerd#8658)) * **Remove CNI conf_template deprecation** ([#8638](containerd/containerd#8638)) * **Fix issue for HPC pod metrics** ([#8634](containerd/containerd#8634)) See the changelog for complete list of changes Please try out the release binaries and report any issues at https://github.com/containerd/containerd/issues. * Akihiro Suda * Phil Estes * Sebastiaan van Stijn * Wei Fu * Derek McGowan * Kazuyoshi Kato * Austin Vazquez * Samuel Karp * Shingo Omura * Jin Dong * Maksym Pavlenko * Aditi Sharma * Danny Canter * James Sturtevant * Laura Brehm * Rodrigo Campos * Akhil Mohan * Andrey Epifanov * Bjorn Neergaard * Cory Snider * Madhav Jivrajani * Mahamed Ali * Priyanka Saggu * Qasim Sarfraz * wangxiang * zounengren <details><summary>63 commits</summary> <p> * [release/1.7] Prepare release notes for v1.7.3 ([#8871](containerd/containerd#8871)) * [`4cb2f1515`](containerd/containerd@4cb2f15) [release/1.7] Add release notes for v1.7.3 * [release/1.7] cri: memory.memsw.limit_in_bytes: no such file or directory ([#8869](containerd/containerd#8869)) * [`b461ecacf`](containerd/containerd@b461eca) cri: memory.memsw.limit_in_bytes: no such file or directory * [release/1.7] migrate to community owned bucket for node e2e tests ([#8875](containerd/containerd#8875)) * [`14328ae03`](containerd/containerd@14328ae) migrate to community owned bucket * [release/1.7 backport] update runc binary to v1.1.8 ([#8843](containerd/containerd#8843)) * [`b985f7ef1`](containerd/containerd@b985f7e) update runc binary to v1.1.8 * [release/1.7 backport] [CRI] fix additionalGids: it should fallback to imageConfig.User when securityContext.RunAsUser,RunAsUsername are empty ([#8824](containerd/containerd#8824)) * [`083f57160`](containerd/containerd@083f571) capture desc variable in range variable just in case that it run in parallel mode * [`a9440ce6b`](containerd/containerd@a9440ce) Use t.TempDir instead of os.MkdirTemp * [`eea3440d8`](containerd/containerd@eea3440) use strings.Cut instead of strings.Split for parsing imageConfig.User * [`eace67180`](containerd/containerd@eace671) fix userstr for dditionalGids on Linux * [release/1.7 backport] cri: write generated CNI config atomically ([#8825](containerd/containerd#8825)) * [`7353c0286`](containerd/containerd@7353c02) ctr: update WritePidFile to use atomicfile * [`ae7021300`](containerd/containerd@ae70213) shim: WritePidFile & WriteAddress use atomicfile * [`186eb64b7`](containerd/containerd@186eb64) cri: write generated CNI config atomically on Unix * [`64c3dcd8e`](containerd/containerd@64c3dcd) atomicfile: new package for atomic file writes * [release/1.7 backport] Move logrus setup code to log package ([#8831](containerd/containerd#8831)) * [`f7a20e17c`](containerd/containerd@f7a20e1) Move logrus setup code to log package * [release/1.7 backport] Cirrus CI: configure apt-get to wait for locks ([#8814](containerd/containerd#8814)) * [`60a6db9c2`](containerd/containerd@60a6db9) Cirrus CI: configure apt-get to wait for locks * [release/1.7 backport] Update Go to 1.20.6,1.19.11 ([#8815](containerd/containerd#8815)) * [`973778193`](containerd/containerd@9737781) Update Go to 1.20.6,1.19.11 * [release/1.7 backport] update go to go1.20.5, go1.19.10 ([#8716](containerd/containerd#8716)) * [`403033e52`](containerd/containerd@403033e) update go to go1.20.5, go1.19.10 * [release/1.7 backport] bugfix(port-forward): Correctly handle known errors ([#8806](containerd/containerd#8806)) * [`6b6b0c828`](containerd/containerd@6b6b0c8) bugfix(port-forward): Correctly handle known errors * [release/1.7] Resolve docker.NewResolver race condition ([#8799](containerd/containerd#8799)) * [`898eca21e`](containerd/containerd@898eca2) Change http.Header copy to builtin Clone * [`fa2efc406`](containerd/containerd@fa2efc4) Resolve docker.NewResolver race condition * [release/1.7] Fix net.ipv4.ping_group_range with userns ([#8786](containerd/containerd#8786)) * [`241514815`](containerd/containerd@2415148) pkg/cri/server: Test net.ipv4.ping_group_range works with userns * [`801e8c806`](containerd/containerd@801e8c8) pkg/cri/server: Fix net.ipv4.ping_group_range with userns * [release/1.7 backport] vendor: github.com/containerd/zfs v1.1.0 ([#8782](containerd/containerd#8782)) * [`d5639a5a8`](containerd/containerd@d5639a5) vendor: github.com/containerd/zfs v1.1.0 * [release/1.7 backport] ci: remove libseccomp-dev installation for nightly ([#8772](containerd/containerd#8772)) * [`15d65709e`](containerd/containerd@15d6570) ci: remove libseccomp-dev installation for nightly * [release/1.7] go.mod: Update cgroups to 3.0.2 ([#8769](containerd/containerd#8769)) * [`a08ae718c`](containerd/containerd@a08ae71) [release/1.7] go.mod: Update cgroups to 3.0.2 * [release/1.7 backport] runtime/v2/runc: handle early exits w/o big locks ([#8712](containerd/containerd#8712)) * [`18c6503d9`](containerd/containerd@18c6503) runtime/v2/runc: handle early exits w/o big locks * [release/1.7 backport] integration/client: add timeout to `TestShimOOMScore` ([#8750](containerd/containerd#8750)) * [`3bf3996d9`](containerd/containerd@3bf3996) integration/client: add timeout to `TestShimOOMScore` * [release/1.7 backport] Update ginkgo to match cri-tools' version ([#8760](containerd/containerd#8760)) * [`c2c54af9d`](containerd/containerd@c2c54af) Update ginkgo to match cri-tools' version * [release/1.7 backport] seccomp: always allow name_to_handle_at ([#8753](containerd/containerd#8753)) * [`6281d46df`](containerd/containerd@6281d46) seccomp: always allow name_to_handle_at * [release/1.7] Pinned image support ([#8718](containerd/containerd#8718)) * [`699d6701a`](containerd/containerd@699d670) Pinned image support * [release/1.7] cherry-pick: No more nondistributable layers in MS registry ([#8690](containerd/containerd#8690)) * [`dafbeb5b1`](containerd/containerd@dafbeb5) No more nondistributable layers in MS registry * [release/1.7] [cri] Windows Pod Stats: Add a check to skip stats for containers that are not running ([#8654](containerd/containerd#8654)) * [`58b6b99cd`](containerd/containerd@58b6b99) Add a check to skip stats for containers that are not running * [release/1.7 backport] task: don't `close()` io before `cancel()` ([#8658](containerd/containerd#8658)) * [`e5b2a0131`](containerd/containerd@e5b2a01) task: don't `close()` io before `cancel()` * [release/1.7 backport] move to CRI-TOOLS v1.27.0 ([#8656](containerd/containerd#8656)) * [`a6a15afe3`](containerd/containerd@a6a15af) move to CRI-TOOLS v1.27.0 * [release/1.7] Remove cni conf_template deprecation ([#8638](containerd/containerd#8638)) * [`0b2b96479`](containerd/containerd@0b2b964) RELEASES.md: de-deprecation of CNI conf_template will be v1.7.3 * [`a24267b28`](containerd/containerd@a24267b) Remove cni conf_template deprecation * [release/1.7] Fix issue for HPC pod metrics ([#8634](containerd/containerd#8634)) * [`89415fe36`](containerd/containerd@89415fe) Fix issue for HPC pod metrics </p> </details> <details><summary>49 commits</summary> <p> * gofumpt and update status badges ([#75](containerd/zfs#75)) * [`5e3457b`](containerd/zfs@5e3457b) TestZFSUsage: use t.TempDir() * [`6e9c675`](containerd/zfs@6e9c675) README: update badges * [`ff17a79`](containerd/zfs@ff17a79) gofmt code * go.mod: github.com/mistifyio/go-zfs/v3 v3.0.1 ([#73](containerd/zfs#73)) * [`d3485b9`](containerd/zfs@d3485b9) go.mod: github.com/mistifyio/go-zfs/v3 v3.0.1 * gha: fix golangci-lint, and upgrade to v1.52.2 ([#74](containerd/zfs#74)) * [`23c831a`](containerd/zfs@23c831a) remove pre-go1.17 build-tags, and fix missing build-tags in plugin * [`e5acd95`](containerd/zfs@e5acd95) gha: fix golangci-lint, upgrade to v1.52.2 * Bump github.com/containerd/containerd from 1.6.12 to 1.6.18 ([#72](containerd/zfs#72)) * [`00b96c2`](containerd/zfs@00b96c2) Bump github.com/containerd/containerd from 1.6.12 to 1.6.18 * Bump github.com/containerd/containerd from 1.6.9 to 1.6.12 ([#69](containerd/zfs#69)) * [`a099def`](containerd/zfs@a099def) Bump github.com/containerd/containerd from 1.6.9 to 1.6.12 * Add CodeQL analysis workflow ([#67](containerd/zfs#67)) * [`fee1db7`](containerd/zfs@fee1db7) Add CodeQL analysis workflow * Update GitHub actions CI workflow ([#66](containerd/zfs#66)) * [`b8b7ab2`](containerd/zfs@b8b7ab2) Update GitHub actions CI workflow * Upgrade compiler to Go 1.19 and update dependencies ([#68](containerd/zfs#68)) * [`3e729b3`](containerd/zfs@3e729b3) Update dependencies * [`3c003f8`](containerd/zfs@3c003f8) Upgrade compiler to Go 1.19 * Remove references to io/ioutil package ([#65](containerd/zfs#65)) * [`d700762`](containerd/zfs@d700762) Remove references to io/ioutil package * Update go.mod and move to supported Go version ([#62](containerd/zfs#62)) * [`f52906e`](containerd/zfs@f52906e) Update Go version to supported version * [`79ca2cb`](containerd/zfs@79ca2cb) Update containerd depedency to latest * go.mod: github.com/mistifyio/go-zfs v3.0.0 ([#59](containerd/zfs#59)) * [`2e3db29`](containerd/zfs@2e3db29) go.mod: github.com/mistifyio/go-zfs v3.0.0 * go.mod: github.com/mistifyio/go-zfs/v3 v3.0.0-20220217145925-d014733a5309 ([#58](containerd/zfs#58)) * [`d904e63`](containerd/zfs@d904e63) go.mod: github.com/mistifyio/go-zfs/v3 v3.0.0-20220217145925-d014733a5309 * Update vendoring to containerd 1.6.x ([#57](containerd/zfs#57)) * [`e021180`](containerd/zfs@e021180) Update vendoring to containerd 1.6.x * Bump github.com/containerd/containerd from 1.5.8 to 1.5.9 ([#55](containerd/zfs#55)) * [`fc0c9a9`](containerd/zfs@fc0c9a9) Bump github.com/containerd/containerd from 1.5.8 to 1.5.9 * Bump github.com/containerd/containerd from 1.5.5 to 1.5.8 ([#54](containerd/zfs#54)) * [`5d2f28c`](containerd/zfs@5d2f28c) Bump github.com/containerd/containerd from 1.5.5 to 1.5.8 * follow-up-#52: fix the order of cause in fmt.Errorf ([#53](containerd/zfs#53)) * [`b3f193d`](containerd/zfs@b3f193d) follow-up-#52: fix the order of cause in fmt.Errorf * replace pkg/errors ([#52](containerd/zfs#52)) * [`d5b0a2f`](containerd/zfs@d5b0a2f) replace pkg/errors * Bump github.com/containerd/containerd from 1.5.2 to 1.5.4 ([#51](containerd/zfs#51)) * [`fd6afa5`](containerd/zfs@fd6afa5) Bump github.com/containerd/containerd from 1.5.2 to 1.5.4 * Bump containerd to 1.5.2 ([#50](containerd/zfs#50)) * [`aef875e`](containerd/zfs@aef875e) bump containerd to 1.5.2 * Rename branches from master to main ([#49](containerd/zfs#49)) * [`35c6af7`](containerd/zfs@35c6af7) Rename branches from master to main * sync up with containerd 1.5 GA ([#47](containerd/zfs#47)) * [`3d5efef`](containerd/zfs@3d5efef) vendor sync up with containerd 1.5 ga * README.md: fix CI badge ([#46](containerd/zfs#46)) * [`0977d81`](containerd/zfs@0977d81) README.md: fix CI badge </p> </details> * **github.com/containerd/cgroups/v3** v3.0.1 -> v3.0.2 * **github.com/containerd/zfs** v1.0.0 -> v1.1.0 * **github.com/mistifyio/go-zfs/v3** v3.0.1 **_new_** Previous release can be found at [v1.7.2](https://github.com/containerd/containerd/releases/tag/v1.7.2)
service.handleProcessExit
/service.checkProcesses
, due to commit 6f34da5 not being in the 1.7 branch.eventSendMu is causing severe lock contention when multiple processes start and exit concurrently. Replace it with a different scheme for maintaining causality w.r.t. start and exit events for a process which does not rely on big locks for synchronization.
Keep track of all processes for which a Task(Exec)Start event has been published and have not yet exited in a map, keyed by their PID. Processing exits then is as simple as looking up which process corresponds to the PID. If there are no started processes known with that PID, the PID must either belong to a process which was started by s.Start() and before the s.Start() call has added the process to the map of running processes, or a reparented process which we don't care about. Handle the former case by having each s.Start() call subscribe to exit events before starting the process. It checks if the PID has exited in the time between it starting the process and publishing the TaskStart event, handling the exit if it has. Exit events for reparented processes received when no s.Start() calls are in flight are immediately discarded, and events received during an s.Start() call are discarded when the s.Start() call returns.
(cherry picked from commit 5cd6210)