-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat(exec): Add --no-session flag for improved performance #26727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(exec): Add --no-session flag for improved performance #26727
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
c03d7e3 to
5263c85
Compare
|
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore. |
d63e18d to
f7d110b
Compare
|
Hello @mheon, I'm not really sure why some of the pipelines fail, I'm stuck. |
|
Integration tests, you need a SkipIfRemote in your tests. System tests are both flakes. |
f7d110b to
e637650
Compare
e637650 to
1853aea
Compare
f5cfc13 to
cd2b89f
Compare
cd2b89f to
82dcb63
Compare
82dcb63 to
1a91259
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, LGTM
The failed Healthcheck seems to be a flake. I tested it locally and the Healthcheck passed.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Honny1, ryanmccann1024 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
LGTM |
|
I think there's still a pending review @mheon Or I'm not sure if I should do something else before that. |
|
@containers/podman-maintainers PTAL |
libpod/container_exec.go
Outdated
| // ExecNoSession executes a command in a container without creating a persistent exec session. | ||
| // It skips database operations and minimizes container locking for performance. | ||
| func (c *Container) ExecNoSession(config *ExecConfig, streams *define.AttachStreams, resize <-chan resize.TerminalSize) (int, error) { | ||
| if err := c.verifyExecConfig(config); err != nil { | ||
| return -1, err | ||
| } | ||
|
|
||
| unlock := true | ||
| if !c.batched { | ||
| c.lock.Lock() | ||
| defer func() { | ||
| if unlock { | ||
| c.lock.Unlock() | ||
| } | ||
| }() | ||
|
|
||
| if err := c.syncContainer(); err != nil { | ||
| return -1, err | ||
| } | ||
| } | ||
|
|
||
| if !c.ensureState(define.ContainerStateRunning) { | ||
| return -1, fmt.Errorf("can only create exec sessions on running containers: %w", define.ErrCtrStateInvalid) | ||
| } | ||
|
|
||
| session, err := c.createExecSession(config) | ||
| if err != nil { | ||
| return -1, err | ||
| } | ||
|
|
||
| opts, err := prepareForExec(c, session) | ||
| if err != nil { | ||
| return -1, err | ||
| } | ||
|
|
||
| defer func() { | ||
| if err := c.cleanupExecBundle(session.ID()); err != nil { | ||
| logrus.Errorf("Container %s light exec session cleanup error: %v", c.ID(), err) | ||
| } | ||
| }() | ||
|
|
||
| _, attachChan, err := c.ociRuntime.ExecContainer(c, session.ID(), opts, streams, nil) | ||
| if err != nil { | ||
| return -1, err | ||
| } | ||
|
|
||
| if !c.batched { | ||
| c.lock.Unlock() | ||
| unlock = false | ||
| } | ||
|
|
||
| err = <-attachChan | ||
| if err != nil && !errors.Is(err, define.ErrDetach) { | ||
| return -1, err | ||
| } | ||
|
|
||
| exitCode, err := c.readExecExitCode(session.ID()) | ||
| if err != nil { | ||
| return -1, fmt.Errorf("retrieving no-session exec %s exit code: %w", session.ID(), err) | ||
| } | ||
|
|
||
| return exitCode, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be mostly duplication from c.healthCheckExec(), I like to see this worked to share the code not duplicate it.
pkg/domain/infra/abi/containers.go
Outdated
| return define.TranslateExecErrorToExitCode(ec, err), err | ||
| } | ||
|
|
||
| func (ic *ContainerEngine) ContainerExecNoSession(ctx context.Context, nameOrID string, options entities.ExecOptions, streams define.AttachStreams) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to define a new function on the interface like this, it could easily be added in ContainerExec()
In particular this function simply removes several required things that ContainerExec() does.
First this here never calls, getContainers() which means --latest will be broken
Second, it misses the if options.Tty branch to add the TERM env which means you get different behavior fro TERM in session on no session mode which seems very unexpected
Lastly it also doesn't do the tty resize logic that is in ExecAttachCtr() so the terminal is not set into the right state I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep this separate - I'm expecting further changes down the line to diverge from normal Exec()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different how? we can split in libpod but doing this here on the cli/ContainerEngine just seems to bypass basic code that we must always do as I pointed out above.
We can always do if execNoSession inside ContainerExec() once we looked up the container and configured the basic exec config.
what further changes are expected here where this function makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete removal of the Conmon backend in favor of directly calling OCI runtime exec directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but that can still happen within ContainerExec(), right now we duplicate common lookup logic which I find quite bad due the bugs mentioned, at the end of ContainerExec() a simple if options.NoSession would be easier IMO.
| stopSession := podmanTest.Podman([]string{"stop", "-t", "5", ctrName}) | ||
| stopSession.WaitWithDefaultTimeout() | ||
| Expect(stopSession).Should(ExitCleanly()) | ||
| Eventually(execSession, "5s").Should(Not(Exit(0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please test exit exit codes, it should exit 137 due the kill signal I guess?
It is easy to such tests wrong otherwise, you could type the command then it always errors with non 0 code but didn't test what you wanted.
|
Do we have any numbers on what's the improvement is? Can you run it under hyperfine and check what's the difference with a regular exec? |
I did the test on my machine and the results are very good: |
|
Update: Going to make these changes this week. Sorry for the delay! |
|
Looking at this again there is another problem with this, right now conmon always spawns the podman container cleanup command which should remove the session but since there is no session in the db that command will always fail. With this and using the Either that or we keep the cleanup process so it can clean up the tmpfiles we created for this exec, but then we need a way to communicate the location to the cleanup process instead of it looking it up in the db. |
|
Got it! I think I'm all set to keep working on this although I'm not fully clear on the last suggestion made by @Luap99 (I'm a newbie). |
There is this in makeExecConfig() which sets a exit command on conmon which we don't want. So in theory all you need to do is to make sure the command is nil for you exec config then it should work I think. |
1a91259 to
bb3eaf0
Compare
bb3eaf0 to
bb967f2
Compare
bb967f2 to
0d8b404
Compare
test/e2e/exec_test.go
Outdated
|
|
||
| execResult = podmanTest.Podman([]string{"exec", "--no-session", ctrName, "nonexistentcommand"}) | ||
| execResult.WaitWithDefaultTimeout() | ||
| Expect(execResult).Should(ExitWithError(127, "")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends with a 255 error code. I suspect there is some incorrect handling of the error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a ridiculous question, but how do I run the tests locally (I'm on a Mac)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, you will need a Linux virtual machine (VM). Alternatively, you can SSH into the Podman machine with podman machine ssh. The entire /Users directory is mounted, so you can cd to the location where you cloned the Podman repository. Then, you can install the developer dependencies using the command sudo rpm-ostree install systemd-devel gcc glib2-devel glibc-devel glibc-static golang git-core gpgme-devel libassuan-devel libgpg-error-devel libseccomp-devel libselinux-devel shadow-utils-subid-devel pkgconfig man-db sqlite-devel systemd systemd-devel. Afterward, exit the machine and restart it with podman machine stop and podman machine start. Once you SSH back into the machine, you should have a working environment where you can run tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I'm doing something wrong, the integration tests pass for me locally?
0d8b404 to
d15725c
Compare
| Expect(execResult).Should(ExitWithError(127, "OCI runtime attempted to invoke a command that was not found")) | ||
|
|
||
| execSession := podmanTest.Podman([]string{"exec", "--no-session", ctrName, "sleep", "30"}) | ||
| killSession := podmanTest.Podman([]string{"exec", ctrName, "sh", "-c", "kill -9 $(pgrep sleep)"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is that this is running serially - maybe put the execSession bit in a goroutine so they run concurrent? Though you'd need a sleep to sequence them, give time for the first exec to start (our CI is really slow)
Fixes: containers#26588 For use cases like HPC, where `podman exec` is called in rapid succession, the standard exec process can become a bottleneck due to container locking and database I/O for session tracking. This commit introduces a new `--no-session` flag to `podman exec`. When used, this flag invokes a new, lightweight backend implementation that: - Skips container locking, reducing lock contention - Bypasses the creation, tracking, and removal of exec sessions in the database - Executes the command directly and retrieves the exit code without persisting session state - Maintains consistency with regular exec for container lookup, TTY handling, and environment setup - Shares implementation with health check execution to avoid code duplication The implementation addresses all performance bottlenecks while preserving compatibility with existing exec functionality including --latest flag support and proper exit code handling. Changes include: - Add --no-session flag to cmd/podman/containers/exec.go - Implement lightweight execution path in libpod/container_exec.go - Ensure consistent container validation and environment setup - Add comprehensive exit code testing including signal handling (exit 137) - Optimize configuration to skip unnecessary exit command setup Signed-off-by: Ryan McCann <ryan_mccann@student.uml.edu> Signed-off-by: ryanmccann1024 <ryan_mccann@student.uml.edu>
d15725c to
d718ce4
Compare
d718ce4 to
2cb0bce
Compare
|
Any suggestions on how to fix the failure on that pipeline due to the merge? I think there's also a linting error from upstream? |
|
Linting error fix: #27234 |
|
@ryanmccann1024 Could you please rebase on main? |
Fixes: #26588
For use cases like HPC, where
podman execis called in rapid succession, the standard exec process can become a bottleneck due to container locking and database I/O for session tracking.This commit introduces a new
--no-sessionflag topodman exec. When used, this flag invokes a new, lightweight backend implementation (ExecNoSession) that:Does this PR introduce a user-facing change?