@ryanmccann1024

@ryanmccann1024 ryanmccann1024 commented Jul 31, 2025

Fixes: #26588

For use cases like HPC, where podman exec is called in rapid succession, the standard exec process can become a bottleneck due to container locking and database I/O for session tracking.

This commit introduces a new --no-session flag to podman exec. When used, this flag invokes a new, lightweight backend implementation (ExecNoSession) that:

  • Skips container locking, reducing lock contention.
  • Bypasses the creation, tracking, and removal of exec sessions in the database.
  • Executes the command directly and retrieves the exit code without persisting session state.

Does this PR introduce a user-facing change?

Added a new `--no-session` flag to `podman exec` to provide a performance-optimized execution path that bypasses container locking and database session tracking. This is ideal for high-concurrency environments like HPC where exec session tracking is not required.

@openshift-ci
Contributor

openshift-ci bot commented Jul 31, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Jul 31, 2025
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from c03d7e3 to 5263c85 Compare July 31, 2025 20:21
@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch 3 times, most recently from d63e18d to f7d110b Compare August 1, 2025 01:41
@ryanmccann1024
Author

Hello @mheon,

I'm not really sure why some of the pipelines fail, I'm stuck.

@mheon
Member

mheon commented Aug 1, 2025

For the integration tests, you need a SkipIfRemote in your tests. The system test failures are both flakes.

@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from f7d110b to e637650 Compare August 1, 2025 13:39
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from e637650 to 1853aea Compare August 1, 2025 18:43
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch 2 times, most recently from f5cfc13 to cd2b89f Compare August 6, 2025 15:53
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from cd2b89f to 82dcb63 Compare August 6, 2025 22:32
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from 82dcb63 to 1a91259 Compare August 7, 2025 15:18
Member

@Honny1 Honny1 left a comment

Good job, LGTM

The failed Healthcheck seems to be a flake. I tested it locally and the Healthcheck passed.

@openshift-ci
Contributor

openshift-ci bot commented Aug 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Honny1, ryanmccann1024
Once this PR has been reviewed and has the lgtm label, please assign mheon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ryanmccann1024 ryanmccann1024 requested a review from mheon August 12, 2025 15:52
@mheon
Member

mheon commented Aug 13, 2025

LGTM

@ryanmccann1024
Author

I think there's still a pending review @mheon

Or I'm not sure if I should do something else before that.

@mheon
Member

mheon commented Aug 15, 2025

@containers/podman-maintainers PTAL

Comment on lines 1310 to 1338
// ExecNoSession executes a command in a container without creating a persistent exec session.
// It skips database operations and minimizes container locking for performance.
func (c *Container) ExecNoSession(config *ExecConfig, streams *define.AttachStreams, resize <-chan resize.TerminalSize) (int, error) {
	if err := c.verifyExecConfig(config); err != nil {
		return -1, err
	}

	unlock := true
	if !c.batched {
		c.lock.Lock()
		defer func() {
			if unlock {
				c.lock.Unlock()
			}
		}()

		if err := c.syncContainer(); err != nil {
			return -1, err
		}
	}

	if !c.ensureState(define.ContainerStateRunning) {
		return -1, fmt.Errorf("can only create exec sessions on running containers: %w", define.ErrCtrStateInvalid)
	}

	session, err := c.createExecSession(config)
	if err != nil {
		return -1, err
	}

	opts, err := prepareForExec(c, session)
	if err != nil {
		return -1, err
	}

	defer func() {
		if err := c.cleanupExecBundle(session.ID()); err != nil {
			logrus.Errorf("Container %s light exec session cleanup error: %v", c.ID(), err)
		}
	}()

	_, attachChan, err := c.ociRuntime.ExecContainer(c, session.ID(), opts, streams, nil)
	if err != nil {
		return -1, err
	}

	if !c.batched {
		c.lock.Unlock()
		unlock = false
	}

	err = <-attachChan
	if err != nil && !errors.Is(err, define.ErrDetach) {
		return -1, err
	}

	exitCode, err := c.readExecExitCode(session.ID())
	if err != nil {
		return -1, fmt.Errorf("retrieving no-session exec %s exit code: %w", session.ID(), err)
	}

	return exitCode, nil
}
Member

This seems to be mostly duplicated from c.healthCheckExec(). I would like to see this reworked to share the code rather than duplicate it.

return define.TranslateExecErrorToExitCode(ec, err), err
}

func (ic *ContainerEngine) ContainerExecNoSession(ctx context.Context, nameOrID string, options entities.ExecOptions, streams define.AttachStreams) (int, error) {
Member

Is there a reason to define a new function on the interface like this? It could easily be added in ContainerExec().

In particular, this function simply drops several required things that ContainerExec() does.

First, this never calls getContainers(), which means --latest will be broken.

Second, it misses the if options.Tty branch that adds the TERM env, which means TERM behaves differently in session and no-session mode. That seems very unexpected.

Lastly, it also doesn't do the tty resize logic that is in ExecAttachCtr(), so I think the terminal is not set into the right state.

Member

I'd prefer to keep this separate - I'm expecting further changes down the line to diverge from normal Exec()

Member

Different how? We can split in libpod, but doing this here on the cli/ContainerEngine just seems to bypass basic code that we must always run, as I pointed out above.

We can always do if execNoSession inside ContainerExec() once we have looked up the container and configured the basic exec config.

What further changes are expected here where this function makes sense?

Member

Complete removal of the Conmon backend in favor of calling OCI runtime exec directly.

Member

Sure, but that can still happen within ContainerExec(). Right now we duplicate common lookup logic, which I find quite bad due to the bugs mentioned. At the end of ContainerExec(), a simple if options.NoSession would be easier IMO.
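For illustration, the shape being argued for here (shared setup first, then a branch on the flag) might look roughly like the sketch below. It is not the Podman code: execOptions, prepareExec, and the "xterm" default are hypothetical stand-ins, and the real ContainerExec() also performs the container lookup and tty resize handling mentioned above.

```go
package main

import "fmt"

// execOptions is a hypothetical, trimmed-down stand-in for the exec options
// discussed in this review; it is not the real entities.ExecOptions type.
type execOptions struct {
	Tty       bool
	NoSession bool
	Envs      map[string]string
}

// prepareExec applies setup that both paths need and then reports which
// execution path would be taken. The "xterm" default is an assumption
// made for this sketch.
func prepareExec(opts *execOptions) string {
	if opts.Envs == nil {
		opts.Envs = map[string]string{}
	}
	// Shared setup: give TTY execs a TERM value unless the caller set one.
	if opts.Tty {
		if _, ok := opts.Envs["TERM"]; !ok {
			opts.Envs["TERM"] = "xterm"
		}
	}
	// Only the final step differs between the two modes.
	if opts.NoSession {
		return "no-session"
	}
	return "session"
}

func main() {
	opts := &execOptions{Tty: true, NoSession: true}
	fmt.Println(prepareExec(opts), opts.Envs["TERM"])
}
```

Because the branch comes last, both modes see the same environment and lookup behavior, which is the consistency concern raised in this thread.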

stopSession := podmanTest.Podman([]string{"stop", "-t", "5", ctrName})
stopSession.WaitWithDefaultTimeout()
Expect(stopSession).Should(ExitCleanly())
Eventually(execSession, "5s").Should(Not(Exit(0)))
Member

Please test exit codes; it should exit 137 due to the kill signal, I guess?
It is easy to get such tests wrong otherwise: you could mistype the command and it would always error with a non-zero code without testing what you wanted.

@giuseppe
Member

Do we have any numbers on what the improvement is?

Can you run it under hyperfine and check what's the difference with a regular exec?

@giuseppe
Member

Do we have any numbers on what the improvement is?

Can you run it under hyperfine and check what's the difference with a regular exec?

I did the test on my machine and the results are very good:

➜ hyperfine 'bin/podman exec foo true' 'bin/podman exec --no-session foo true'
Benchmark 1: bin/podman exec foo true
  Time (mean ± σ):      80.2 ms ±   2.9 ms    [User: 22.6 ms, System: 14.6 ms]
  Range (min … max):    74.9 ms …  86.5 ms    34 runs

Benchmark 2: bin/podman exec --no-session foo true
  Time (mean ± σ):      29.9 ms ±   6.8 ms    [User: 20.1 ms, System: 11.9 ms]
  Range (min … max):    22.3 ms …  54.1 ms    97 runs

Summary
  bin/podman exec --no-session foo true ran
    2.68 ± 0.62 times faster than bin/podman exec foo true

@ryanmccann1024
Author

Update: Going to make these changes this week.

Sorry for the delay!

@Luap99
Member

Luap99 commented Sep 4, 2025

Looking at this again, there is another problem: right now conmon always spawns the podman container cleanup command, which should remove the session, but since there is no session in the db that command will always fail.
Well, technically it doesn't fail, because we ignore the no-session error as this is a common race.

diff --git a/pkg/domain/infra/abi/containers.go b/pkg/domain/infra/abi/containers.go
index 1ac01c3842..9393b4eedc 100644
--- a/pkg/domain/infra/abi/containers.go
+++ b/pkg/domain/infra/abi/containers.go
@@ -1328,7 +1328,7 @@ func (ic *ContainerEngine) ContainerCleanup(ctx context.Context, namesOrIds []st
                                err = ctr.ExecCleanup(options.Exec)
                        }
                        // If ErrNoSuchExecSession then the exec session was already removed so do not report an error.
-                       if err != nil && !errors.Is(err, define.ErrNoSuchExecSession) {
+                       if err != nil /*&& !errors.Is(err, define.ErrNoSuchExecSession) */ {
                                return nil, err
                        }
                        return []*entities.ContainerCleanupReport{}, nil

With this change and the hack/podman_cleanup_tracer.bt script you can see the commands failing.
Anyway, my point here is that we should never register the cleanup command to begin with. It will always fail, so this should really not specify the command for conmon. This should make this patch even more effective, as we do not spawn a cleanup process.

Either that, or we keep the cleanup process so it can clean up the tmpfiles we created for this exec, but then we need a way to communicate the location to the cleanup process instead of it looking it up in the db.
Also, one thing exec doesn't seem to do is handle signals, i.e. if we get SIGINT/TERM we should be able to clean up the tmpfiles before exiting to ensure we do not leak them.

@ryanmccann1024
Author

Got it!

I think I'm all set to keep working on this although I'm not fully clear on the last suggestion made by @Luap99 (I'm a newbie).

@Luap99
Member

Luap99 commented Sep 9, 2025

Got it!

I think I'm all set to keep working on this although I'm not fully clear on the last suggestion made by @Luap99 (I'm a newbie).

There is this in makeExecConfig(), which sets an exit command on conmon that we don't want.

	// TODO: Add some ability to toggle syslog
	exitCommandArgs, err := specgenutil.CreateExitCommandArgs(storageConfig, runtimeConfig, logrus.IsLevelEnabled(logrus.DebugLevel), false, false, true)
	if err != nil {
		return nil, fmt.Errorf("constructing exit command for exec session: %w", err)
	}
	execConfig.ExitCommand = exitCommandArgs

So in theory all you need to do is make sure the command is nil for your exec config; then it should work, I think.
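A rough sketch of that suggestion, using hypothetical stand-in types rather than the real libpod ExecConfig or makeExecConfig(): the exit command is only attached when a session is tracked, so in no-session mode the field stays nil and no cleanup process would be registered.

```go
package main

import "fmt"

// execConfig is a hypothetical, trimmed-down stand-in for libpod's ExecConfig.
type execConfig struct {
	Command     []string
	ExitCommand []string
}

// buildExecConfig attaches the exit command only when a session is tracked.
// The noSession parameter and this function are assumptions made for the
// sketch; they do not mirror the real makeExecConfig() signature.
func buildExecConfig(cmd, exitCommandArgs []string, noSession bool) *execConfig {
	cfg := &execConfig{Command: cmd}
	if !noSession {
		cfg.ExitCommand = exitCommandArgs
	}
	return cfg
}

func main() {
	// Placeholder arguments; the real ones come from CreateExitCommandArgs.
	cleanup := []string{"cleanup-command", "placeholder-args"}

	withSession := buildExecConfig([]string{"true"}, cleanup, false)
	noSession := buildExecConfig([]string{"true"}, cleanup, true)

	fmt.Println(withSession.ExitCommand != nil) // prints true
	fmt.Println(noSession.ExitCommand == nil)   // prints true
}
```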

@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from 1a91259 to bb3eaf0 Compare September 10, 2025 22:38
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 10, 2025
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from bb3eaf0 to bb967f2 Compare September 10, 2025 22:43
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 10, 2025
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from bb967f2 to 0d8b404 Compare September 10, 2025 22:45

execResult = podmanTest.Podman([]string{"exec", "--no-session", ctrName, "nonexistentcommand"})
execResult.WaitWithDefaultTimeout()
Expect(execResult).Should(ExitWithError(127, ""))
Member

This ends with a 255 error code. I suspect there is some incorrect handling of the error code.

Author

This might be a ridiculous question, but how do I run the tests locally (I'm on a Mac)?

Member

Unfortunately, you will need a Linux virtual machine (VM). Alternatively, you can SSH into the Podman machine with podman machine ssh. The entire /Users directory is mounted, so you can cd to the location where you cloned the Podman repository. Then, you can install the developer dependencies using the command:

    sudo rpm-ostree install systemd-devel gcc glib2-devel glibc-devel glibc-static golang git-core gpgme-devel libassuan-devel libgpg-error-devel libseccomp-devel libselinux-devel shadow-utils-subid-devel pkgconfig man-db sqlite-devel systemd systemd-devel

Afterward, exit the machine and restart it with podman machine stop and podman machine start. Once you SSH back into the machine, you should have a working environment where you can run tests.

Author

Not sure if I'm doing something wrong, the integration tests pass for me locally?

@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from 0d8b404 to d15725c Compare September 11, 2025 14:06
Expect(execResult).Should(ExitWithError(127, "OCI runtime attempted to invoke a command that was not found"))

execSession := podmanTest.Podman([]string{"exec", "--no-session", ctrName, "sleep", "30"})
killSession := podmanTest.Podman([]string{"exec", ctrName, "sh", "-c", "kill -9 $(pgrep sleep)"})
Member

I think the problem here is that this is running serially. Maybe put the execSession bit in a goroutine so they run concurrently? Though you'd need a sleep to sequence them, to give the first exec time to start (our CI is really slow).

Fixes: containers#26588

For use cases like HPC, where `podman exec` is called in rapid succession, the standard exec process can become a bottleneck due to container locking and database I/O for session tracking.

This commit introduces a new `--no-session` flag to `podman exec`. When used, this flag invokes a new, lightweight backend implementation that:

- Skips container locking, reducing lock contention
- Bypasses the creation, tracking, and removal of exec sessions in the database
- Executes the command directly and retrieves the exit code without persisting session state
- Maintains consistency with regular exec for container lookup, TTY handling, and environment setup
- Shares implementation with health check execution to avoid code duplication

The implementation addresses all performance bottlenecks while preserving compatibility with existing exec functionality including --latest flag support and proper exit code handling.

Changes include:
- Add --no-session flag to cmd/podman/containers/exec.go
- Implement lightweight execution path in libpod/container_exec.go
- Ensure consistent container validation and environment setup
- Add comprehensive exit code testing including signal handling (exit 137)
- Optimize configuration to skip unnecessary exit command setup

Signed-off-by: Ryan McCann <ryan_mccann@student.uml.edu>
Signed-off-by: ryanmccann1024 <ryan_mccann@student.uml.edu>
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from d15725c to d718ce4 Compare October 5, 2025 16:35
@ryanmccann1024 ryanmccann1024 force-pushed the feature/26588-exec-no-session branch from d718ce4 to 2cb0bce Compare October 5, 2025 17:05
@ryanmccann1024
Author

Any suggestions on how to fix the failure on that pipeline due to the merge?

I think there's also a linting error from upstream?

@Honny1
Member

Honny1 commented Oct 6, 2025

Linting error fix: #27234

@Honny1
Member

Honny1 commented Oct 16, 2025

@ryanmccann1024 Could you please rebase on main?


Labels

do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sessionless Exec

6 participants