Support transient store mode #16371

Merged: 6 commits, Dec 6, 2022

Contributor

@alexlarsson alexlarsson commented Nov 1, 2022

This adds a boolean option transient_store to storage.conf, which can also be set via the CLI flag --transient-store. If it is enabled, all container state is considered transient and will not survive a reboot, as discussed in #16245.

In particular the changes are these:

  • $rootdir/libpod/bolt_state.db moves to $runroot/libpod
  • $rootdir/overlay-containers/containers.json moves to $runroot/overlay-containers
  • There is a new set of storage directories, overlay-containerlayers, which is similar to overlay-layers (it uses the same layer store code). This is used for layers that are part of containers (i.e. not image layers).
  • The layer store now has a transient mode where the layers.json is stored in $runroot. This is used for the overlay-containerlayers dir.

The end result is that all stored metadata related to containers and volumes (but not images) ends up on tmpfs. This has two main advantages. First, database writes are much faster, in particular for podman run, which now has no database writes outside tmpfs. Second, since the databases are not persisted, we can't run into weird issues on unclean shutdowns or other corner cases. Every boot starts from a clean slate.

Note that while the databases (bolt_state.db, containers.json and layers.json) don't persist, we do use persistent storage for some of the content (such as the "upper" of the overlayfs). This shouldn't cause problems in practice because all such data is indexed by layer/container ID, which is unlikely to be reused across boots. Ideally, however, $rootdir/overlay-containers and $rootdir/overlay-containerlayers should be cleaned on boot so that leftover data does not waste space.
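
The relocation described in the list above can be pictured with a small sketch. It is illustrative only: `resolveStorePaths` and `storePaths` are hypothetical names, not functions from podman or containers/storage, and the example directories are just typical rootful values. Only the two files whose full paths are spelled out above are covered.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// storePaths holds the two metadata files whose locations are given in full
// in the description above. Hypothetical type, for illustration only.
type storePaths struct {
	BoltDB         string
	ContainersJSON string
}

// resolveStorePaths picks the base directory for container metadata:
// $runroot (normally on tmpfs) in transient mode, $rootdir otherwise.
func resolveStorePaths(rootDir, runRoot string, transient bool) storePaths {
	base := rootDir
	if transient {
		base = runRoot
	}
	return storePaths{
		BoltDB:         filepath.Join(base, "libpod", "bolt_state.db"),
		ContainersJSON: filepath.Join(base, "overlay-containers", "containers.json"),
	}
}

func main() {
	// Example directories only; real values come from storage.conf or the CLI.
	root, run := "/var/lib/containers/storage", "/run/containers/storage"
	fmt.Printf("persistent: %+v\n", resolveStorePaths(root, run, false))
	fmt.Printf("transient:  %+v\n", resolveStorePaths(root, run, true))
}
```

Image metadata is deliberately absent from the sketch, since it stays under $rootdir in both modes.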

The performance improvement depends on the system's performance and characteristics, but here is an example from my desktop:

Without transient mode:

Benchmark 1: bin/podman run --transient-store=false --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
  Time (mean ± σ):     229.4 ms ±  31.3 ms    [User: 76.2 ms, System: 39.9 ms]
  Range (min … max):   196.0 ms … 298.3 ms    11 runs

With transient mode:

Benchmark 1: bin/podman run --transient-store=true --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
  Time (mean ± σ):      90.5 ms ±   3.2 ms    [User: 39.1 ms, System: 18.9 ms]
  Range (min … max):    86.4 ms … 103.6 ms    27 runs
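
For a quick sense of scale, the two mean times above work out to roughly a 2.5x speedup. The helpers below are a throwaway sketch (the function names are made up for this example) that only redoes that arithmetic on the reported means; it says nothing about variance or other machines.

```go
package main

import "fmt"

// speedup returns how many times faster the "after" run is than the "before" run.
func speedup(beforeMs, afterMs float64) float64 {
	return beforeMs / afterMs
}

// reduction returns the fraction of wall-clock time saved.
func reduction(beforeMs, afterMs float64) float64 {
	return 1 - afterMs/beforeMs
}

func main() {
	// Mean times in milliseconds, copied from the two benchmark runs above.
	const before, after = 229.4, 90.5
	fmt.Printf("speedup: %.2fx, time saved: %.1f%%\n",
		speedup(before, after), 100*reduction(before, after))
}
```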

NOTE: This commit touches both podman code and vendored containers/storage code, so will need to be split out. But I think reviewing is easier this way.

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Nov 1, 2022
Member

rhatdan commented Nov 1, 2022

@mheon @vrothberg @nalind @baude PTAL

@@ -322,6 +322,14 @@ func makeRuntime(runtime *Runtime) (retErr error) {
}
}

// Create the tmpDir
if err := os.MkdirAll(runtime.config.Engine.TmpDir, 0751); err != nil {
Collaborator

@flouthoc flouthoc Nov 1, 2022

Shouldn't this be 755? Or, if 751 is intentional, maybe a comment just above could explain why others are given execute without read.

Suggested change
- if err := os.MkdirAll(runtime.config.Engine.TmpDir, 0751); err != nil {
+ if err := os.MkdirAll(runtime.config.Engine.TmpDir, 0755); err != nil {

Member

0751 would give others the ability to reach (search) files under the tree, but not to list it. I do not know which is correct.

Contributor Author

I don't have an opinion on this, I just moved the existing mkdir code earlier in the function because I need to reference TmpDir. The existing code actually had both 751 and 755, so I deleted both, but the 751 is the one that would have "won" before, so I kept that to make it not change behaviour.

I'm fine with changing the behavior to 755, but maybe that should be a separate PR with separate arguments.

Member

0751 is what's being used at the moment, so I think we should keep it that way in this PR. Changing the default can be done in a separate change if needed.
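
The difference between the two modes debated in this thread comes down to the read bit for "others". A minimal sketch, using only the standard library, that checks the relevant bits (the helper names are invented for this example):

```go
package main

import (
	"fmt"
	"io/fs"
)

// othersCanList reports whether users outside the owner and group have the
// read bit on a directory, which is what listing its entries requires.
func othersCanList(mode fs.FileMode) bool {
	return mode.Perm()&0o004 != 0
}

// othersCanTraverse reports whether such users have the execute (search) bit,
// which lets them reach an entry whose name they already know.
func othersCanTraverse(mode fs.FileMode) bool {
	return mode.Perm()&0o001 != 0
}

func main() {
	for _, m := range []fs.FileMode{0o751, 0o755} {
		fmt.Printf("%04o: others list=%t traverse=%t\n",
			uint32(m.Perm()), othersCanList(m), othersCanTraverse(m))
	}
}
```

Note that the mode passed to os.MkdirAll is still filtered by the process umask, so the bits on disk can be narrower than requested.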

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 1, 2022
Member

mheon commented Nov 1, 2022

Can you add this to podman info as well, to make it easy to discover?

Member

rhatdan commented Nov 1, 2022

Missing man page changes.

Can I run both --transient containers and non transient containers at the same time?

Contributor Author

alexlarsson commented Nov 2, 2022

> Can I run both --transient containers and non transient containers at the same time?

This is more of a storage configuration similar to --runroot= or --root=, as it affects everything. Might it still work? I dunno? Maybe? You'll have double databases for some things (bolt_state.db, etc) but shared for some (images.json, etc). My guess is that it would look like it works, but there are some issues in the locking etc if you look at it more carefully.

Member

@vrothberg vrothberg left a comment

I really like the numbers!

For the c/storage changes, I suggest opening a PR directly. That will allow CI to run and get the storage experts' eyes on it.

Need more docs for both PRs.

@alexlarsson
Contributor Author

New version with some feedback applied. Still lacks man page updates and splitting up into two PRs.

@alexlarsson
Contributor Author

I split out the containers/storage part here: containers/storage#1424
It is then vendored back in here, but that is a hack which I will redo properly when that PR lands.

@alexlarsson
Contributor Author

Note, this doesn't currently move container userdata files like config.json, secrets and shm to runroot. Maybe we should do that too for minor performance reasons. I assume only the artifacts subdir will really contain large files, right? Those should probably still go on the disk.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2022
@@ -457,6 +457,8 @@ func rootFlags(cmd *cobra.Command, podmanConfig *entities.PodmanConfig) {
pFlags.StringVar(&podmanConfig.Runroot, runrootFlagName, "", "Path to the 'run directory' where all state information is stored")
_ = cmd.RegisterFlagCompletionFunc(runrootFlagName, completion.AutocompleteDefault)

pFlags.BoolVar(&podmanConfig.TransientStore, "transient-store", false, "Enable transient container storage")
Member

It seems to override the setting in storage.conf.

Shouldn't transient_store = true in storage.conf always set the transient flag?

Contributor Author

We need to be able to pass this in the conmon cleanup commands to get the exact same setup as the initial podman command was run as, similar to the --rundir=... args, etc. That can't rely on the global setting.

Member

Agreed, it makes sense to have the CLI option, but when I create a new container and the global setting is set, I'd expect it to affect the newly created container unless I specify --transient-store=false.

Contributor Author

Ah. right!

Contributor Author

Ok, this was a bit busted before, but should work now.

However, if you specify transient_store=true in the config file that currently only affects the default for rootful containers. It seems that there is a bunch of special code that defines the default for rootless containers. This might make sense, but I'm not sure, opinions?

Member

I don't feel strongly about either way. I think it's a fairly niche feature (for now) but it's worth pointing users to storage.conf.
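
The precedence asked for in this thread (the storage.conf value is the default, and an explicit command-line flag overrides it) can be sketched as follows. This is not podman's code: podman parses flags with cobra, whereas the sketch uses the standard library flag package, and `resolveTransient` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// resolveTransient returns the effective setting: the command-line value if
// --transient-store was given explicitly, otherwise the config-file default.
func resolveTransient(confDefault bool, args []string) (bool, error) {
	fs := flag.NewFlagSet("podman-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	cli := fs.Bool("transient-store", false, "Enable transient container storage")
	if err := fs.Parse(args); err != nil {
		return false, err
	}

	// Visit only walks flags that were actually set on the command line,
	// which distinguishes "not given" from "given as false".
	explicit := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "transient-store" {
			explicit = true
		}
	})
	if explicit {
		return *cli, nil
	}
	return confDefault, nil
}

func main() {
	// The config default is true in all three cases.
	for _, args := range [][]string{nil, {"--transient-store=false"}, {"--transient-store"}} {
		v, err := resolveTransient(true, args)
		fmt.Println(args, v, err)
	}
}
```

The point of tracking "explicitly set" separately is that a plain boolean with a false default cannot tell an omitted flag apart from --transient-store=false.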

@alexlarsson alexlarsson force-pushed the transient-store branch 4 times, most recently from e8f53dd to 7cb716d Compare November 10, 2022 12:33
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2022
@alexlarsson alexlarsson force-pushed the transient-store branch 2 times, most recently from b523c9b to b55fb1f Compare November 11, 2022 08:01
@alexlarsson alexlarsson force-pushed the transient-store branch 2 times, most recently from 0823d99 to c917692 Compare December 2, 2022 09:37
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Dec 2, 2022
@alexlarsson alexlarsson force-pushed the transient-store branch 4 times, most recently from 5bc4d64 to cf1d1d4 Compare December 2, 2022 13:47
@alexlarsson alexlarsson changed the title WIP: Support transient store mode Support transient store mode Dec 2, 2022
@openshift-ci openshift-ci bot added release-note-none and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Dec 2, 2022
Member

rhatdan commented Dec 2, 2022

@mheon @vrothberg PTAL

Member

@vrothberg vrothberg left a comment

LGTM overall

docs/source/markdown/podman.1.md
This mode allows starting containers faster, as well as guaranteeing a fresh state on boot in case of unclean shutdowns or other problems. However,
it is not compatible with a traditional model where containers persist across reboots.

Default value for this is configured in `/etc/containers/storage.conf`.
Member

/etc/containers/storage.conf -> containers-storage.conf(5) as there are various locations for storage.conf.

Contributor Author

This is copied verbatim from other options like e.g. --root. Should I change all of them?

Member

Yes, let's move to man page links.

Later changes will need to access it earlier, so move its creation to
just after the creation of StaticDir.

Note: For whatever reason this was created twice before, but we now
only do it once.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
This handles the transient store options from the containers/storage
configuration in the runtime/engine.

Changes are:
 * Print transient store status in `podman info`
 * Print transient store status in runtime debug output
 * Add --transient-store argument to override config option
 * Propagate config state to conmon cleanup args so the callback podman
   gets the same config.

Note: This doesn't really change any behaviour yet (other than the changes
in containers/storage).

Signed-off-by: Alexander Larsson <alexl@redhat.com>
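
To illustrate the last bullet in the commit message above: the cleanup callback has to be started with the same storage settings as the original command, so the setting is passed explicitly rather than left to the global default. The following is a hedged sketch, not the code from this PR; `cleanupArgs` and the exact argument order are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanupArgs builds an argument list for a hypothetical cleanup callback.
// The transient-store value is always spelled out so the callback does not
// depend on whatever the config file says at the time it runs.
func cleanupArgs(root, runroot string, transient bool, ctrID string) []string {
	return []string{
		"--root", root,
		"--runroot", runroot,
		fmt.Sprintf("--transient-store=%t", transient),
		"container", "cleanup", ctrID,
	}
}

func main() {
	// Example values only.
	args := cleanupArgs("/var/lib/containers/storage", "/run/containers/storage", true, "abc123")
	fmt.Println("podman " + strings.Join(args, " "))
}
```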
This brings a performance improvement to `podman run` on top of the
other transient_store improvements in containers/storage:

Transient mode without transient bolt_db:

Benchmark 1: bin/podman run --transient-store=true --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
  Time (mean ± σ):     130.6 ms ±   5.8 ms    [User: 44.4 ms, System: 25.9 ms]
  Range (min … max):   122.6 ms … 143.7 ms    21 runs

Transient mode with transient bolt_db:

Benchmark 1: bin/podman run --transient-store=true --rm --pull=never --network=host --security-opt seccomp=unconfined fedora true
  Time (mean ± σ):     100.3 ms ±   5.3 ms    [User: 40.5 ms, System: 24.9 ms]
  Range (min … max):    93.0 ms … 111.6 ms    29 runs

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This just calls GC on the local storage, which will remove any leftover
directories from previous containers that are not in the podman db anymore.
This is useful primarily for transient store mode, but can also help in
the case of an unclean shutdown.

Also adds some e2e tests to ensure prune --external works.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
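
The idea behind that GC step is a set difference: anything present on disk whose ID the database no longer knows is a leftover. The sketch below shows only that idea with made-up IDs; `orphaned` is a hypothetical helper and this is not how containers/storage implements its garbage collection.

```go
package main

import (
	"fmt"
	"sort"
)

// orphaned returns the IDs found on disk that are absent from the database,
// sorted for stable output.
func orphaned(onDisk []string, inDB map[string]bool) []string {
	var out []string
	for _, id := range onDisk {
		if !inDB[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// In transient mode the database starts empty after a reboot, so every
	// directory left over from the previous boot shows up as orphaned.
	onDisk := []string{"c3", "c1", "c2"}
	inDB := map[string]bool{"c2": true}
	fmt.Println(orphaned(onDisk, inDB))
}
```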
This changes references to `/etc/containers/storage.conf` (and similar) to
links to `containers-storage.conf(5)`, as there are alternative locations
for this file.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Member

@vrothberg vrothberg left a comment

LGTM
Nice work, @alexlarsson !

Member

rhatdan commented Dec 6, 2022

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2022
Contributor

openshift-ci bot commented Dec 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexlarsson, rhatdan


@openshift-merge-robot openshift-merge-robot merged commit 4a8d953 into containers:main Dec 6, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 18, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 18, 2023