Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect runtime file locations #4820

Merged
merged 5 commits into from
Apr 2, 2024
Merged

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Jan 30, 2024

Proposed Commit Message

fix: Fix runtime file locations for cloud-init

Update various hardcoded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Fixes GH-4766

Additional Context

Fixes #4766

This doesn't fix the /etc file location issue, but it also shouldn't make it any worse than it already is.

That should be fixed in a follow-up PR.

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@holmanb holmanb added the wip Work in progress, do not land label Jan 30, 2024
@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Feb 13, 2024
@canonical canonical deleted a comment from github-actions bot Feb 13, 2024
@blackboxsw blackboxsw removed the stale-pr Pull request is stale; will be auto-closed soon label Feb 13, 2024
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of fixes needed for this approach to work I think. I'd like to avoid the bootstrap of config, but I don't see how given that we'd need to source all applicable config sources first with ConfigMerger to account for any overrides before trying to re-process that content.

Ideally if we've bootstrapped config we wouldn't need to re-merge, but since run_dir can change, that'd potentially affect what sources for <run_dir>.cloud.cfg we pull in or ignore as well.

cloudinit/stages.py Outdated Show resolved Hide resolved
cloudinit/stages.py Outdated Show resolved Hide resolved
cloudinit/cmd/main.py Show resolved Hide resolved
cloudinit/stages.py Outdated Show resolved Hide resolved
cloudinit/cmd/devel/logs.py Show resolved Hide resolved
cloudinit/cmd/devel/logs.py Show resolved Hide resolved
# distro-specific rund_dir locations. Once we have the run_dir
# we re-read our config with a valid Paths() object. This code has to
# assume the location of /etc/cloud/cloud.cfg && /etc/cloud/cloud.cfg.d
bootstrapped_config = self._read_bootstrap_cfg(extra_fns, {})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm understanding the reason you are bootstrapping config here is to obtain the fully merged system_info.pathsdefined in any of the default config sources including the filesystem in /etc/, extra_fns and the datasource config.

If so, this bootstrapped_config is a dict that won't actually surface the top-level config path keys you are trying to re-pass into Paths initialization, so the bootstrap isn't honoring the merged system_info.paths configuration obtained because you are passing the full config dictionary instead of the specific bootstrapped_config["system_info"]["paths"] if present. This would be where we'd obtain the FreeBSD customized config from the file-system.

So I think what we probably want here is _read__boostrap_cfg(extra_fns, bootstrapped_config,get("system_info", {}).get("paths", {}) to ensure the processed mergedsystem_info.paths is being honored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think what we probably want here is _read__boostrap_cfg(extra_fns, bootstrapped_config,get("system_info", {}).get("paths", {}) to ensure the processed mergedsystem_info.paths is being honored.

+1 thanks for the fix

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request changes on a couple of bugs in the approach at the moment. This bootstrapping is potentially painful for configurations that are overriding path defaults via /etc/cloud/cloud.cfg and forces a full re-merge of all content on all runs of cloud-init even when only a small percentage of images are changing those cloud_dir/run_dir defaults in system config. I wonder if it's worth instead just checking post ConfigMerger if merged._cfg["system_info"]["paths"]["run_dir"] != self.paths.run_dir and only re-run read_bootstrap_cfg if we've detected a change in run_dir.

Copy link

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Feb 29, 2024
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Feb 29, 2024
@holmanb
Copy link
Member Author

holmanb commented Feb 29, 2024

I wonder if it's worth instead just checking post ConfigMerger if merged._cfg["system_info"]["paths"]["run_dir"] != self.paths.run_dir and only re-run read_bootstrap_cfg if we've detected a change in run_dir.

This actually causes infinite recursion, since this code path is where self.paths gets defined, however we can check for change in run_dir as the trigger for re-reading config.

@holmanb holmanb changed the title wip: Respect runtime file locations Respect runtime file locations Feb 29, 2024
@holmanb holmanb removed the wip Work in progress, do not land label Feb 29, 2024
@holmanb
Copy link
Member Author

holmanb commented Feb 29, 2024

@blackboxsw I addressed your comments and fixed the tests, ready for re-review now.

tests/unittests/test_cli.py Outdated Show resolved Hide resolved
cloudinit/stages.py Outdated Show resolved Hide resolved
cloudinit/stages.py Outdated Show resolved Hide resolved
@holmanb holmanb requested a review from blackboxsw March 7, 2024 03:01
@holmanb
Copy link
Member Author

holmanb commented Mar 7, 2024

@blackboxsw thanks for the review, I think I've addressed everything

@blackboxsw
Copy link
Collaborator

The issue I'm seeing, and neglected to think about before this effort, is that our generator time-frame remains unaware of any system config run_path. This then leaves cloud-init in a split-brain scenario where our systemd generator time-frame or any early boot script invocation of ds-identify (BSD/Alpine) will store ds-identify artifacts in the default /run/cloud-init and any subsequent boot stages will observe system configuration (/etc/cloud/*) overrides and store the remainder of out python-based boot stages under the overridden system_info.paths.run_dir.

Without aligning generator timeframe run_dir artifacts and python-based boot stages with the system config overrides, cloud-init python boot stages will be unable to source the ds-identify output which determines whether cloud-init is enabled or disabled and the filtered datasource_list values.

Some thoughts, not certain how viable, but I think the generator time-frame functionality may need to be sorted in a separate prior to the python-based solution because, with the split-brain scenario, cloud-init boot stages will be looking in the wrong /run/cloud-init directory and not see the /customrundir/cloud-init/cloud.cfg which provides the filtered datasource_list as well as /customrundri/cloud-init/(enabled|disabled) files which prevent later boot stages from even running.

Ideas for fixes needed:

  • systemd/cloud-init-hotplugd.socket: maybe the biggest blocker as the ListenFIFO path uses a hard-coded /run/clou-init/hook-hotplug-cmd
  • ds-identify could use the check_config run_dir facility to attempt to use a configured run_dir. Note that check_config doesn't explicitly pay attention to YAML nested structures, so a formerly ignored top-level run_dir: something could override the official system_info:paths:run_dir nested YAML leaf even though the top-level run_dir key is ignored by python runtime as it's not the appropriate nested config value under system_info.paths
  • systemd/cloud-init-generator.tmpl: needs a function like ds-identify's check_config function (maybe worth separating set_run_path/check_config/read_uname_info into a /usr/lib/cloud-init/common.functions that both ds-id and generator can reuse?)

Here's a running diff that looks almost functional (though it as a direct cut-n-paste of the check_config/read_uname_info functions:
generator.patch.txt

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs fixing as I think we need to sort generator time-frame functionality first to avoid split-brain issues w/ python boot stages and ds-identify behavior.

@holmanb
Copy link
Member Author

holmanb commented Mar 7, 2024

@blackboxsw great thoughts, thanks for the review.

The issue I'm seeing, and neglected to think about before this effort, is that our generator time-frame remains unaware of any system config run_path. This then leaves cloud-init in a split-brain scenario where our systemd generator time-frame or any early boot script invocation of ds-identify (BSD/Alpine) will store ds-identify artifacts in the default /run/cloud-init and any subsequent boot stages will observe system configuration (/etc/cloud/*) overrides and store the remainder of out python-based boot stages under the overridden system_info.paths.run_dir.

True. Although I don't think the path issue affects Alpine. And on BSD, ds-identify can independently behave as required due to set_run_path.

Without aligning generator timeframe run_dir artifacts and python-based boot stages with the system config overrides, cloud-init python boot stages will be unable to source the ds-identify output which determines whether cloud-init is enabled or disabled and the filtered datasource_list values.

Since ds-identify now uses /var/run for non-Linux kernels, this PR is more about making cloud-init's Python code capable of also respecting this run directory.

Multiple sources of truth is not great, but I think that with the current approach it will now at least be possible to make both use /var/run/.

Some thoughts, not certain how viable, but I think the generator time-frame functionality may need to be sorted in a separate prior to the python-based solution because, with the split-brain scenario, cloud-init boot stages will be looking in the wrong /run/cloud-init directory and not see the /customrundir/cloud-init/cloud.cfg which provides the filtered datasource_list as well as /customrundri/cloud-init/(enabled|disabled) files which prevent later boot stages from even running.

Ideas for fixes needed:

systemd/cloud-init-hotplugd.socket: maybe the biggest blocker as the ListenFIFO path uses a hard-coded /run/clou-init/hook-hotplug-cmd

I hadn't thought about this one. However, I'd be surprised if this one works on non-Linux, because udev is very Linux-specific. Searching about FreeBSD + udev I see that maybe xorg is using udev api with devd as a backend circa 2020, but I haven't found any official details about it.

ds-identify could use the check_config run_dir facility to attempt to use a configured run_dir. Note that check_config doesn't explicitly pay attention to YAML nested structures, so a formerly ignored top-level run_dir: something could override the official system_info:paths:run_dir nested YAML leaf even though the top-level run_dir key is ignored by python runtime as it's not the appropriate nested config value under system_info.paths

Using check_config for this scares me a little, but that could work. I'd be concerned about this because false positives would break all of ds-identify. Currently false positives will (I think) only result in possibly adding unwanted datasources to the list, and cloud-init's Python code has ds_detect() and a failover path which means that when this happens, cloud-init should usually still behave as intended. Maybe we limit the splash zone of this failure by checking for the existence of directories set by run_dir before using this directory? However this would mean that run_dir would also have to be a single-line k/v pair in order to work, so I'm not sure how much more we want to add reliance on our broken yaml parser.

systemd/cloud-init-generator.tmpl: needs a function like ds-identify's check_config function (maybe worth separating set_run_path/check_config/read_uname_info into a /usr/lib/cloud-init/common.functions that both ds-id and generator can reuse?)

Here's a running diff that looks almost functional (though it as a direct cut-n-paste of the check_config/read_uname_info functions:
generator.patch.txt

+1 nice, thanks for putting that together

It is up to you whether we go forward with the current approach (given the points raised above), or try for a more complete run_dir-everywhere approach which depends on check_config. I think I'd lean towards moving forward with this and possibly iterating on the generator scripts separate from this PR, but I don't feel strongly about that.

@igalic
Copy link
Collaborator

igalic commented Mar 14, 2024

finally got around to testing this, and the main issue I'm seeing is:

Which, according to rcorder(8) it does not.
We need to amend

# REQUIRE: mountcritlocal
to say cleanvar instead. (Unlike the comment that claims var_run should suffice)

@igalic
Copy link
Collaborator

igalic commented Mar 14, 2024

that seems to be my fault, tho:

run_dir: /var/run/

@igalic
Copy link
Collaborator

igalic commented Mar 15, 2024

@holmanb here's a PR addressing my complaints: holmanb#50

@holmanb
Copy link
Member Author

holmanb commented Mar 18, 2024

@holmanb here's a PR addressing my complaints: holmanb#50

great feedback, and thanks for the PR against this branch - I just merged it into this branch

@holmanb holmanb requested a review from blackboxsw March 18, 2024 18:45
@igalic
Copy link
Collaborator

igalic commented Mar 28, 2024

n.b.: I've tested this on FreeBSD and it works as expected.
Is there anything specific I should be looking out for?

@blackboxsw
Copy link
Collaborator

blackboxsw commented Apr 1, 2024

It is up to you whether we go forward with the current approach (given the points raised above), or try for a more complete run_dir-everywhere approach which depends on check_config. I think I'd lean towards moving forward with this and possibly iterating on the generator scripts separate from this PR, but I don't feel strongly about that.

Yes, I think we can move forward with this PR. I think this PR you have fixes the the highest-impact gaps to align run-time python config based on overrides provided in /etc/cloud/cloud.cfg.*. There are a couple of minor areas that I think we should track in new GH issues as run_dir override is not fully functional everywhere:

Let's file 3 issues for the following related to run_dir config overrides:

  • systemd/cloud-init-generator-tmpl awareness of cloud.cfg::system_info::paths::run_dir
  • tools/(hook-hotplug|cloud-init-hotplugd) and systemd/cloud-init-hotplugd.socket awareness of cloud.cfg::system_info::paths::run_dir resulting in both hard-coded FIFO @ default run_dir path /run/cloud-init/hook-hotplug-cmd as well as skipped hotplug events because hook-hotplug never recognized cloud-init as is_finished

We may be able to avoid the incomplete cloud-init status issue in this PR by also checking the for <default_run_dir>/enabled or <default_run_dir>/disabled files. But "the right fix" would be to somehow address this completely in the cloud-init-generator script and not band-aid a solution for cloud-init status to fallback to defaults. Ideally a ds-identify fix though would require smarter YAML processing to avoid false positives, and I don't want to introduce that here this more specific fix.

Given that overriding system_config::run_dir is generally a non-standard approach outside of BSD, and the BSD-cases are aligned with ds-identify, I think we are good to defer working those 3 issues separately for more complete support of run_dir overrides as lower-weight tasks.

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb If you agree to create and defer the couple of remaining issues with this approach I think we can go ahead and land this PR as-is.

@holmanb
Copy link
Member Author

holmanb commented Apr 1, 2024

Thanks for the review @blackboxsw.

  • systemd/cloud-init-generator-tmpl awareness of cloud.cfg::system_info::paths::run_dir
  • tools/(hook-hotplug|cloud-init-hotplugd) and systemd/cloud-init-hotplugd.socket awareness of cloud.cfg::system_info::paths::run_dir resulting in both hard-coded FIFO @ default run_dir path /run/cloud-init/hook-hotplug-cmd as well as skipped hotplug events because hook-hotplug never recognized cloud-init as is_finished

All of these are systemd-specific code-paths, and systemd explicitly requires an ephemeral /run directory, so I don't think there is anything to fix related to these three points.

@holmanb holmanb added the 24.1 label Apr 1, 2024
@blackboxsw
Copy link
Collaborator

All of these are systemd-specific code-paths, and systemd explicitly requires an ephemeral /run directory, so I don't think there is anything to fix related to these three points.

Good point. Though some image could set run_dir: /run/mydir/ which would adhere to systemd's requiements, but still present a specialized run_dir in configuration. In these cases, some artifacts will still be placed in /run/cloud-init/ and others in /run/mydir/cloud-init leading to those errors I mentioned. But, I think me belaboring this point is really a limited return on investment for both of us because I honestly don't think there is a real use-case for changing the run_dir outside of our FreeBSD case at the moment. Given that non-Linux will align w/ /var/run we should be in good shape until a second use-case presents itself where modification of run_dir is necessary for some other reason.

@holmanb
Copy link
Member Author

holmanb commented Apr 2, 2024

Force pushed to resolve merge conflicts.

@holmanb holmanb merged commit a6e09d9 into canonical:main Apr 2, 2024
28 checks passed
blackboxsw pushed a commit to blackboxsw/cloud-init that referenced this pull request Apr 2, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes canonicalGH-4766
blackboxsw pushed a commit to blackboxsw/cloud-init that referenced this pull request Apr 2, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes canonicalGH-4766
blackboxsw pushed a commit to blackboxsw/cloud-init that referenced this pull request Apr 3, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes canonicalGH-4766
holmanb added a commit that referenced this pull request Apr 3, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes GH-4766
holmanb added a commit that referenced this pull request Apr 3, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes GH-4766
holmanb added a commit that referenced this pull request Apr 3, 2024
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes GH-4766
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] ephemeral directory module not used
3 participants