Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cc_grub_dpkg: determine idevs in more robust manner with grub-mkdevicemap #358

Merged
merged 11 commits into from
Jun 1, 2020

Conversation

matthewruffell
Copy link
Contributor

@matthewruffell matthewruffell commented May 8, 2020

Please read https://bugs.launchpad.net/cloud-init/+bug/1877491

Replace the hardcoded list of devices with a more robust way of determining
the device which grub is installed to.

We use grub-probe to fetch the underlying disk the /boot directory is located on, and attempt to match the disk with its /dev/disk/by-id value. If no such /dev/disk/by-id/ value exists, we fallback to the plain disk name.

The changes are robust to unstable kernel device names and ordering, and use
/dev/disk/by-id values to populate grub-pc/install_devices where possible.

LP: #1877491

@OddBloke OddBloke self-assigned this May 11, 2020
@OddBloke OddBloke added the wip Work in progress, do not land label May 12, 2020
@OddBloke
Copy link
Collaborator

Thanks for the submission, Matthew! There's some discussion of this approach happening in the bug, so I'm marking this "wip" until we've reached a consensus.

@matthewruffell
Copy link
Contributor Author

This new revision drops nearly all the complexity from the previous version.

Changes:

  • no longer use grub-mkdevicemap as per xnox's recommendations due to it being deprecated
  • the fork between BIOS and EFI is completely removed, and all instances use the same single code path
  • we use grub-probe to determine the disk where the /boot directory is located, since it works for both BIOS and EFI systems
  • removed duplicate code for the container usecase
  • closely follow the algorithm from usable_partitions(), device_to_id() and available_ids() functions in the grub2 postinst.in file: https://paste.ubuntu.com/p/vKFNSwNyhP/

Copy link
Contributor

@xnox xnox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good I like it ! I think this should be merged now.

Personally i would have parsed udevadm output, but walking symlinks in /dev/disk is valid too. Somebody might want to refactor it that way later. But that's just style/attitude refactor, rather than a change of behaviour.

devlist = util.subp(['awk', '{print $1}'],
data=devlist, capture=True)[0].strip()
for device in devlist.split():
if str(Path(device).resolve()) == disk:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of the above can be done in one go:

$ udevadm info --query property $(grub-probe -t disk /boot) | grep -e DEVNAME= -e DEVLINKS
DEVNAME=/dev/nvme0n1
DEVLINKS=/dev/disk/by-path/pci-0000:04:00.0-nvme-1 /dev/disk/by-id/nvme-eui.00000000000000006479a71d90513837 /dev/disk/by-id/nvme-Sabrent_27FF079419DE00131387

Which queries grub-probe device, and gives you DEVNAME= and DEVLINKS=, if there is /by-id/ in devlinks you can use the first /by-id/ one there, otherwise fallback to /by-path/, or DEVNAME itself.

If that fails, it should produce no stdout output and only stderr messages i.e.:

echo $(udevadm info --query property $(grub-probe -t disk /non-existant) | grep -e DEVNAME= -e DEVLINKS) 2>/dev/null
grub-probe: error: failed to get canonical path of `/non-existant'.
A device name or path is required

But that's just style / refactor comment. What you do here is correct and valid too. And because I love using udevadm so much. Walking /dev/disk/by-id should be the same and probably easier to handle spaces and escaping.

@OddBloke
Copy link
Collaborator

Thanks for the review @xnox!

Just for clarity, @matthewruffell, I asked Dimitri to review the PR from a grub POV. I'm going to follow-up with a code review from a cloud-init perspective in the next day or two. (We have no indication that this is a high priority, so I've just added this to my TODO list for now. If there's some time sensitivity here that I'm not aware of, please let me know!)

@matthewruffell
Copy link
Contributor Author

Thank you for the review @xnox, and for your insight into udevadm as an alternative to walking symlinks.

@OddBloke Thanks for being clear with next steps. There is some time sensitivity, but nothing major, with SLA expiry coming up mid next month on 2020-06-14. The bug this patch fixes is more of a nuisance than anything, as it ensures that user's don't see the interactive dpkg prompt for selecting an install location when they upgrade their grub2 packages.

I will be available for quick iterations if you would like anything changed, just let me know.

As for getting this SRU'd into Ubuntu releases, do I need to submit any patches, or do you take care of that? I read https://wiki.ubuntu.com/CloudinitUpdates and it seems you backport the current tree. I'm happy to help test regardless.

@OddBloke
Copy link
Collaborator

@OddBloke Thanks for being clear with next steps. There is some time sensitivity, but nothing major, with SLA expiry coming up mid next month on 2020-06-14. The bug this patch fixes is more of a nuisance than anything, as it ensures that user's don't see the interactive dpkg prompt for selecting an install location when they upgrade their grub2 packages.

I will be available for quick iterations if you would like anything changed, just let me know.

I plan on reviewing this after lunch, so I'll let you know, thanks!

As for getting this SRU'd into Ubuntu releases, do I need to submit any patches, or do you take care of that? I read https://wiki.ubuntu.com/CloudinitUpdates and it seems you backport the current tree. I'm happy to help test regardless.

Yep, this would go back as part of a regular SRU. That said, we're about to start an SRU process and we wouldn't expect to do another one before the described deadline. I'm wondering, @blackboxsw, if we should hold on the SRU until this lands. @matthewruffell, what are the target Ubuntu releases that the SLA applies to?

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the speedy turnaround, Matthew!

At a high-level, the main problem here is that we're shelling out to perform some string/list manipulation that would be much better performed in Python code. I've left some other comments where they made sense, but this'll need a re-review once it's all written in Python.

cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
@matthewruffell
Copy link
Contributor Author

Again, thanks for the review and feedback @OddBloke, these iterations really do help improve code quality.

Changelog:

  • Removed subprocess calls to sort and awk, and implemented their string manipulation in pure Python
  • Dropped status_cb from the subp() call for grub-probe
  • We no longer have to resolve symlinks twice.

Tested on AWS c5d.large (nitro) and t2.micro (xen) instances, LXD (with and without grub-common installed), KVM (BIOS and EFI).

How to test:

  1. Reset value of grub-pc/install_devices
    $ echo reset grub-pc/install_devices | sudo debconf-communicate grub-pc
  2. Run cloud-init:
    $ sudo cloud-init single --name grub-dpkg --frequency=always
  3. Fetch value of grub-pc/install_devices
    $ echo get grub-pc/install_devices | sudo debconf-communicate grub-pc

cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, thanks for the review and feedback @OddBloke, these iterations really do help improve code quality.

Thank you for iterating!

Changelog:
* Removed subprocess calls to sort and awk, and implemented their string manipulation in pure Python
* Dropped status_cb from the subp() call for grub-probe
* We no longer have to resolve symlinks twice.

This is all good stuff, thanks! I have a few other inline comments/thoughts.

I will also want you to take a look at writing some unit tests for the module. I didn't mention this before, because I didn't want you to spend time writing tests for code that we were still iterating on, but now that I think we're beginning to settle on a form for the code that makes sense (assuming you're happy with my inline comment about refactoring, that is), I think it's worth mentioning.

Let me know if you'd like some guidance, I'd be happy to chat on IRC or in a video call. We have some documentation on what we expect of unit tests in our HACKING doc and test_resolv_conf.py should give you an idea of the preferred structure,

@OddBloke
Copy link
Collaborator

Oh, and as an aside: add your GH username to tools/.github-cla-signers and the CLA check will stop failing.

matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 20, 2020
@matthewruffell
Copy link
Contributor Author

Again, @OddBloke thanks for the review! These iterations have been great, and I can see my code getting better, smaller and more robust with each feedback cycle.

To tell you the truth, I have never written a unit test in my life, but I followed the docs, looked at a lot of the existing examples and somehow managed to write one that works. It is very messy, and will likely need another revision. The tests are currently violating the 80 char line limits, and suggestions on how to fix that would be welcome.

Changelog:

  • We no longer walk symlinks in /dev/disk/by-id, as we now parse the output of udevadm info --query=symlinks instead.
  • Most imports have been removed, and we only rely on cloudinit.util now.
  • Moved logic in the try block to a new function, fetch_idevs()
  • Wrote a unit test that gets 100% coverage on fetch_idevs().

matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 25, 2020
Replace the hardcoded list of devices with a more robust way of determining
the device which grub is installed to.

We use grub-probe to fetch the underlying disk the /boot directory is
located on, and attempt to match the disk with its /dev/disk/by-id value.
If no such /dev/disk/by-id/ value exists, we fallback to the plain disk
name.

The changes are robust to unstable kernel device names and ordering, and use
/dev/disk/by-id values to populate grub-pc/install_devices where possible.

LP: #1877491
matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 25, 2020
Add a unit test for fetch_idevs() as a part of the refactoring
effort for cc_grub_dpkg.

There are 5 unit tests which correspond to common environments
which cloud-init can be expected to run.

LP: #1877491
matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 25, 2020
@matthewruffell
Copy link
Contributor Author

Hi @OddBloke, please review my latest iteration.

Changelog:

  • I tidied up the unit test a little bit and added docstrings
  • Added a testcase that covers SCSI drives, like what you would find on GCP.
  • I also made the strings in the unit test generic and not referencing any particular cloud vendor.

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matthew, I agree that this is getting better and better. Thank you for your continued engagement in iterations, it's much appreciated!

(Comments inline. 👍)

cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Show resolved Hide resolved
cloudinit/config/tests/test_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/tests/test_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/tests/test_grub_dpkg.py Outdated Show resolved Hide resolved
matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 28, 2020
Replace the hardcoded list of devices with a more robust way of determining
the device which grub is installed to.

We use grub-probe to fetch the underlying disk the /boot directory is
located on, and attempt to match the disk with its /dev/disk/by-id value.
If no such /dev/disk/by-id/ value exists, we fallback to the plain disk
name.

The changes are robust to unstable kernel device names and ordering, and use
/dev/disk/by-id values to populate grub-pc/install_devices where possible.

LP: #1877491
matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 28, 2020
Add unit tests for fetch_idevs() and handle() as a part of the refactoring
effort for cc_grub_dpkg.

There are 11 tests which correspond to common environments which cloud-init
can be expected to run.

LP: #1877491
matthewruffell added a commit to matthewruffell/cloud-init that referenced this pull request May 28, 2020
@matthewruffell
Copy link
Contributor Author

Hi @OddBloke, again thanks for the feedback. I have implemented the changes you have requested, please review this revision.

Changelog:

  • Added unit test class TestHandle to test the handle() method in the various ways idevs and idevs_empty can be set.
  • Added more comprehensive exception handling to separate out expected conditions from the unexpected. Adding appropriate logging.
  • Updated TestFetchIdevs to reflect changes in exception handling, and we make sure to test all exception handling routines.
  • Changed the filter statement to be more "Pythonic".

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good now; thank you! I've suggested some minor changes inline: I'm going to ask another cloud-init dev to review those, then will land them into your branch (and fix CI, which will potentially fail), and then land the branch. (There should be no need for input from you, enjoy your PTO. 👍)

cloudinit/config/cc_grub_dpkg.py Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/tests/test_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
# are not exposed to the container.
elif "failed to get canonical path" in e.stderr:
log.debug("Device mapping between /proc/self/mountinfo does not "
"match /dev, e.g. inside a container")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not attempt to compare /proc/self/mountinfo and host info; so I would not include this debug message; there may be other reasons for this error.

cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Show resolved Hide resolved
cloudinit/config/cc_grub_dpkg.py Outdated Show resolved Hide resolved
matthewruffell and others added 8 commits June 1, 2020 15:18
Replace the hardcoded list of devices with a more robust way of determining
the device which grub is installed to.

We use grub-probe to fetch the underlying disk the /boot directory is
located on, and attempt to match the disk with its /dev/disk/by-id value.
If no such /dev/disk/by-id/ value exists, we fallback to the plain disk
name.

The changes are robust to unstable kernel device names and ordering, and use
/dev/disk/by-id values to populate grub-pc/install_devices where possible.

LP: #1877491
Add unit tests for fetch_idevs() and handle() as a part of the refactoring
effort for cc_grub_dpkg.

There are 11 tests which correspond to common environments which cloud-init
can be expected to run.

LP: #1877491

try:
# check if disk exists and use udevadm to fetch symlinks
if disk and os.path.exists(disk):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If grub-probe -t returns OK, but does not provide a disk value (which this check seems to think is possible), should we exit earlier and just move this check up after the try/except for grub-probe?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If grub-probe fails, we'd have disk = "" still from line 57. So it at least protects us from that. (I'm not 100% sure I've understood the request, so if this reply doesn't make sense, please do LMK.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting that we do this after the grub-probe call and before we try udev call.

if not disk:
return []

This handles either scenario I was thinking of: 1) grub-probe fails, we have disks = "" 2) grub-probe returns 0, but produces no standard-output, so disk = ""

If we don't have a disk value, we might as well just return []

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, right, I see what you mean. Good point, on it.

Copy link
Collaborator

@raharper raharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good.

@OddBloke OddBloke dismissed their stale review June 1, 2020 21:29

Comments addressed

@OddBloke OddBloke removed the wip Work in progress, do not land label Jun 1, 2020
@OddBloke OddBloke merged commit fc07d63 into canonical:master Jun 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants