Fix eclipsing vms in ctor #2101
Let exceptions that arise when creating a VM in the daemon constructor propagate. This avoids erasing instance data that could be recoverable and important for the user, but aborts the daemon, since it can't honor its database, which would otherwise be out of sync. More robust approaches for particular causes can always be added. Fixes #1658.
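To illustrate the change in behaviour, here is a minimal C++17 sketch. It is not the Multipass code: `InstanceSpec`, `VirtualMachine`, `Daemon` and the factory `create_vm` are hypothetical stand-ins. The constructor builds one VM per persisted record and no longer catches failures, so an exception aborts start-up while the record itself stays in the database instead of being erased.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical persisted instance record (only the field this sketch needs).
struct InstanceSpec
{
    std::string image_path;
};

// Hypothetical stand-in for a backend VM object.
struct VirtualMachine
{
    std::string name;
};

// Hypothetical factory: throws if the backend cannot build the VM.
std::unique_ptr<VirtualMachine> create_vm(const std::string& name, const InstanceSpec& spec)
{
    if (spec.image_path.empty())
        throw std::runtime_error{"cannot create VM for instance " + name};
    return std::make_unique<VirtualMachine>(VirtualMachine{name});
}

class Daemon
{
public:
    explicit Daemon(std::map<std::string, InstanceSpec> persisted) : db{std::move(persisted)}
    {
        for (const auto& [name, spec] : db)
        {
            // No try/catch here: an exception escapes the constructor and the daemon fails
            // to start. Catching it and dropping `name` from the db instead would silently
            // discard a possibly recoverable instance.
            vms[name] = create_vm(name, spec);
        }
    }

    std::size_t vm_count() const
    {
        return vms.size();
    }

private:
    std::map<std::string, InstanceSpec> db;
    std::map<std::string, std::unique_ptr<VirtualMachine>> vms;
};
```

The caller that constructs the daemon sees the exception, while the persisted records it passed in are left untouched for later recovery.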
Log and remove from the db any instances whose images are missing when the daemon comes back up. This covers cases where the user deletes instances directly from the backend or otherwise removes them from disk.
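A rough C++17 sketch of that start-up check, assuming instance records are kept in a `std::map` keyed by instance name and carry an image path. `InstanceRecord` and `prune_instances_with_missing_images` are hypothetical names, not Multipass APIs; the sketch only shows the log-and-remove logic.

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical persisted instance record (only the field this sketch needs).
struct InstanceRecord
{
    fs::path image_path;
};

// Drops from `db` every instance whose image file is gone (e.g. deleted directly from the
// backend or from disk), logging each removal. Returns the names that were removed.
// fs::exists(path) returns false when the file is absent but throws fs::filesystem_error
// for other failures (e.g. permissions), so those still propagate instead of costing data.
std::vector<std::string> prune_instances_with_missing_images(std::map<std::string, InstanceRecord>& db)
{
    std::vector<std::string> removed;
    for (auto it = db.begin(); it != db.end();)
    {
        if (fs::exists(it->second.image_path))
        {
            ++it;
            continue;
        }

        std::clog << "Removing instance \"" << it->first << "\" from the db: image not found at "
                  << it->second.image_path << '\n';
        removed.push_back(it->first);
        it = db.erase(it);
    }
    return removed;
}
```

Using the throwing overload of `fs::exists` keeps the distinction the PR cares about: only a confirmed-missing image leads to removal, while an unexpected filesystem error still aborts start-up.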
Drop the prepare call from the mock vault's default action for fetch_image, which, when called through `fetch_images_for` in daemon.cpp, returned an empty image. This fixes a few test failures now that the daemon checks in the ctor that instance images exist.
Replace instance creation throw in ctor with image verification failure.
Codecov Report
Coverage Diff (main vs. #2101)
            main      #2101     +/-
Coverage    81.48%    81.47%    -0.01%
Files       184       184
Lines       9456      9457      +1
Hits        7705      7705
Misses      1751      1752      +1
So that it can be used in the tests.
Although I think failing the daemon is better than deleting someone's instance wholesale, I'm still concerned: if the daemon can't start, how does a user recover from this?
In the end, I'll approve this, but I'd like some feedback from the team here on how we plan to address a situation where the daemon just won't start due to hitting this.
Oh sure, this is a stop-gap. Long-term we should mark the instance "Broken" and skip over it. Another
Sure, I definitely understand this is a stop-gap and is better than just deleting the instance. But we are trading deleting an instance wholesale with the potential of the daemon not starting at all (which is a better failure). My question is, if/when users complain that Multipass is not starting at all, how will we deal with that? Have users modify the instance DB to remove the problematic instance without deleting it in order to get the daemon to start? Wing it and just cross that bridge when we get to it? Again, I'm not saying this shouldn't go in, I just want us all to be on the same page about how we'll deal with the daemon not starting. I guess where I'm going with this is we really need a full solution for this ASAP.
There are two sorts of situations that this PR covers by aborting the daemon:
If I am not mistaken, most reports have been of the first kind. In such cases, marking all instances as BROKEN is not much better than quitting: if instances can't be launched, there isn't much the daemon could do anyway. That would be useful mainly for situation 2. But, depending on how frequent we find case 2 to be, guiding users to remove or recover instances manually may not be that big a deal, and that "full solution" may not be all that urgent. I do think it's less urgent than fixing permanently erased instances. Of course, if we find specific measures that help on a case-by-case basis (e.g. a bigger timeout somewhere), we can implement them.
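For comparison, the longer-term idea of marking an instance as broken and skipping it, rather than aborting the daemon or erasing the record, could look roughly like the C++17 sketch below. Everything in it is hypothetical and not part of this PR: the `InstanceState` enum, the `Instance` struct, `create_vm` and `load_instances` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical instance states; only what this sketch needs.
enum class InstanceState
{
    stopped,
    broken
};

// Hypothetical persisted instance record.
struct Instance
{
    std::string image_path;
    InstanceState state = InstanceState::stopped;
    std::string broken_reason; // filled in when the instance is marked broken
};

// Hypothetical VM factory: throws when the instance cannot be brought up.
void create_vm(const Instance& instance)
{
    if (instance.image_path.empty())
        throw std::runtime_error{"missing image"};
}

// Tries to set up every instance. A failure marks that instance as broken and skips it,
// so the daemon can still start and the record stays in the db for later repair or deletion.
// Returns how many instances were set up successfully.
std::size_t load_instances(std::map<std::string, Instance>& db)
{
    std::size_t loaded = 0;
    for (auto& entry : db)
    {
        Instance& instance = entry.second;
        try
        {
            create_vm(instance);
            ++loaded;
        }
        catch (const std::exception& e)
        {
            instance.state = InstanceState::broken;
            instance.broken_reason = e.what();
        }
    }
    return loaded;
}
```

This only pays off when the daemon remains useful with the surviving instances; if every instance fails for a shared reason, it ends up with all of them marked broken.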
Ok, this is better than deleting an instance.
bors merge
2101: Fix eclipsing vms in ctor r=townsend2010 a=ricab Let exceptions that arise when creating a VM in the daemon constructor through. This avoids erasing instance data that could be recoverable and important for the user, but aborts the daemon since it can't honor its database, which would otherwise be out of sync. More robust approaches for particular causes can always be added. Fixes #1658. Co-authored-by: Ricardo Abreu <ricardo.abreu@canonical.com>
Build failed.
bors retry
2099: [qemu] ensure `dnsmasq.hosts` exists r=townsend2010 a=surahman **_With respect to issue #1118:_** Added a check on `DNSMasq` bootstrapping for `dnsmasq.hosts` file in the constructor. If absent, an empty file will be created in the `data_dir`. 2101: Fix eclipsing vms in ctor r=townsend2010 a=ricab Let exceptions that arise when creating a VM in the daemon constructor through. This avoids erasing instance data that could be recoverable and important for the user, but aborts the daemon since it can't honor its database, which would otherwise be out of sync. More robust approaches for particular causes can always be added. Fixes #1658. Co-authored-by: Saad Ur Rahman <saad.ur.rahman@gmail.com> Co-authored-by: Ricardo Abreu <ricardo.abreu@canonical.com>
Build failed (retrying...).
Build failed.
bors retry
Build failed.
bors retry
Build failed (retrying...).
Build failed.
bors retry
This is getting funny 😆
🤦
Build failed.
Gah. bors merge
Build succeeded.