-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/deploy: Use a FIFREEZE/FITHAW cycle for /boot #1049
Conversation
@cgwalters thanks, sorry i don't understand about
as i know, that kernel by default supports automatic unfreeze via:
|
You're quoting https://lwn.net/Articles/287435/? As best I can tell, that timeout feature never landed. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fcccf502540e3d752d33b2d8e976034dee81f9f7 |
@cgwalters thanks =). i'm use this feature is more then 3 years and think that it worked =)). |
const char *path, | ||
GCancellable *cancellable, | ||
GError **error) | ||
fsfreeze_thaw_cycle (int rootfs_dfd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a small comment here with some of the rationale from the commit msg? E.g. why we're doing FIFREEZE/FITHAW
instead of just syncfs
.
else | ||
return glnx_throw_errno_prefix (error, "ioctl(FIFREEZE)"); | ||
} | ||
if (ioctl (rootfs_dfd, FITHAW, 0) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a shortcircuit here based on some env var so we can add an insttest that the watchdog kicks in as predicted?
if (TEMP_FAILURE_RETRY (write (pipe_write, &c, sizeof (c))) < 0) | ||
err (1, "write"); | ||
/* Wait for the parent to say it's going to freeze. */ | ||
(void) TEMP_FAILURE_RETRY (read (pipefds[0], &c, sizeof (c))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, this is referring to the same pipe, though right? Won't we just pick up the data we just wrote up above? If we want bidirectional comms, I think we need two pipes, right? (So 4 fds total).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, oops 😄 Fixed this by using a socketpair()
.
See: http://marc.info/?l=linux-fsdevel&m=149520244919284&w=2 XFS doesn't flush the journal on `syncfs()`. GRUB doesn't know how to follow the XFS journal, so if the filesystem is in a dirty state (possible with xfs `/boot`, extremely likely with `/`, if the journaled data includes content for `/boot`, the system may be unbootable if a system crash occurs. Fix this by doing a `FIFREEZE`+`FITHAW` cycle. Now, most people probably would have replaced the `syncfs()` invocation with those two ioctls. But this would have become (I believe) the *only* place in libostree where we weren't safe against interruption. The failure mode would be ugly; nothing else would be able to write to the filesystem until manual intervention. The real fix here I think is to land an atomic `FIFREEZETHAW` ioctl in the kernel. I might try a patch. In the meantime though, let's jump through some hoops and set up a "watchdog" child process that acts as a fallback unfreezer. Closes: ostreedev#876
☀️ Test successful - status-atomicjenkins |
While the original bug that brought this up only reproduces on XFS, the problem could happen on ext4 or Btrfs. So they all probably should get freeze/thaw after bootloader related things change. |
Bit hysterical that we haven't done this... On Linux we could call syncfs() for only those filesystems that we actually touched but somehow I doubt it's worth the trouble. Another option might be doing this at rpmtxnEnd() but maybe that's excessive.
Trying to get osmet to work with RHCOS, I kept being unable to mount the rootfs and it turned out that while the code I'd added to unpack the LUKS bits was working fine, mounting the rootfs was failing because the block device was read-only, but XFS wanted to replay the journal. And for some reason I haven't traced through yet, this apparently isn't affecting FCOS, but the unclean journal is breaking osmet w/RHCOS. This of course is the *second* time I've hit this problem of "ext4 flushes the journal on umount, XFS doesn't" - see also ostreedev/ostree#1049 Adding the necessary invocation of `xfs_freeze` ensures that our firstboot has a clean XFS journal for `/` which is just a good idea anyways.
Trying to get osmet to work with RHCOS, I kept being unable to mount the rootfs and it turned out that while the code I'd added to unpack the LUKS bits was working fine, mounting the rootfs was failing because the block device was read-only, but XFS wanted to replay the journal. And for some reason I haven't traced through yet, this apparently isn't affecting FCOS, but the unclean journal is breaking osmet w/RHCOS. This of course is the *second* time I've hit this problem of "ext4 flushes the journal on umount, XFS doesn't" - see also ostreedev/ostree#1049 Adding the necessary invocation of `xfs_freeze` ensures that our firstboot has a clean XFS journal for `/` which is just a good idea anyways.
See: http://marc.info/?l=linux-fsdevel&m=149520244919284&w=2
XFS doesn't flush the journal on
syncfs()
. GRUB doesn't know how to follow theXFS journal, so if the filesystem is in a dirty state (possible with xfs
/boot
, extremely likely with/
, if the journaled data includes content for/boot
, the system may be unbootable if a system crash occurs.Fix this by doing a
FIFREEZE
+FITHAW
cycle. Now, most peopleprobably would have replaced the
syncfs()
invocation with those twoioctls. But this would have become (I believe) the only place in
libostree where we weren't safe against interruption. The failure
mode would be ugly; nothing else would be able to write to the filesystem
until manual intervention.
The real fix here I think is to land an atomic
FIFREEZETHAW
ioctlin the kernel. I might try a patch.
In the meantime though, let's jump through some hoops and set up
a "watchdog" child process that acts as a fallback unfreezer.
Closes: #876