Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading error message if /etc/mtab is missing "The /dev/zfs device is missing and must be created." #4680

Closed
binwiederhier opened this issue May 21, 2016 · 13 comments

Comments

@binwiederhier
Copy link

On Ubuntu 16.04, I've been trying to set up ZFS in a chroot (16.04 host + chroot). I have /proc, /dev, /run, and /sys mounted. I have gotten stuck with the error message:

The /dev/zfs device is missing and must be created.
Try running 'udevadm trigger' as root to create it.

It took me days to figure out what was actually wrong: /etc/mtab was missing:

$ strace zfs list
...
open("/etc/mtab", O_RDONLY)             = -1 ENOENT (No such file or directory)
close(3)                                = 0
write(2, "The /dev/zfs device is missing a"..., 104The /dev/zfs device is missing and must be created.
Try running 'udevadm trigger' as root to create it.
) = 104
...

It'd be nice to have a more descriptive error message.

@rlaager
Copy link
Member

rlaager commented May 23, 2016

This comes up in #zfsonlinux regularly. +1 to a better error message. (Maybe I should dive into this and write a patch.)

@behlendorf
Copy link
Contributor

@rlaager please do!

@rlaager
Copy link
Member

rlaager commented May 24, 2016

In lib/libzfs/libzfs_util.c, libzfs_error_init() returns the error string in question. The attempt to access /etc/mtab seems (from just reading the code) to be coming from libzfs_init() in the same file. This seems like a weird design for error handling. All errors in libzfs_init() need to result in distinct errno values, or there's no way for libzfs_error_init() to distinguish them. In this case, a lack of /dev/zfs or a lack or /etc/mtab results in errno = ENOENT.

If changing the interface is an option, libzfs_init() could take an error parameter. Then it could fill that with the a string specific to the actual problem. This is the most future-proof option, but breaks backwards compatibility.

If we can't change the interface, then there are a few options. We could have libzfs_init() store an error string in a global variable and libzfs_error_init() could simply always return that, ignoring the parameter to libzfs_error_init(). Another option would be to modify libzfs_error_init() to stat() /dev/zfs in the ENOENT path; if that works, print a different message about /etc/mtab; that seems pretty ugly, though.

The simplest option is to just change the error message to mention that it could be either /dev/zfs or /etc/mtab that's missing.

I'm happy to work up a patch, but how do we feel about bumping the major version on libzfs. Note that GRUB can compile against libzfs, but generally doesn't on Linux. I'm incline to take the approach of storing the error state internally and having libzfs_error_init() ignore its parameter.

@behlendorf
Copy link
Contributor

@rlaager let's not change the libzfs_init() interface unless we have too, at this point there are external consumers we don't want to break. This is the main reason the error handling here is slightly weird, the original upstream implementation simply returned NULL for all failures. Setting errno was added latter to provide some information without having to change the interface.

As for a solution I'd like to avoid adding a global to communicate the error. I'd suggest we just update the error message to mention /etc/mtab. That would resolve the main issue here.

Related to this if someone is feeling ambitious we should look in to using libmount. When this code was originally written libmount didn't exist on most Linux systems but now it does and it's something we could take advantage of.

@rlaager
Copy link
Member

rlaager commented Jun 3, 2016

One complicating factor is that the error message is a static string. If we write "/etc/mtab" in the string, then we have hard-coded the filename, which may not match the value of MNTTAB. If we put MNTTAB into the string, then we won't properly match strings in the translation catalog (if MNTTAB is set to anything other than /etc/mtab).

Do we actually care about /etc/mtab being a real file? If we don't, we could replace MNTTAB with a hard-coded /proc/self/mounts everywhere, including in the error message string. I think there's some code to update /etc/mtab, which would could drop too. It seems almost everyone else on Linux has already dropped /etc/mtab. Specifically, /etc/mtab has been deprecated in util-linux for some and support is dropped in 2.27.1. Also, systemd does not support /etc/mtab being a file.

@behlendorf
Copy link
Contributor

@rlaager that's a good point. My recollection of the code is that we already handle the case where /etc/mtab is a symlink to /proc/self/mounts. It wouldn't take much to extend that to not caring at all if /etc/mtab even exists.

@chiluk
Copy link

chiluk commented Jul 29, 2016

I agree with @rlaager . We had a user (ubuntu), hit this exact issue recently (see https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/1607920). I really think that zfs should depend on /proc/self/mounts, and only check for /etc/mtab if it does not exist (purely for supporting ancient kernels). Additionally /etc/mtab does not ship with a number of cloud images, and only gets created on first boot by sysvinit scripts, or the systemd equivalents depending on OS. As a result there are a number of timing races that happen on first boot. Some of which result in zfs services showing as failed. If zfs depended directly on /proc/self/mounts, it would eliminate all of these race conditions.

Is anyone working on this? If I go and write a patch, would it get rejected in preferance to a libmount patch? There seems to be a number of places where /etc/mtab is hardcoded or has logic around it to deal with /proc/self/mounts.

@behlendorf
Copy link
Contributor

The only person I'm aware of who might be working on this is @rlaager. I wouldn't be opposed to writting a patch which updates this to use /proc/self/mounts. At some point we may end up replacing this with full libmount integration but that's probably quite a ways off and something no one is working on.

@rlaager
Copy link
Member

rlaager commented Jul 29, 2016

In that case, I (though note that my opinion isn't "authoritative" in any way) think the patch needs to do the following things:

  1. Remove all code related to writing to MNTTAB. Anywhere that it checks to see if it's writable, rip out the code path where it is.
  2. Replace all instances of MNTTAB with /proc/self/mounts (hardcoded).
  3. Update the error message to mention /proc/self/mounts.

@chiluk Did you want to work on this? I'd be happy to review it, talk to you on IRC about the design, etc. If you aren't interested or don't have the time, I can try to whip something up. I've been extra busy for the summer, but hopefully that'll be slowing down soon.

@chiluk
Copy link

chiluk commented Jul 30, 2016

I'll do it. I'm leaving for vacation for the next few days, but I'll get
something together eventuality.

I'll also integrate whatever we come up with into Ubuntu.

Dave.

On Jul 29, 2016 5:53 PM, "Richard Laager" notifications@github.com wrote:

In that case, I (though note that my opinion isn't "authoritative" in any
way) think the patch needs to do the following things:

  1. Remove all code related to writing to MNTTAB. Anywhere that it
    checks to see if it's writable, rip out the code path where it is.
  2. Replace all instances of MNTTAB with /proc/self/mounts (hardcoded).
  3. Update the error message to mention /proc/self/mounts.

@chiluk https://github.com/chiluk Did you want to work on this? I'd be
happy to review it, talk to you on IRC about the design, etc. If you aren't
interested or don't have the time, I can try to whip something up. I've
been extra busy for the summer, but hopefully that'll be slowing down soon.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#4680 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACDI0w34tXxKbRihJrB8fqqb8oZn1lk0ks5qaoRbgaJpZM4Ij1Vh
.

@slashdd
Copy link
Contributor

slashdd commented Aug 25, 2016

Since chiluk will be on vacation for the next few days ... I'll be working on it as well and will pull request something hopefully sometimes soon.

Eric

@slashdd
Copy link
Contributor

slashdd commented Aug 25, 2016

Please ignore the ~6 commits above.
Those commit are tests I made using a fork on my account.

I don't know why they showed up in the issue page.
I assume it's due to the fact that "Closes #4680" is mentioned in the commit log.

Sorry for the inconvenient.

Eric

rlaager added a commit to rlaager/zfs that referenced this issue Sep 11, 2016
behlendorf pushed a commit that referenced this issue Sep 20, 2016
Fix misleading error message:

 "The /dev/zfs device is missing and must be created.", if /etc/mtab is missing.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com>
Closes #4680 
Closes #5029
@scotte
Copy link

scotte commented Sep 28, 2016

Thanks for resolving this - confirmed it resolves the issue for me with embryonic AWS images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants