-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Device Symlink detection in zpool import #4500
Comments
@FireDrunk great timing, enough of the groundwork is now in place that you can solve this issue the same way it was done in FreeBSD / Illumos. @don-brady's recent commit to master, 39fc0cb, added support to store the devid and phy_path keys in the vdev label. That means we can reinstate the logic in |
I've read the code on OpenZFS, and what they do once a device is reopend on a different device link, is rewrite the VDEV with the 'new' information. Is that something we want to do in ZoL? Practical example: Someone creates a pool on /dev/sdc when booting from an USB thumb drive, rebootes, and the /dev/sdc is now /dev/sdb. ZFS cannot open the device, so it starts searching for symlinks, and finds the correct disk in /dev/disk/by-id/. After which ZFS rewrites the correct device id in the VDEV physpath value. After a zpool status, the users sees his disk under /dev/disk/by-id/. Is this expected behaviour? or do we want to let the user still see the /dev/sdb he first used? On another note: Where is the code located where ZFS "tastes" a given device to see if it's the correct/valid ZFS device that it's 'needing' when mounting a pool? |
@FireDrunk good question. That would definitely be a change in behavior from how things behave today, but it's almost certainly a desirable one for devices like As for device tasting it depends on how a pool is imported. Either your going to be opening a pool based on the configuration stored in the cache file, or using a configuration dynamically generated from blkid or directory scanning. In either case basic sanity check for the configuration are validated in |
So what we can do is classify an array of "bad paths" on which we actively rewrite the path to a better one we find using symlinks. If the device requested is 'normally' in a good path, but we can only find it in another good path, we could notify the user on import that it is possible to import the pool, but the user would need to do a -F to rewrite/update the vdev locations. This can cause mixed pools in rare cases (by mixed i mean, some links in /dev/disk/by-id/ and some via /dev/disk/by-path for instance). We can always allow the use of bad paths thru an environment setting, and maybe it's even possible to let the user declare 'bad-paths' thru module settings or environment variables. For instance, i'm thinking about /dev/mapper locations that some users might want (in case of Fibre Channel backup LUN's on SAN's, but in other cases (iSCSI?) you might not want to. Does that seam like a feasible idea? Now that i'm thinking about it: FreeBSD's behaviour is just import the pool on other device nodes, so if we want to 'allign' the implementation, we might have to look sideways to the other implementations. |
The raw code @FireDrunk speaks of is this: This is what it outputs:
So it will return the first symlink it will find on a priority list basis. That list is nothing more than the list already implemented in ZoL:
After the user normally created the pool, we import the pool again based on the default list. That can imply a mixed symlink import. Only if the user forced the search path with the
So ZFS will first look for symlinks under |
One other thing regarding "device tasting" I should mention is that we're systematically moving toward depending on libblkid. As you're probably aware this is the standard mechanism used under Linux to determine whats on a block device as they are dynamically added to the system. In master, if you're not using a cache file the default behavior is to ask libblkid for the list of available zpool member devices. This list will be considered authoritative when constructing a pool. To get the old probing behavior you can either specify a list of directories with -d, or use -s to scan the default paths. Building on top of udev and libblkid allows us to start properly handling the various auto* zpool properties. It also let's us get notifications for device removal and addition. 7d11e37 Require libblkid |
Does libblkid 'identify' disks as being Valid ZPool members by use of probing? Or does it rely on GPT partition type information or something? If so, we could just ask libblkid to disregard any /dev/sd* links and just ask for "the rest" and make a decent table of devices with the result. |
The way blkid works is when a disk is added to the system a udev event is run which results in As I mentioned before we've been gradually integrating with the blkid cache. What happens today in master when Recently libblkid was made mandatory in the master code and that resulted in us uncovering a few new issues. @FireDrunk since I know you've been looking in to this it would be great if you could review the patches in #4523. It would be great if you could also post your thoughts on what is still needs to be done. |
Does this then fully rely on proper vdev information? What if that fails? The idea of device tasting is that we don't rely on just one way of identifying a disk, but check various |
I think you are misunderstanding Brian. udev triggers at boot for each block device, regardless of it's path. This means that the blkid cache will contain all succesfully tasted devices. Once ZFS comes along (when doing the zpool import stage) it will ask blkid for all devices that were tasted as being a ZFS device. This results in a list of devices that ZFS is going to try to probe itself for 'internal' vdev information. I think that should be fine as long as the UDEV / BLKID device tasting is at the same level as ZFS' internal probing. A situation that i'm thinking of now, is a 'semi-broken' device that is failing to be probed by UDEV but would be properly probed by ZFS. What is the 'chance' that this is possible @behlendorf ? Can we be 'sure' that the udev / blkid probing of ZFS VDEV Members is 'good' enough? |
@behlendorf , i've looked into #4523 and found the related blkid functions: It looks like the code asks blkid for the entire cache, and just tries to read a zpool label from it. This seems fine to be as this wil increase our changes of finding zpools on block devices. One thing tho, we need to choose whether we want to keep the /dev/disk/by-id/ or rework that to something more fancy like upgrading the links if we have 'better' paths. |
That's what i wanted to say but you said it better 😄
Chances you mean? |
Blkid has been able to reliably detect ZFS vdevs for several years now. The probing isn't quite as through as that done by ZFS, for example it does not read all the label information, but it shouldn't miss anything. Much older versions of blkid, such as that used in RHEL6, may not be quite as good. But yes, with any modern distribution we can depend on it. And there's nothing preventing us to submitting patches to improve it if we find issues.
It does but we filter out any devices which were not identified as a zfs_member type.
With the patches in #4523 the paths used by the previous import are preferred by default. If for some reason they don't work it will fallback to the This is certainly something which could be enhansed and libudev does provide some nice functionality to enumerate all possible paths given just the devnode. |
Is libudev mandatory for ZFS? I mean, do we really want to fully rely on systemd-udev and libblkid? This might upset some Gentoo users :-) |
libblkid was made mandatory, libudev was left optional. |
@FireDrunk @CurlyMoo is there still an outstanding issue here which hasn't been sufficiently addressed in the master branch? |
Closing. This was resolved by adding a dependency on libblkid. |
This is a rework / improvement of #3892.
I've started trying to figure out how to build a better handling of the Symlink / Device Tree detection mechanisms in the zpool import functions.
Currently i have some raw code to detect the symlinks. It builds a tree of devices with it's "best" symlink attached.
I'm looking into implementing this function into the zpool_find_import_impl function in libzfs_import.c#L1589.
The function starts with looping thru all directory to find "disks", the function i have already does this, and provides the 'correct' device node link on which ZFS should access the device.
In #3892, @ryao mentions that we need to update the respective zpool_do_{import,create,add,remove,attach,detach,replace} functions.
Wouldn't it be better if we did the device / symlink handling in a central place, and then hand off the results to the other zpool functions?
The text was updated successfully, but these errors were encountered: