ZFS FS support (for NFS exports) #55
Conversation
Force-pushed from 7fcdb86 to 2370167
The failure at commit e515c4c is a bit sad. ZFS on Linux apparently fixed this use case somewhere between 0.7.5 and 0.8.4, so some thought should be given to which ZFS version targetd aims to support. But given that the "hack" introduced in 6f909ff does fix it, it's possible to support 0.7.5. @triluch I believe the current state covers most use cases. I will go through the code once more to find any spots where I should add additional sanity checks, but other than that it looks done.
Force-pushed from 36958a2 to a354d95
There are some outstanding points I did not address here, as I'm not familiar with the expectations of the API users: …
If the API docs are lacking (https://github.com/open-iscsi/targetd/blob/master/API.md), we should add additional information to make them clear.
The pool name has no requirement to be some type of path. Actually, it likely shouldn't be: e.g. when you create a FS from a pool, you should be able to list all the FS and locate it by name, although name uniqueness should likely only be enforced within a specific pool, not across all pools.
From an API standpoint, a FS …
@dsonck92 I'll take a look at this PR tomorrow or later this week. Thank you for adding to the client test in addition to the ZFS FS support changes. One thing that I did see was the commit message …
Force-pushed from a354d95 to 8181ed6
I will add some notes regarding the availability of …
Agreed. All backends now keep track of a list of pools they handle, and they can contain the same volume name, but together with the pool this is unique. Besides this, the UUIDs in use also offer uniqueness for most operations, and most even require their use.
The current …
ZFS has the peculiarity that, once you create a clone based off a snapshot, that snapshot cannot be deleted until the clone is removed, which in turn means the original volume can't be removed either. I'm not sure what btrfs does in this regard. This also has the side effect of requiring more space for the snapshot as the original volume diverges, and more space as the clone diverges.

The local exports idea is interesting. In fact, considering that both btrfs and ZFS have ways to mount their subvolumes at arbitrary paths, it could be possible to expose this. For ZFS this would be trivially handled through the zfs command, as you can set the mountpoint directly; btrfs might need some trickery with /etc/fstab or systemd-mount. However, this pull request was mainly focused on adding another type of FS volume (as opposed to block volumes that are shared through iSCSI).

I realize that this project, being part of open-iscsi, is diverging from its "iSCSI" roots, but NFS support had already been added prior to this. Given the addition of ZFS filesystem datasets besides ZFS volumes, people using ZFS as their main storage system can now create both block devices and filesystems and export those over iSCSI and NFS. As noted in #54, this could provide clusters that have external data storage with dynamically provisioned filesystems for both private (iSCSI) and shared (NFS) storage (where especially the shared variant is currently limited to complex or cloud-based solutions).
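As a side note, a minimal sketch of how that clone dependency could be checked before deleting a snapshot, using the standard `zfs get` CLI via subprocess; the helper name and the exact handling of the `clones` property value are assumptions for illustration, not code from this PR:

```python
import subprocess

def snapshot_has_clones(snapshot):
    """Check whether a ZFS snapshot (e.g. 'pool/fs@snap') still has dependent clones.

    Reads the read-only 'clones' property; ZFS reports '-' (or an empty value)
    when nothing depends on the snapshot.
    """
    value = subprocess.check_output(
        ["zfs", "get", "-H", "-o", "value", "clones", snapshot],
        text=True,
    ).strip()
    return value not in ("", "-")

# Example: refuse the delete up front instead of letting 'zfs destroy' fail later.
if snapshot_has_clones("zfs_targetd/fs_pool/somefs@snap1"):
    raise RuntimeError("snapshot still backs a clone; remove the clone first")
```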
Force-pushed from 4ca23d1 to bbd5141
As I've touched the client test with the fs tests anyway, I figured I might as well do the same procedure for the regular block tests, ensuring that at least basic testing of LVM and ZFS is performed. Both tests now just loop over a list of volumes, creating a subvolume, a snapshot or copy of said volume, and in the case of block devices also an export. After that they destroy everything and verify that things have been cleaned up. This would allow future expansion with other filesystems and block devices.
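A rough sketch of what such a parameterised round-trip could look like; the `rpc()` helper, the pool paths, and the exact parameter names and return fields are assumptions for illustration and should be checked against API.md rather than read as the PR's actual test code:

```python
# Hypothetical pool mount points; the real test derives these from targetd.yaml.
FS_POOLS = ["/mnt/btrfs_fs_pool", "/mnt/zfs_fs_pool"]

def fs_round_trip(rpc):
    """Create, snapshot, clone and tear down a filesystem in every fs pool."""
    for pool in FS_POOLS:
        rpc("fs_create", pool_name=pool, name="testfs", size_bytes=1 * 2**30)
        fs = next(f for f in rpc("fs_list") if f["name"] == "testfs")

        rpc("fs_snapshot", fs_uuid=fs["uuid"], dest_ss_name="testsnap")
        snap = next(s for s in rpc("ss_list", fs_uuid=fs["uuid"])
                    if s["name"] == "testsnap")

        rpc("fs_clone", fs_uuid=fs["uuid"], dest_fs_name="testclone",
            snapshot_id=snap["uuid"])
        clone = next(f for f in rpc("fs_list") if f["name"] == "testclone")

        # Destroy everything again and verify nothing is left behind.
        rpc("fs_destroy", uuid=clone["uuid"])
        rpc("fs_snapshot_delete", fs_uuid=fs["uuid"], snapshot_id=snap["uuid"])
        rpc("fs_destroy", uuid=fs["uuid"])
        assert all(f["name"] not in ("testfs", "testclone") for f in rpc("fs_list"))
```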
While I had some time to think, something dawned on me which might be a bit strange and might need a second opinion. Initially, when implementing the backend-agnostic fs layer, I decided that fs_pools would simply list the mount points of the top volumes under which targetd creates its subvolumes. This is compatible with btrfs, as its command takes the mount path as an argument. When I introduced the ZFS backend, I decided to inspect /proc/mounts to deduce which filesystem is in use (and as such it requires the specific subvolume to be mounted at the exact location listed, which is always the case for ZFS and can be arranged for btrfs in /etc/fstab too). This also provides the information the ZFS backend needs to recover the internal ZFS name.

This all works, but since ZFS then prefers its internal top-pool/volume/subvolume name over /mnt/zfs-mount/subvolume, the internal name becomes top-pool/zfs-mount/subvolume (or wherever it actually resides). I didn't think much of it at first, but while writing the client test I noticed that for ZFS the pool name no longer corresponds to the pool name defined in the config file. In the test setup this is barely noticeable: /zfs_targetd/fs_pool becomes zfs_targetd/fs_pool. But when admins start mounting /mnt/zfs_targetd/fs_pool, this still becomes zfs_targetd/fs_pool. There are two ways to solve this: …
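To make the mismatch concrete, a small hypothetical illustration of the two name forms involved (the paths and dataset names are just the examples from above):

```python
# What the administrator writes in targetd.yaml: a mount point.
configured_fs_pool = "/mnt/zfs_targetd/fs_pool"

# What ZFS itself calls that dataset (top-level pool + path within it),
# which is what `zfs list` and the ZFS backend report.
zfs_internal_name = "zfs_targetd/fs_pool"

# Without an explicit mapping, clients that configured the first form
# would see targetd report pools under the second form.
```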
Well, as going for a walk always seems to have a good influence on the thought process, I've entertained the thought of mapping ZFS datasets transparently by their mount points (for the FS part only, not block devices). My reasoning (as stated in the commit): nfs_export_add takes full NFS-reachable mount points. My interpretation of filesystem pools is for them to be exported through nfs_export_add, which takes the absolute path destined for /etc/exports. The initial /proc/mounts parsing allows targetd to resolve the local (NFS-reachable) mount points to their intended backends (btrfs/zfs). This means that an administrator can specify ["/local/pool"] in targetd, reference it in the configuration for its clients, and later transparently swap between btrfs and zfs as they choose. Clients then utilize the full_path to set up any dynamic NFS exports they require. Now I'm just looking for which code I overlooked that causes this check to fail.
@dsonck92 I'm looking at this PR, trying to understand all of what you're saying. I spun up a VM and created a couple of zfs pools. Shouldn't I be able to specify in the targetd.yaml …? FYI: I'm not that well versed with ZFS.
This was the point I was trying to make two comments back. While it would be possible to deduce that zfs_fs_pool is a zfs pool based on the lack of a leading slash, this might be troublesome in the long run when other filesystems pop up that offer similar features like subvolumes. However, it would be more in line with "all values supplied inside the targetd.yaml are in fact pool names".
The /proc/mounts approach has the benefit of mapping NFS names (which currently do not use pool + volume addressing during creation) to managed pools regardless of how the tools interact. In addition, it makes an "fs_pool" actually mean "look at this location, I want to create mounted filesystems here", which, in my opinion, ties in with NFS. Imagine: …
But other than that, zfs list does give a list of the pools that exist, and the backend could try to match against this (a small sketch follows this comment).
Yes, I believe it should be possible to use the same pool for both block devices and file systems. As far as zfs is concerned, it can have mixed block devices and filesystems. The only question there is what to do when they start having conflicting names. While thinking about this part, I do realize that, if it's preferable to go with the zfs-style pool names, then zfs_block_pools could in fact be upgraded to zfs_pools, allowing it to host both filesystems and block devices. It would then basically either show up twice in the pool list (as both a filesystem and a block pool), or the API adapts to give multiple values for its type. But I haven't thought about that much; it's more of a thought brought up by your question.
In the end, the major differences between ZFS and btrfs/lvm are: …
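For reference, a minimal sketch of the `zfs list` matching idea mentioned above; the helper names are made up for illustration and this is not how the PR ended up working:

```python
import subprocess

def existing_zfs_datasets():
    """Return the set of dataset names currently reported by `zfs list`."""
    out = subprocess.check_output(["zfs", "list", "-H", "-o", "name"], text=True)
    return set(out.split())

def looks_like_zfs_pool(config_entry):
    """Treat an fs_pools entry without a leading slash as a candidate ZFS name
    and verify that such a dataset actually exists."""
    return (not config_entry.startswith("/")
            and config_entry in existing_zfs_datasets())
```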
Well, given that the build now succeeds: it's now possible to specify filesystems (which must be mounted) in fs_pools, and targetd will properly deduce whether each is btrfs or zfs and map the names accordingly. Hence:

```
zfs create -o mountpoint=/mnt/zfs_fs_pool zfs_root/fs_pool
```

with the configuration:

```
fs_pools: [/mnt/zfs_fs_pool]
```
Subsequently, it's possible to …
I believe, considering that the filesystem's main use case is NFS exports (as iSCSI relies on block devices), this would be convenient for the end user of targetd. They only need to indicate which paths should be managed and arrange on their system that either btrfs or zfs is mounted there. That said, it makes use of the fact that ZFS subvolumes inherit their mount path from their parent, so they automatically mount under /mnt/zfs_fs_pool/ if the parent dataset is mounted at /mnt/zfs_fs_pool.

On an entirely tangential note (which I believe would warrant a new issue altogether if wanted): the other solution is to make NFS exports in line with iSCSI exports, as noted a few comments back. This would mean that the API takes a pool + volume in addition to a path (which it currently does not for NFS). With this mindset, it would be possible to leverage ZFS's easy mountpoint management to let targetd manage mountpoints entirely, completely hiding how the zfs pools are named behind the scenes as far as API clients are concerned. But this would create much more work. Hence why I think the "map zfs volumes through /proc/mounts" approach is the most convenient and least change-requiring method with the most gain: fs_pools lists mount points, easily exportable, and the user can swap zfs/btrfs as they please by simply mounting another filesystem at the given fs_pool locations.
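As a usage illustration, a hedged sketch of how an API client could then create a filesystem in such a mount-point-named pool and export it over NFS. The endpoint URL, credentials, and parameter names are placeholders based on my reading of API.md and should be verified against it:

```python
import json
import requests  # any HTTP client would do

TARGETD_URL = "https://targetd.example.com:18700/targetrpc"  # placeholder endpoint
AUTH = ("admin", "secret")  # placeholder credentials

def rpc(method, **params):
    """Perform a single JSON-RPC call against targetd."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    resp = requests.post(TARGETD_URL, data=json.dumps(payload), auth=AUTH,
                         verify=False)  # sketch only; verify certificates in real use
    resp.raise_for_status()
    return resp.json().get("result")

# Create a filesystem in the mount-point-named pool, then export it over NFS.
rpc("fs_create", pool_name="/mnt/zfs_fs_pool", name="shared1", size_bytes=10 * 2**30)
rpc("nfs_export_add", host="192.168.1.0/24",
    path="/mnt/zfs_fs_pool/shared1", options=["rw", "no_root_squash"])
```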
@triluch perhaps you also want to take a look at the solution as it currently stands.
The nfs handling code of targetd was written in such a way that an administrator could have nfs exports that are not managed by targetd. I haven't had a chance to fully review this PR, so I don't know if utilizing … As for removing any ambiguity between "this fs pool is btrfs" and "this fs pool is zfs", I would be fine with adding a new designation in the config file to remove that ambiguity, especially if it makes things simpler in the code.
Well, that's at least good to know for me considering my last comment on #54, as it would make me prefer targetd even more (I rely on /etc/exports not changing). I think I might not have been clear regarding /proc/mounts. In essence, what I'm doing with it is: …
An added bonus to this /proc/mounts read is that it can error out very early once it has deduced that a given fs_pool path does not actually map to btrfs or zfs. Without it, btrfs would fail because it cannot create the filesystem and snapshot locations, but that's essentially already too late, and the error would only indirectly show it was due to the pool not being btrfs. With the /proc/mounts check (which is also done just once at start), it actually errors out with "Unsupported filesystem ext4 for pool /var/lib/exportd".
The added complexity of deducing paths through /proc/mounts is pretty minimal. There's a dedicated class for parsing /proc/mounts that splits out the fields and provides constants to retrieve the names (this could be changed to a returned dict if that's preferred). The mapping of mount -> zfs is a simple dict; given that mount points are unique on Linux, any given key maps unambiguously to a ZFS pool. Only a single "zfs_pool = pools_fs[pool]" was necessary for each function accepting a single pool, and for those that go through the list, Python offers "for pool, zfs_pool in pools_fs.items()". Whether that's considered complex by other contributors I cannot judge, but the end result is less complexity in the configuration file. Perhaps playing around with it locally might shed some light on my chosen design.

I might add that, while btrfs errors out early if it can't create the required subvolumes, if an administrator actually made the directories beforehand it would not error out at start, but would subsequently fail to perform any btrfs tasks when an API call is made. So I think interfacing with a standard (and documented) Linux file is justified at least for error handling, and in my opinion, when it's already being read, why not utilize the additional information stored in there.
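For illustration, a condensed sketch of the /proc/mounts lookup described above (not the PR's actual parser class), mapping each configured fs_pool mount point to its filesystem type and, for ZFS, its internal dataset name:

```python
def read_mounts(path="/proc/mounts"):
    """Map mount point -> (source device or dataset, filesystem type)."""
    mounts = {}
    with open(path) as f:
        for line in f:
            source, mountpoint, fstype = line.split()[:3]
            mounts[mountpoint] = (source, fstype)
    return mounts

def map_fs_pools(fs_pools):
    """Resolve configured mount points to their backend and internal name."""
    mounts = read_mounts()
    pools_fs = {}
    for pool in fs_pools:
        if pool not in mounts:
            raise ValueError("fs_pool %s is not a mount point" % pool)
        source, fstype = mounts[pool]
        if fstype == "zfs":
            # For ZFS the source field is the dataset name, e.g. 'zfs_targetd/fs_pool'.
            pools_fs[pool] = ("zfs", source)
        elif fstype == "btrfs":
            pools_fs[pool] = ("btrfs", pool)
        else:
            raise ValueError("Unsupported filesystem %s for pool %s" % (fstype, pool))
    return pools_fs
```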
@dsonck92 I'm in the process of writing a more comprehensive unit test against the whole API. Thus far I've found two bugs, neither of which has anything to do with this PR. Once I get the unit test done, and assuming no issues are found pertaining to this PR, we'll get it merged.
Good, I can rebase on the new unit test when it is done. I think it's then probably better to pull my unit test code out of the change, for two reasons: …
On the phone they really should put an "are you sure you want to close" message.
And I found another mistake in the NFS support. I'm actually testing this in my private (but production) environment and was hit by it. I was a bit confused, as API.md did not mention …
@dsonck92 Unless you would like to squash a bit, I think this is OK to merge and I'll do so tomorrow. I'm planning on expanding the unit tests to increase the code coverage and address anything else that might be present before a new release is made.
I'll do that; it should be done in a few hours.
- The original btrfs filesystem pools use mount-path-based pool names. ZFS differs from this by using special ZFS pool names starting with the top-level ZFS pool name. This unifies the fs_pool configuration to always refer to mount points. Rationale: as filesystem pools defined in targetd are likely to be exported over NFS, and since NFS paths are strongly tied to actual mount paths, it makes sense to specify filesystem pools in terms of their "soon to be exported" paths.
Force-pushed from 450cb81 to bc619f1
@tasleson if you wish, I could combine all the other pull requests I made into this one, although they should not conflict when applied individually.
Adds ZFS support to the filesystem pools.
These changes allow ZFS as a backend for both iSCSI and NFS targets, as discussed in #54.
Closes: #54