-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZFS Encryption #5769
ZFS Encryption #5769
Conversation
@tcaputi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @grwilson and @ahrens to be potential reviewers. |
Several updates have been made to the code since it was moved (thanks to @ahrens for the review thus far). These are the highlights:
These changes do constitute an on-disk format change, so anyone updating from an earlier version of the PR will have to recreate their pool. |
6cf566f
to
ae18e6d
Compare
Thanks to everyone so far 👍
For clarity (for any newcomer to this PR), from https://github.com/tcaputi/zfs/blob/ae18e6d0a8c1b32991094594d1b751646bc5b4ca/man/man8/zfs.8 with added emphasis:
(The first – a definition of user data – includes some things that might otherwise be viewed as metadata.) Also from the opening of this PR:
– could be updated to reflect the split that is highlighted in the first (human) comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation: first pass …
man/man5/zpool-features.5
Outdated
accessing the data allow for it. Deduplication with encryption will leak | ||
information about which blocks are equivalent in a dataset and will incur an | ||
extra CPU cost per block written. | ||
This feature enables the creation and management of natively encrypted datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As details of the feature are more in the manual page for zfs(8)
than in the page for zpool(8)
, so the SEE ALSO part of this zpool-features(5)
page should be expanded to include zfs(8)
.
man/man8/zfs.8
Outdated
Selecting \fBencryption\fR=\fBon\fR when creating a dataset indicates that the | ||
default encryption suite will be selected, which is currently \fBaes-256-ccm\fR. | ||
In order to provide consistent data protection, encryption must be specified at | ||
dataset creation time and it cannot be changed afterwards. | ||
.sp | ||
For more details and caveats about encryption see \fBzpool-features\fR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SEE ALSO part of this zfs(8)
manual page should be expanded to include zpool-features(5)
.
man/man5/zpool-features.5
Outdated
accessing the data allow for it. Deduplication with encryption will leak | ||
information about which blocks are equivalent in a dataset and will incur an | ||
extra CPU cost per block written. | ||
This feature enables the creation and management of natively encrypted datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basics.
Can creation of a pool comprise a single encrypted dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This functions the same way as any other zfs property (compression, deduplication, etc)
c4daa03
to
8594e08
Compare
Fantastic work 👍 . I'm curious, how far would you say are you from publishing the changes from this PR? A rough estimate would do, say: weeks, months, Q2, Q3? |
include/sys/dsl_dataset.h
Outdated
@@ -245,26 +247,35 @@ dsl_dataset_phys(dsl_dataset_t *ds) | |||
#define DS_UNIQUE_IS_ACCURATE(ds) \ | |||
((dsl_dataset_phys(ds)->ds_flags & DS_FLAG_UNIQUE_ACCURATE) != 0) | |||
|
|||
/* flags for holding the dataset */ | |||
#define DS_HOLD_FLAG_DECRYPT (1 << 0) /* needs access encrypted data */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use an enum for this, so that it's more clear what values the function argument can have. (Unfortunately there's a bad example set by some other flags - cleanup has been on my to-do list for a while now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know. will fix
module/zfs/dsl_dir.c
Outdated
if (dd == NULL) { | ||
dsl_dir_t *winner; | ||
|
||
dd = kmem_zalloc(sizeof (dsl_dir_t), KM_SLEEP); | ||
dd->dd_object = ddobj; | ||
dd->dd_dbuf = dbuf; | ||
dd->dd_pool = dp; | ||
|
||
if (doi.doi_type == DMU_OTN_ZAP_METADATA && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use dsl_dir_is_zapified(), instead. (at a cost of maybe doing dmu_object_info twice - shouldn't be a big deal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
module/zfs/dsl_dir.c
Outdated
|
||
/* | ||
* encrypted datasets can only be moved if they are | ||
* an encryption root (locally set keyformat). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it". As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?
I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?
I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename
section, and maybe also the Encryption
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it"
To clarify, renaming an encryption child is ok. Moving to a new parent is not currently.
As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?
I think I can implement it like that. Shouldn't be a big deal.
I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?
Whoops. It looks like I must've rebased that away at some point. I will put it back.
I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename section, and maybe also the Encryption section.
I will make sure this is documented in both places (I think there is a sentence about it in the encryption section).
man/man8/zfs.8
Outdated
locally, the dataset is an encryption root. Encryption roots share their keys | ||
with all datasets that inherit this property. This means that when a key is | ||
loaded for the encryption root, all children that inherit the \fBkeylocation\fR | ||
property are automatically loaded as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about the key for all children ... is automatically loaded as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
man/man8/zfs.8
Outdated
.sp | ||
Even though the encryption suite cannot be changed after dataset creation, the | ||
keylocation can be with \fBzfs change-key\fR. If the dataset is an encryption | ||
root this property may also be changed with \fBzfs set\fR. If \fBprompt\fR is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not an encryption root, can the key be changed with zfs change-key
? The current wording implies it can, by calling out the restriction only for zfs set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. non-encryption roots can use zfs change-key
, which will make them an encryption root. I will add a sentence to the man page to reflect this.
module/zfs/zfs_ioctl.c
Outdated
if (ret != 0) | ||
goto error; | ||
|
||
return (0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like this (success case) would work right if it fell through to the error
label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it. I personally like separate error labels that don't share code.
module/zfs/zil.c
Outdated
* the keys may not be loaded. | ||
*/ | ||
if (wbuf == NULL && BP_IS_ENCRYPTED(bp)) | ||
zio_flags |= ZIO_FLAG_RAW; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we change this to pass RAW
even if not encrypted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why not. I just didn't want to impact existing functionality. I'll run some tests to make sure that works and make the change.
module/zfs/zio.c
Outdated
BP_SET_ENCRYPTED(bp, B_TRUE); | ||
} | ||
|
||
/* Perform the encryption. This should not fail */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a more precise word than "should"? You're gracefully handling failure just below, so presumably it does fail in some cases... Maybe This will fail for block types that are partially encrypted (dnode, zil) if no encryption is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change the "should". I am not gracefully handling errors. The only acceptable return codes here are 0
and ZIO_NO_ENCRYPTION_NEEDED
(see the assertion below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - I wasn't sure if ZIO_NO_ENCRYPTION_NEEDED was considered an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, its a special return code to indicate that the block didn't need to be encrypted for some reason (for instance a dnode block with no encryptable bonus buffers).
module/zfs/zio_checksum.c
Outdated
* evenly across the bits of the checksum. Therefore, | ||
* when truncating a weak checksum we XOR the first | ||
* 2 words with the last 2 so that we don't "lose" any | ||
* entropy unnecessarily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any downside to doing this for all checksum types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of any, but I know all secure checksums support being truncated without issue whereas there may be weaknesses for XOR folding.
module/zfs/zio_crypt.c
Outdated
uint32_t *iv2 = (uint32_t *)(iv + sizeof (uint64_t)); | ||
|
||
ASSERT(BP_IS_ENCRYPTED(bp)); | ||
bp->blk_dva[2].dva_word[0] = LE_64(*((uint64_t *)salt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trick doesn't work on processors which don't support unaligned memory access. You'll need to do it byte at a time (if it's unaligned, at least - but if you implement it both ways, make sure we can test hitting both code paths).
Same goes for the following functions.
See zap_leaf_array_read()
for an example (the la_array
is not aligned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix.
Thanks for updating the PR, will build out a new test mule this week. |
There are no restrictions, but I would argue that there is also nothing that would prevent them from running a |
afee592
to
2284451
Compare
84eeb12
to
5b21838
Compare
module/zfs/arc.c
Outdated
* | ||
* The L1ARC has a slightly different system for storing encrypted data. | ||
* Raw (encrypted + possibly compressed) data has a few subtle differences from | ||
* data that is just compressed. The biggest difference is that it is not always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can delete always
, because it is not possible to decrypt ... if keys aren't loaded
(right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
* may have both an encrypted version and a decrypted version of its data at | ||
* once. When a caller needs a raw arc_buf_t, it is allocated and the data is | ||
* copied out of this header. To avoid complications with b_pabd, raw buffers | ||
* cannot be shared. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where is the best place to explain this in the code, but at least for my own edification:
When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.
Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.
All correct.
Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?
If we have both we evict the encrypted one as soon as all encrypted buffers are destroyed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where is the best place to explain this in the code...
There is actually already a comment for this in arc_buf_destroy_impl()
(where the header freeing takes place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK cool, I haven't got to reading that part of the code yet, it's a lot to fit in my brain at once :-)
module/zfs/arc.c
Outdated
@@ -817,6 +835,8 @@ static arc_state_t *arc_l2c_only; | |||
#define arc_need_free ARCSTAT(arcstat_need_free) /* bytes to be freed */ | |||
#define arc_sys_free ARCSTAT(arcstat_sys_free) /* target system free bytes */ | |||
|
|||
/* encrypted + compressed size of entire arc */ | |||
#define arc_raw_size ARCSTAT(arcstat_raw_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encryption doesn't change the size of the data, so how is this different from arc_compressed_size
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this just gives an extra stat for how much raw encrypted data is stored in the ARC. I thought it would be useful seeing as how this memory operates separately from compressed memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it's the sum of the size of the b_rabd
s? Could you update the comment to say that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
include/sys/arc.h
Outdated
@@ -112,20 +123,22 @@ typedef enum arc_flags | |||
ARC_FLAG_L2_WRITING = 1 << 11, /* write in progress */ | |||
ARC_FLAG_L2_EVICTED = 1 << 12, /* evicted during I/O */ | |||
ARC_FLAG_L2_WRITE_HEAD = 1 << 13, /* head of write list */ | |||
/* encrypted on disk (may or may not be encrypted in memory) */ | |||
ARC_FLAG_ENCRYPT = 1 << 14, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ARC_FLAG_ENCRYPTED
(and HDR_ENCRYPTED()
), since this is not being used as a verb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will fix.
include/sys/arc_impl.h
Outdated
/* encryption parameters */ | ||
uint8_t b_salt[DATA_SALT_LEN]; | ||
uint8_t b_iv[DATA_IV_LEN]; | ||
uint8_t b_mac[DATA_MAC_LEN]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b_mac
seems different from the other encryption parameters, because it's used not when writing to L2ARC, but rather when decrypting an encrypted hdr. In that case I think the caller could pass in the BP and we could get the mac there. It might be worth explaining that we keep the mac here for convenience in this sake, so the caller doesn't have to pass in the BP again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting point. it looks like the only place this is really used is in arc_hdr_decrypt()
. I will add that comment. I could probably also assert that the mac in l2arc_apply_transforms()
matches the one in the header.
* pool. When this is the case, we must first compress it if it is | ||
* compressed on the main pool before we can validate the checksum. | ||
*/ | ||
if (!HDR_COMPRESSION_ENABLED(hdr) && compress != ZIO_COMPRESS_OFF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code moved somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, see comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which comment specifically?
I think this code is now assuming that we write blocks to the L2ARC exactly as they are in the main pool, including compressed, unencrypted blocks, even if the b_pabd is uncompressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I'm sorry I thought this was a different piece of code. You are correct, this change allows the L2ARC to write out data as it is in the main pool, removing the need for this step.
@@ -1652,15 +1701,14 @@ arc_hdr_set_compress(arc_buf_hdr_t *hdr, enum zio_compress cmp) | |||
*/ | |||
if (!zfs_compressed_arc_enabled || HDR_GET_PSIZE(hdr) == 0) { | |||
arc_hdr_clear_flags(hdr, ARC_FLAG_COMPRESSED_ARC); | |||
HDR_SET_COMPRESS(hdr, ZIO_COMPRESS_OFF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an intentional change to now set the hdr's compression algorithm even if it is not compressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Previously, disabled arc compression worked by simply not storing the compression at all. This change allows us to keep track of the compression algorithm for raw blocks even if arc compression is disabled.
module/zfs/arc.c
Outdated
* This function will take a header that only has raw encrypted data in | ||
* b_crypt_hdr.b_rabd and decrypts it into a new buffer which is stored in | ||
* b_l1hdr.b_pabd. If designated on the header, this function will also | ||
* decompress the data as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove as well
(already has also
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
module/zfs/arc.c
Outdated
} | ||
|
||
if (HDR_GET_COMPRESS(hdr) != ZIO_COMPRESS_OFF && | ||
!HDR_COMPRESSION_ENABLED(hdr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a comment explaining this case. I think it's saying that the data on disk is compressed, and therefore the raw data in b_rabd is compressed, but we don't want this hdr to be stored compressed (i.e. b_pabd should not be compressed), so we need to decompress it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the correct interpretation. I will add a comment.
tmp = abd_borrow_buf(cabd, arc_hdr_size(hdr)); | ||
|
||
ret = zio_decompress_data(HDR_GET_COMPRESS(hdr), | ||
hdr->b_l1hdr.b_pabd, tmp, HDR_GET_PSIZE(hdr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible this is used in a performance-sensitive code path, it would be good to avoid the copy from b_pabd to linearize it for decompression. This could be avoided by allocating b_pabd as linear, before decrypting into it. (And then replacing b_pabd with the decompressed scatter ABD, as you're already doing here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only issue with that is that then we will not be honoring zfs_abd_scatter_enabled
. I do not think this is an incredibly performance-sensitive path either, since this will realistically only be run in 2 circumstances: decrypting dnode blocks and reading data that is already in the ARC due to a scrub.
Let me know what you think and I can change it if you still feel it is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could still honor zfs_abd_scatter_enabled
because the data stored in b_pabd
would be scatter (or not). It's only the intermediary, short-lived buffer (decrypted but not yet uncompressed) that would be linear.
But it does seem like not a super common code path so I could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean now... This might be a bit messy to do since arc_hdr_alloc_abd()
cannot currently decide how it would like its abd to look. Unfortunately the code for that is a few functions up the stack. I'm inclined to leave it for now unless it becomes a problem.
Oh I should mention, in the IllumOS patches, before I included your commits yesterday, when we did a
But it could be an old bug now, but if it is trivial for you to set lz4, and test |
Further to @lundman 's comment, we have openzfsonosx/zfs#557 |
@lundman I dont know what version you were on before, but the problem seems to be fixed now. |
Woo \o/ |
@sempervictus |
@teknoman117 The issue does not seem to reproduce on Illumos, so we must have changed something at some point. I hope that I can get to the bottom of it tomorrow. |
@teknoman117 I figured it out. The problem is actually just with your patch. You need this in the patch as well:
Let me know if I'm missing something or if you need anything else. |
My assumption to leave that guard in was based on this comment in aes_impl.c That being said, this does indeed seem to let the generic c implementation load a dataset encrypted using the the aesni or x86_64 optimized method. However, removing the guard completely breaks the accelerated methods. I suppose this means that for x86_64 the byte swap only is only needed for the generic implementation. |
My question is .. why is this not stable yet ? I need to run zfs-linux-git on Arch Linux ... sort of unofficially .. is it testing what's keeping this from stable ? |
@jbrodriguez It's not stable yet because it's a Hard Problem, and because the developers have run into some issues, most notably #6845. I don't think you really ought to complain unless you're paying the OpenZFS developers... and if you are you probably have internal channels for that :) |
Thanks, I don't know how much you officially speak for @grantwwu, I was just asking a question. Nevertheless, thanks for pointing out the issue, I'll subscribe to it. |
Yeah. I think that comment was meant to be taken in the context of the entire file. In other words, I think the
The 3 big reasons:
|
Thanks @tcaputi, the major version comment is spot on. I'm looking to start from scratch, so I may have overlooked some considerations. Can't praise enough the work you've done here, kudos. |
This change incorporates three major pieces: The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decyrpt, and authenticate protected datasets. Each object set maintains a Merkel tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes openzfs#494 Closes openzfs#5769
This looks awesome, congrats! Is there a guide / tutorial anywhere on how to set this up? Can be integrated with an existing LUKS encrypted partition setup? (How?) Thanks for helping out a newb! |
Manpages show usage, luks is just the header for most dm-crypt setups, and dm-crypt CAN be used under it to protect its unencrypted metadata at the overhead of a whole encrypt/decrypt round under the zfs impl. |
Which manpage, for LUKS or ZFS? (And if ZFS which command?). I might not have been super clear. It's my understanding that "ZFS on root" (i.e. ZFS managing the entire drive) doesn't work well on Linux, so what people do is they create a ZFS partition. That's what this guide does and that's what I've been following. In that guide a partition is created for But now that zfsonlinux has encryption, what's the best thing to do, and how? |
I'm not sure if this is the best thing to do, but on my laptop I've been using a small fat32 partition for |
I have an experimental version of the HOWTO for Debian that builds 0.8.0 from experimental: I haven't updated that since the rc4 package. The final 0.8.0 will likely be in experimental soon, at which point I'll update that again. I'm not sure how much I want to publicize this, though, as a hand-built backport isn't necessarily the best idea. |
LUKS is a Linux component, zfs is not, they share zero code, and one is a header for full block encryption of a device while the other is data encryption inside a filesystem. They're not related by any direct linkage. |
@rlaager wrote:
Wow, thanks for that. It seems like it could be useful as a reference, but also seems quite complicated. I'm wondering if there's a simpler way? Currently exploring the following approach (with Fedora) for a VPS setup:
Is that doable? Would it be simpler? (anyone know of any guides for that?) |
I should have mentioned in my previous comments, one of the reasons I can't follow that approach is that, as mentioned in that link, you don't always have the required RAM available to use a Live CD (as with a VPS), but also my web host does not offer a Fedora Live CD, so even if I had the RAM, I'd be forced to use the "anaconda" troublshooting environment which doesn't even have a working package manager and is a nightmare to use. |
We both use accelerated encryption implementations, but due to licensing restructions they are separate implementations. |
Is this still true now that SIMD support is cut upstream due to GPL-only export of FPU functions? 4.14+ have that commit applied nowadays since it got back ported to 4.14.120 |
@taoeffect I run ZFS as the root file system on my Linux machine. There's a small fat32 partition at the beginning which uses an efistub kernel and the zfs "pool" is the second partition which contains datasets for /, /home, and some others. |
If you have that commit then yes SIMD support is disabled. We aer still working on figuring out a workaround there. |
We build kernels in-house with zfs as a built-in to the main kernel image (avoids certain instrumentation passes for modules at runtime), which permits us the opportunity to simply revert that commit. Most of our clients cannot do this, and folks are talking about moving to 0.8 this year already due to crypto being available. Im a bit worried that more will stop using zfs for being too slow - we've had a few drop in the last year or so because modern storage hw is much faster than the zfs io pipeline, kpti smashed even more bs into the execution of the "slow" fs code paths, and with no SIMD this is just going to get worse and worse. |
Based on what little I saw, I must, unfortunately. agree. You wouldn't happen to have a link to the issue in question, would you? At this time, it looks to me like the Linux Kernel project is effectively controlled by zealots. I see this as the first step in a "whack-a-mole" process (with aggressive back-porting) to cripple ZoL in favor of the broken BTRFS. As with any religious war, the only thing that matters is the religion; facts are irrelevant. I don't believe this is a war that ZoL can win. With that in mind, I'm in the process of creating a message to all my ZoL customers, suggesting that they consider migrating their file servers from Linux/ZoL to FreeBSD/OpenZFS. |
If need be, as the five and dime shop we are, we'll figure out how to push our own kernel build (+patches) into our archlinux setup. We used to use Oracle Solaris 11 11/11 for its ZFS encryption capabilities. Switched to dm-crypt and jumped in blindfolded into linux zfs encryption as soon as tom caputi and team made it available. We're not going back. Only forward. |
@teknoman117 wrote:
You and @benley both seem to be doing this. However, every tutorial I have seen for ZFS on root is insanely complicated, and also does not seem to always work on VPS setups with only 2GB of RAM because of the LiveCD requirements. I wish deeply that ZFS were the default on Linux, and that it were simple to setup. That unfortunately is not the reality. Also, I have not seen any comparisons between the latest ZFS encryption stuff and ZFS on LUKS. It would be useful to know what the tradeoffs are in terms of both ease-of-use, security, and performance. |
I'm not really one to get into the politics of this whole situation, but I will say that despite the statement from some of the Linux maintainers, nothing has really changed in this regard. The Linux kernel has always had a set of functions and symbols that it exports in one release and stops exporting in the next and its internal APIs are always changing. As a "third party" module, we have always had to adapt our software to these changes and we must always write code that can support past, present, and future versions as best as possible. Every time a new kernel comes out, our testing builders fail and within a few days / weeks someone takes a look and fixes any new incompatibilities that have arisen. This particular issue is a bit trickier to work around, but I don't believe we are fundamentally incapable of dealing with it as we have in the past. |
@pashford: I wouldn't send that email out yet, likely not to work. ZoL has the accounting feature which, iirc, prevents use on non-Linux systems. This is why we started making all new pools with that feature disabled, then patched zfs to make disabled the default. @taoeffect: livecd is squashfs mapped to ram, not much to do with zfs. Far as dm-crypt vs ANY filesystem crypto, you probably want to get a firm understanding of what block data and file data are relative to each other, because you keep asking questions indicative of a thought process which does not account for the differences in the logical tiers of storage required to provide a user with access to a file (encrypted or otherwise). Far as performance goes, see above, Linux is actively trying to degrade our performance for by making the relevant interface GPL-only. So performance where? On 4.14.119 or today? What are you trying to use zfs for from a business-logic perspective, and why not pay the hosting company for the proper amount of resources? |
Per @kpande's request, I've opened up two discussion threads on the mailing list:
(BTW, this is amazing mailing list software! Never seen it before!) But to quickly answer your questions @sempervictus:
I mean Linux in general. Whatever the latest kernel is. I doubt people will stay on old kernels for long due to the constant security problems that are being created, found, fixed, and re-created.
A 2GB VPS seems like a reasonable target for ZFS, no? Feel free to post any replies via the mailing list! |
So far as 2G goes, really depends on the workload. ARC usually takes half your ram, so if the kernel and userspace are under a gig, you're OK till you run software. If you have a large pool with a bunch of dedup blocks, your DDTs will take up even more. Swap and ARC/DDTs don't mix well... |
Native encryption in zfsonlinux (See issue #494)
The change incorporates 2 major pieces:
The first feature is a keystore that manages wrapping and encryption keys for encrypted datasets. The commands are similar to that of Solaris but with a few key enhancements to make it more predictable, more consistent, and require less manual maintenance. It is fully integrated with the existing
zfs create
functions andzfs clone
functions. It also exposes a new set of commands viazfs key
for managing the keystore. For more info on the issues with the Solaris implementation see my comments here and here. The keystore operates on a few rules.keylocation
orkeyformat
while creating a dataset causes the dataset to become the root of an encryption tree.The second feature is the actual data and metadata encryption. All user data in an encrypted dataset is stored encrypted on-disk. User-provided metadata is also encrypted, but metadata structures have been left plain so that scrubbing and resilvering still works without the keys loaded. The design was originallly inspired by this article but has been changed fairly significantly since.
Implementation details that should be looked at
key_mapping_t
duringdsl_dataset_tryown()
. I added a flag to this function for code that wishes to own the dataset, but that does not require encrypted data, such as the scrub functions. I did my best to confirm that all owners set this flag correctly, but someone should confirm them, just to be sure.zfs send
andzfs recv
do not currently do anything special with regards to encryption. The format of the send file has not changed and zfs send requires the keys to be loaded in order to work. At some point there should probably be a way to do raw sends.lzc_create() and lzc_clone()
to support hidden arguments. I understand that the purpose of libzfs_core is to have a stable api interacting with the ZFS ioctls. However, these functions need to accept wrapping keys separately from the rest of their parameters because they need to use the (new) hidden_args framework to support hiding arguments from the logs. Without this, the wrapping keys would get printed to the zpool history.EDIT 5/4/16: Updated to reflect the current state of the PR
EDIT 1/3/17: Updated to reflect the current state of the PR
EDIT 2/9/17: reopened under a new PR due to long Github load times (previous PR was #4329)