
Resolve QAT issues with incompressible data #7338

Merged (1 commit, Mar 30, 2018)

Conversation

@tcaputi (Contributor) commented Mar 23, 2018

Currently, when ZFS wants to accelerate compression with QAT, it
passes a destination buffer of the same size as the source buffer.
Unfortunately, if the data is incompressible, QAT can actually
"compress" the data to be larger than the source buffer. When this
happens, the QAT driver will return a FAILED error code and print
warnings to dmesg. This patch fixes this issue by allocating a
larger temporary buffer for QAT to use before copying the data to
the final destination if it was actually compressed.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
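The fix described above can be sketched very roughly as follows. This is an illustrative stand-in, not the actual ZFS/QAT code: `fake_hw_compress` simulates the hardware expanding incompressible data, and `qat_compress_sketch` shows the temporary-buffer-then-copy idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative stand-in for the hardware call: for incompressible
 * input, QAT can produce output slightly LARGER than the source.
 */
static size_t
fake_hw_compress(const char *src, size_t s_len, char *dst, size_t dst_len)
{
	size_t out = s_len + 8;		/* pretend the data expanded */

	assert(out <= dst_len);		/* the oversized buffer absorbs it */
	memcpy(dst, src, s_len);
	memset(dst + s_len, 0, out - s_len);
	return (out);
}

/*
 * Sketch of the fix: compress into an oversized scratch buffer, and
 * copy into the real destination only if the data actually shrank
 * enough to fit.
 */
static size_t
qat_compress_sketch(const char *src, size_t s_len, char *dst, size_t d_len)
{
	size_t tmp_len = s_len + (s_len >> 3) + 64;	/* expansion headroom */
	char *tmp = malloc(tmp_len);
	size_t out;

	if (tmp == NULL)
		return (s_len);
	out = fake_hw_compress(src, s_len, tmp, tmp_len);
	if (out < d_len) {
		memcpy(dst, tmp, out);	/* real win: keep compressed data */
		free(tmp);
		return (out);
	}
	free(tmp);
	return (s_len);			/* "not compressible": store raw */
}
```

Returning the source length is the conventional way a ZFS compressor signals "do not store compressed"; the driver never sees the overflow, so no FAILED status or dmesg warning is generated.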

How Has This Been Tested?

There were concerns about how the extra buffer copy would affect performance. I ran two tests: one with incompressible data (from /dev/urandom) and one with compressible data (from the yes executable). Each test wrote to a test dataset with compression=gzip and all other settings at their defaults:

Incompressible data

root@qat-test:~# cat /sys/module/zfs/parameters/zfs_qat_compress_disable
0 
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 199MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 194MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:04 [ 204MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:04 [ 204MiB/s] [      <=>                                                      ]
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# echo 1 > /sys/module/zfs/parameters/zfs_qat_compress_disable                
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 160MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 167MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 161MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 158MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 160MiB/s] [        <=>                                                    ]
root@qat-test:~# 

During this test, CPU usage for EACH of the z_wr_iss threads was routinely at about 70% without QAT and <1% with QAT.

Compressible data

root@qat-test:~# cat /sys/module/zfs/parameters/zfs_qat_compress_disable                     
0
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt                                
1000MiB 0:00:05 [ 199MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 206MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 201MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 207MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 200MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 196MiB/s] [       <=>                                                     ]
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# echo 1 > /sys/module/zfs/parameters/zfs_qat_compress_disable                
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt                                
1000MiB 0:00:05 [ 184MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 192MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 189MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 190MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 188MiB/s] [       <=>                                                     ]

During this test, CPU usage for each of the z_wr_iss threads was routinely at about 20% without QAT and <1% with QAT.

TL;DR: It looks like even with this change, the benefits of QAT acceleration hold up: throughput is the same or better, and CPU usage drops dramatically.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@tcaputi (Contributor, Author) commented Mar 23, 2018

@wli5 Would you mind taking a look at this when you get a chance?

@wli5 (Contributor) commented Mar 26, 2018

I think this is not efficient, since the memory copy is always needed.

A better way might be to allocate a temporary "addition" buffer (add_start) and pass it to qat_compress as a separate parameter, e.g.:
qat_compress(QAT_COMPRESS, s_start, s_len, d_start, d_len, add_start, add_len, &dstlen)
Inside qat_compress, we construct a destination sg-list buffer that contains both "d_start" and "add_start". In most cases the addition buffer will not be used; if it is used, we simply skip compression and return the source buffer. There is then no memory copy for either compressible or incompressible data.

Please note, the dest buffer allocated by ZFS is 87.5% of source buffer size:

/* Compress at least 12.5% */
d_len = s_len - (s_len >> 3);

So, if the data is compressed to less than 87.5% of the source size, we return the compressed buffer; otherwise we just free the temporary addition buffer and return the source buffer (no compression).
In theory the worst-case expansion is 1.125x the source size, so if we allocate the addition buffer with the same size as the dest buffer, the total dest size will be 87.5% x 2 = 1.75x the source buffer, which is big enough to cover the worst expansion case.
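The sizing argument above can be checked with a line of arithmetic. A minimal sketch, using the quoted 1.125x worst-case expansion figure (the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Check the sizing argument: dest (87.5% of source) plus an addition
 * buffer of the same size gives 1.75x the source, which comfortably
 * covers the quoted 1.125x worst-case expansion.
 */
static int
addition_buffer_covers_worst_case(size_t s_len)
{
	size_t d_len = s_len - (s_len >> 3);	/* 87.5%: must save 12.5% */
	size_t add_len = d_len;			/* scratch, same size */
	size_t worst = s_len + (s_len >> 3);	/* 1.125x worst case */

	return (d_len + add_len >= worst);
}
```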

@tcaputi (Contributor, Author) commented Mar 26, 2018

@wli5 The PR has been updated. Is this close to what you had in mind? Tested with compressible and incompressible data and there was a slight improvement to average throughput for compressible data when compared to the previous iteration (210MB/s -> 214MB/s).

if (dstlen < d_len) {
return ((size_t)dstlen);
} else {
if (d_len == s_len)
Contributor:

I think this check can be removed.
If QAT returns success but the output length (dstLen) is larger than or equal to d_len (87.5% of the source length), we just copy the source buffer to the dest buffer and return the source length, so ZFS knows the data is not compressed.

Contributor Author:

I did this to match the check below in the software implementation. When I didn't have it, I ended up getting GPFs.

Contributor:

OK, it's a different option: we can do the overflow check inside the qat_compress function, which is what you've done, and that's fine.
Another option I suggested is to not check inside the function, but to check the output length on the calling side to decide whether to skip compression (e.g., copy the source buffer to dest).
Both can work. Please keep your current implementation if you like, thanks!

Contributor:

And can you please add a comment in the function saying that on overflow the dest length will be set to the source length?

Contributor Author:

Sure. I'll add the comment.

@@ -379,6 +413,7 @@ qat_compress(qat_compress_dir_t dir, char *src, int src_len,

compressed_sz = dc_results.produced;
if (compressed_sz + hdr_sz + ZLIB_FOOT_SZ > dst_len) {
Contributor:

This check tries to make sure there is enough space in the dest buffer for the compressed data plus the gzip head and foot, so "add_len" should be added. Actually, I'm now thinking we can remove this check safely: with the addition buffer, the dest size is large enough to contain all output in theory, since it is 1.75x the source buffer now!

Contributor Author:

I'd rather keep this in, in case this code gets used in other contexts.

Contributor:

OK!

Contributor:

Please add add_len

Contributor Author:

I didn't want to do that because my comment says add_buf is a scratch buffer that can be discarded after the function is done, and if we leak into add_buf we know we're over the useful limit anyway.

QAT_PHYS_CONTIG_FREE(buffer_meta_src);
QAT_PHYS_CONTIG_FREE(buffer_meta_dst);
QAT_PHYS_CONTIG_FREE(buf_list_src);
QAT_PHYS_CONTIG_FREE(buf_list_dst);

return (ret);
return (status);
Contributor:

"status" has an initial value of success (CpaStatus status = CPA_STATUS_SUCCESS;), so some failure branches will return success.

Contributor Author:

The idea here was that CPA_STATUS_FAIL would indicate a real hardware failure, whereas src_len == dst_len would indicate that compression simply wasn't worth it.

Contributor Author:

Previously the function could return -1, but the calling code would check whether the result was equal to CPA_STATUS_SUCCESS.

Contributor:

OK, I see. But for some failure cases, it will return SUCCESS?

Contributor Author:

I think it makes sense. Perhaps @behlendorf has a better idea of how we should convey this meaning?

void *add_buf = zio_data_buf_alloc(add_buf_len);

ret = qat_compress(QAT_COMPRESS, s_start, s_len, d_start,
d_len, add_buf, add_buf_len, &dstlen);
Contributor:

Since the caller never needs to access this additional space, it's awkward to force it to do the allocation. It really shouldn't need to worry about this when it can all be handled internally in the qat code. How about renaming qat_compress() to qat_compress_impl(), and then having qat_compress() be a wrapper function which handles allocating the needed memory for QAT_COMPRESS before calling qat_compress_impl()?

This would then allow you to move the more involved status checking about success/failure/compressed size into the wrapper function. Callers of qat_compress() wouldn't need to deal with it at all and could use the existing interface.
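The suggested wrapper pattern might look roughly like this. This is a sketch with simplified types and a stub worker, not the actual patch; only the function names `qat_compress`/`qat_compress_impl` and `QAT_COMPRESS` come from the discussion above.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { QAT_COMPRESS, QAT_DECOMPRESS } qat_compress_dir_t;

/*
 * Stub for the real worker; it just reports "stored at source length"
 * (illustrative, not the real QAT code).
 */
static int
qat_compress_impl(qat_compress_dir_t dir, char *src, int src_len,
    char *dst, int dst_len, char *add, int add_len, size_t *c_len)
{
	(void) dir; (void) src; (void) dst; (void) dst_len;
	(void) add; (void) add_len;
	*c_len = (size_t)src_len;
	return (0);
}

/*
 * Wrapper keeping the existing interface: callers never see the
 * scratch ("addition") buffer, which is allocated and freed here.
 */
static int
qat_compress(qat_compress_dir_t dir, char *src, int src_len,
    char *dst, int dst_len, size_t *c_len)
{
	char *add = NULL;
	int add_len = 0, ret;

	if (dir == QAT_COMPRESS) {
		add_len = dst_len;	/* dest + addition = 1.75x source */
		if ((add = malloc(add_len)) == NULL)
			return (-1);
	}
	ret = qat_compress_impl(dir, src, src_len, dst, dst_len,
	    add, add_len, c_len);
	free(add);			/* scratch never escapes */
	return (ret);
}
```

The design win is that the scratch buffer's lifetime is entirely contained in the wrapper, so callers cannot leak it or misuse it.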

@tcaputi tcaputi changed the title Prevent QAT warnings with incompressible data Resolve QAT issues with incompressible data Mar 28, 2018
@tcaputi (Contributor, Author) commented Mar 28, 2018

@wli5 and @behlendorf Changes have been implemented as requested.

@behlendorf (Contributor) left a comment:

The comment here is a little bit misleading. Actually, a d_len 12.5% smaller than the s_len is passed to qat_compress().

Thanks for refactoring this to make the code clearer. I can't test this locally but it looks right to me from inspection.

return ((size_t)dstlen);
/* if hardware compress fail, do it again with software */
} else if (ret == CPA_STATUS_INCOMPRESSIBLE) {
if (d_len != s_len)
Contributor:

This case is always true today because of how zio_compress_data calls the compression function. But I agree we should keep it, so that if we make the shift configurable as a module option it will still work. The same dead conditional exists in the software compression case, so at least it's consistent.

return (status);
}

/* Entry point for QAT accelerated compression / decompression. */
Contributor:

nit: function header block comments should all be multi-line, even if they fit on one line:

/*
 *  Entry point for QAT accelerated compression / decompression
 */

@wli5 (Contributor) left a comment:

Looks good to me, thanks for making the changes!

@codecov bot commented Mar 29, 2018

Codecov Report

Merging #7338 into master will decrease coverage by 0.04%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #7338      +/-   ##
==========================================
- Coverage   76.35%   76.31%   -0.05%     
==========================================
  Files         329      329              
  Lines      104191   104189       -2     
==========================================
- Hits        79560    79510      -50     
- Misses      24631    24679      +48
Flag Coverage Δ
#kernel 76.07% <ø> (-0.19%) ⬇️
#user 65.5% <ø> (-0.22%) ⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13a2ff2...7cffa6b. Read the comment docs.

Currently, when ZFS wants to accelerate compression with QAT, it
passes a destination buffer of the same size as the source buffer.
Unfortunately, if the data is incompressible, QAT can actually
"compress" the data to be larger than the source buffer. When this
happens, the QAT driver will return a FAILED error code and print
warnings to dmesg. This patch fixes these issues by providing the
QAT driver with an additional buffer to work with so that even
completely incompressible source data will not cause an overflow.

This patch also resolves an error handling issue where
incompressible data attempts compression twice: once by QAT and
once in software. To fix this issue, a new (and fake) error code
CPA_STATUS_INCOMPRESSIBLE has been added so that the calling code
can correctly account for the difference between a hardware
failure and data that simply cannot be compressed.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
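The dispatch the commit message describes can be sketched like this. Only `CPA_STATUS_INCOMPRESSIBLE` mirrors the sentinel this patch adds; the other names and values here are illustrative stand-ins, not the Intel QAT API:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative status codes; only the INCOMPRESSIBLE sentinel mirrors
 * the one this patch introduces, and the values are made up.
 */
enum {
	CPA_STATUS_SUCCESS = 0,
	CPA_STATUS_FAIL = -1,
	CPA_STATUS_INCOMPRESSIBLE = -127
};

/*
 * Hypothetical caller-side logic: fall back to software compression
 * only on a real hardware failure, never when the data simply cannot
 * be compressed (which previously caused it to be compressed twice).
 */
static size_t
handle_qat_result(int ret, size_t dstlen, size_t s_len, int *try_software)
{
	if (ret == CPA_STATUS_SUCCESS) {
		*try_software = 0;
		return (dstlen);	/* compressed size */
	}
	if (ret == CPA_STATUS_INCOMPRESSIBLE) {
		*try_software = 0;	/* don't compress twice */
		return (s_len);		/* store the block uncompressed */
	}
	*try_software = 1;		/* hardware failure: software retry */
	return (0);
}
```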