
Resolve QAT issues with incompressible data #7338

Merged (1 commit, Mar 30, 2018)

Conversation

@tcaputi (Contributor) commented Mar 23, 2018

Currently, when ZFS wants to accelerate compression with QAT, it
passes a destination buffer of the same size as the source buffer.
Unfortunately, if the data is incompressible, QAT can actually
"compress" the data to be larger than the source buffer. When this
happens, the QAT driver will return a FAILED error code and print
warnings to dmesg. This patch fixes this issue by allocating a
larger temporary buffer for QAT to use before copying the data to
the final destination if it was actually compressed.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
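The fix described above can be sketched very roughly as follows. This is an illustrative stand-in, not the actual ZFS/QAT code: `fake_hw_compress` simulates the hardware expanding incompressible data, and `qat_compress_sketch` shows the temporary-buffer-then-copy idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative stand-in for the hardware call: for incompressible
 * input, QAT can produce output slightly LARGER than the source.
 */
static size_t
fake_hw_compress(const char *src, size_t s_len, char *dst, size_t dst_len)
{
	size_t out = s_len + 8;		/* pretend the data expanded */

	assert(out <= dst_len);		/* the oversized buffer absorbs it */
	memcpy(dst, src, s_len);
	memset(dst + s_len, 0, out - s_len);
	return (out);
}

/*
 * Sketch of the fix: compress into an oversized scratch buffer, and
 * copy into the real destination only if the data actually shrank
 * enough to fit.
 */
static size_t
qat_compress_sketch(const char *src, size_t s_len, char *dst, size_t d_len)
{
	size_t tmp_len = s_len + (s_len >> 3) + 64;	/* expansion headroom */
	char *tmp = malloc(tmp_len);
	size_t out;

	if (tmp == NULL)
		return (s_len);
	out = fake_hw_compress(src, s_len, tmp, tmp_len);
	if (out < d_len) {
		memcpy(dst, tmp, out);	/* real win: keep compressed data */
		free(tmp);
		return (out);
	}
	free(tmp);
	return (s_len);			/* "not compressible": store raw */
}
```

Returning the source length is the conventional way a ZFS compressor signals "do not store compressed"; the driver never sees the overflow, so no FAILED status or dmesg warning is generated.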

How Has This Been Tested?

There were concerns about how the extra buffer copy would affect performance. I ran two tests: one with incompressible data (from /dev/urandom) and one with compressible data (from the yes executable). Each test wrote to a test dataset with compression=gzip and all other settings at their defaults:

Incompressible data

root@qat-test:~# cat /sys/module/zfs/parameters/zfs_qat_compress_disable
0 
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 199MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 194MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:04 [ 204MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:04 [ 204MiB/s] [      <=>                                                      ]
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# echo 1 > /sys/module/zfs/parameters/zfs_qat_compress_disable                
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 160MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:05 [ 167MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 161MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 158MiB/s] [        <=>                                                    ]
root@qat-test:~# cat /root/urand.txt | pv > /pool/crypt/urand.txt 
1000MiB 0:00:06 [ 160MiB/s] [        <=>                                                    ]
root@qat-test:~# 

During this test, CPU usage for EACH of the z_wr_iss threads was routinely at about 70% without QAT and <1% with QAT.

Compressible data

root@qat-test:~# cat /sys/module/zfs/parameters/zfs_qat_compress_disable                     
0
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt                                
1000MiB 0:00:05 [ 199MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 206MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 201MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 207MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:04 [ 200MiB/s] [      <=>                                                      ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 196MiB/s] [       <=>                                                     ]
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# 
root@qat-test:~# echo 1 > /sys/module/zfs/parameters/zfs_qat_compress_disable                
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt                                
1000MiB 0:00:05 [ 184MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 192MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 189MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 190MiB/s] [       <=>                                                     ]
root@qat-test:~# cat /root/yes.txt | pv > /pool/crypt/yes.txt 
1000MiB 0:00:05 [ 188MiB/s] [       <=>                                                     ]

During this test, CPU usage for each of the z_wr_iss threads was routinely at about 20% without QAT and <1% with QAT.

TL;DR: It looks like even with this change, the benefits of QAT acceleration hold up: throughput is the same or better, and CPU usage drops dramatically.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@tcaputi (Contributor, Author) commented Mar 23, 2018

@wli5 Would you mind taking a look at this when you get a chance?

@wli5 (Contributor) commented Mar 26, 2018

I think this is not efficient, since the memory copy is always needed.

A better way might be to allocate a temporary "addition" buffer (add_start) and pass it to qat_compress as a separate parameter, e.g.:
qat_compress(QAT_COMPRESS, s_start, s_len, d_start, d_len, add_start, add_len, &dstlen)
Inside qat_compress, we construct a destination sg-list buffer that contains both "d_start" and "add_start". In most cases the addition buffer will not be used; if it is used, we simply skip compression and return the source buffer. There is then no memory copy for either compressible or incompressible data.

Please note, the dest buffer allocated by ZFS is 87.5% of source buffer size:

/* Compress at least 12.5% */
d_len = s_len - (s_len >> 3);

So, if the data is compressed to less than 87.5% of the source size, we return the compressed buffer; otherwise we just free the temporary addition buffer and return the source buffer (no compression).
In theory the worst-case expansion is 1.125x the source size, so if we allocate the addition buffer with the same size as the dest buffer, the total dest size will be 87.5% x 2 = 1.75x the source buffer, which is big enough to cover the worst expansion case.
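The sizing argument above can be checked with a line of arithmetic. A minimal sketch, using the quoted 1.125x worst-case expansion figure (the function name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Check the sizing argument: dest (87.5% of source) plus an addition
 * buffer of the same size gives 1.75x the source, which comfortably
 * covers the quoted 1.125x worst-case expansion.
 */
static int
addition_buffer_covers_worst_case(size_t s_len)
{
	size_t d_len = s_len - (s_len >> 3);	/* 87.5%: must save 12.5% */
	size_t add_len = d_len;			/* scratch, same size */
	size_t worst = s_len + (s_len >> 3);	/* 1.125x worst case */

	return (d_len + add_len >= worst);
}
```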

@tcaputi (Contributor, Author) commented Mar 26, 2018

@wli5 The PR has been updated. Is this close to what you had in mind? Tested with compressible and incompressible data and there was a slight improvement to average throughput for compressible data when compared to the previous iteration (210MB/s -> 214MB/s).

if (dstlen < d_len) {
return ((size_t)dstlen);
} else {
if (d_len == s_len)
Contributor:

I think this check can be removed.
If QAT returns success but the output length (dstLen) is larger than or equal to d_len (87.5% of the source length), we just copy the source buffer to the dest buffer and return the source length, so ZFS knows the data is not compressed.

Contributor Author:

I did this to match the check below in the software implementation. When I didn't have it, I ended up getting GPFs.

Contributor:

OK, it's a different option: we can do the overflow check inside the qat_compress function, which is what you've done, and that's fine.
Another option I suggested is to not check inside the function, but to check the output length on the calling side to decide whether to skip compression (e.g., copy the source buffer to dest).
Both can work. Please keep your current implementation if you like, thanks!

Contributor:

And can you please add a comment in the function saying that on overflow the dest length will be set to the source length?

Contributor Author:

Sure. I'll add the comment.

@@ -379,6 +413,7 @@ qat_compress(qat_compress_dir_t dir, char *src, int src_len,

compressed_sz = dc_results.produced;
if (compressed_sz + hdr_sz + ZLIB_FOOT_SZ > dst_len) {
Contributor:

This check tries to make sure there is enough space in the dest buffer for the compressed data plus the gzip head and foot, so "add_len" should be added. Actually, I'm now thinking we can remove this check safely: with the addition buffer, the dest size is large enough to contain all output in theory, since it is 1.75x the source buffer now!

Contributor Author:

I'd rather keep this in, in case this code gets used in other contexts.

Contributor:

OK!

Contributor:

Please add add_len

Contributor Author:

I didn't want to do that because my comment says add_buf is a scratch buffer that can be discarded after the function is done, and if we leak into add_buf we know we're over the useful limit anyway.

QAT_PHYS_CONTIG_FREE(buffer_meta_src);
QAT_PHYS_CONTIG_FREE(buffer_meta_dst);
QAT_PHYS_CONTIG_FREE(buf_list_src);
QAT_PHYS_CONTIG_FREE(buf_list_dst);

return (ret);
return (status);
Contributor:

"status" has an initial value of success (CpaStatus status = CPA_STATUS_SUCCESS;), so some failure branches will return success.

Contributor Author:

The idea here was that CPA_STATUS_FAIL would indicate a real hardware failure, whereas src_len == dst_len would indicate that compression simply wasn't worth it.

Contributor Author:

Previously the function could return -1, but the calling code would check whether the result was equal to CPA_STATUS_SUCCESS.

Contributor:

OK, I see. But for some failure cases, it will return SUCCESS?

Contributor Author:

I think it makes sense. Perhaps @behlendorf has a better idea of how we should convey this meaning?

void *add_buf = zio_data_buf_alloc(add_buf_len);

ret = qat_compress(QAT_COMPRESS, s_start, s_len, d_start,
d_len, add_buf, add_buf_len, &dstlen);
Contributor:

Since the caller never needs to access this additional space, it's awkward to force it to do the allocation. It really shouldn't need to worry about this when it can all be handled internally in the qat code. How about renaming qat_compress() to qat_compress_impl(), and then having qat_compress() be a wrapper function which handles allocating the needed memory for QAT_COMPRESS before calling qat_compress_impl()?

This would then allow you to move the more involved status checking about success/failure/compressed size into the wrapper function. Callers of qat_compress() wouldn't need to deal with it at all and could use the existing interface.
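The suggested wrapper pattern might look roughly like this. This is a sketch with simplified types and a stub worker, not the actual patch; only the function names `qat_compress`/`qat_compress_impl` and `QAT_COMPRESS` come from the discussion above.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { QAT_COMPRESS, QAT_DECOMPRESS } qat_compress_dir_t;

/*
 * Stub for the real worker; it just reports "stored at source length"
 * (illustrative, not the real QAT code).
 */
static int
qat_compress_impl(qat_compress_dir_t dir, char *src, int src_len,
    char *dst, int dst_len, char *add, int add_len, size_t *c_len)
{
	(void) dir; (void) src; (void) dst; (void) dst_len;
	(void) add; (void) add_len;
	*c_len = (size_t)src_len;
	return (0);
}

/*
 * Wrapper keeping the existing interface: callers never see the
 * scratch ("addition") buffer, which is allocated and freed here.
 */
static int
qat_compress(qat_compress_dir_t dir, char *src, int src_len,
    char *dst, int dst_len, size_t *c_len)
{
	char *add = NULL;
	int add_len = 0, ret;

	if (dir == QAT_COMPRESS) {
		add_len = dst_len;	/* dest + addition = 1.75x source */
		if ((add = malloc(add_len)) == NULL)
			return (-1);
	}
	ret = qat_compress_impl(dir, src, src_len, dst, dst_len,
	    add, add_len, c_len);
	free(add);			/* scratch never escapes */
	return (ret);
}
```

The design win is that the scratch buffer's lifetime is entirely contained in the wrapper, so callers cannot leak it or misuse it.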

@tcaputi tcaputi changed the title Prevent QAT warnings with incompressible data Resolve QAT issues with incompressible data Mar 28, 2018
@tcaputi (Contributor, Author) commented Mar 28, 2018

@wli5 and @behlendorf Changes have been implemented as requested.

@behlendorf (Contributor) left a comment:

The comment here is a little bit misleading. Actually, a d_len 12.5% smaller than the s_len is passed to qat_compress().

Thanks for refactoring this to make the code clearer. I can't test this locally but it looks right to me from inspection.

return ((size_t)dstlen);
/* if hardware compress fail, do it again with software */
} else if (ret == CPA_STATUS_INCOMPRESSIBLE) {
if (d_len != s_len)
Contributor:

This case is always true today because of how zio_compress_data calls the compression function. But I agree we should keep it, so that if we make the shift configurable as a module option it will still work. The same dead conditional exists in the software compression case, so at least it's consistent.

return (status);
}

/* Entry point for QAT accelerated compression / decompression. */
Contributor:

nit: function header block comments should all be multi-line, even if they fit on one line:

/*
 *  Entry point for QAT accelerated compression / decompression
 */

@wli5 (Contributor) left a comment:

Looks good to me, thanks for making the changes!

@codecov bot commented Mar 29, 2018

Codecov Report

Merging #7338 into master will decrease coverage by 0.04%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #7338      +/-   ##
==========================================
- Coverage   76.35%   76.31%   -0.05%     
==========================================
  Files         329      329              
  Lines      104191   104189       -2     
==========================================
- Hits        79560    79510      -50     
- Misses      24631    24679      +48
Flag Coverage Δ
#kernel 76.07% <ø> (-0.19%) ⬇️
#user 65.5% <ø> (-0.22%) ⬇️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13a2ff2...7cffa6b. Read the comment docs.

Currently, when ZFS wants to accelerate compression with QAT, it
passes a destination buffer of the same size as the source buffer.
Unfortunately, if the data is incompressible, QAT can actually
"compress" the data to be larger than the source buffer. When this
happens, the QAT driver will return a FAILED error code and print
warnings to dmesg. This patch fixes these issues by providing the
QAT driver with an additional buffer to work with so that even
completely incompressible source data will not cause an overflow.

This patch also resolves an error handling issue where
incompressible data attempts compression twice: once by QAT and
once in software. To fix this issue, a new (and fake) error code
CPA_STATUS_INCOMPRESSIBLE has been added so that the calling code
can correctly account for the difference between a hardware
failure and data that simply cannot be compressed.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
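The dispatch the commit message describes can be sketched like this. Only `CPA_STATUS_INCOMPRESSIBLE` mirrors the sentinel this patch adds; the other names and values here are illustrative stand-ins, not the Intel QAT API:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative status codes; only the INCOMPRESSIBLE sentinel mirrors
 * the one this patch introduces, and the values are made up.
 */
enum {
	CPA_STATUS_SUCCESS = 0,
	CPA_STATUS_FAIL = -1,
	CPA_STATUS_INCOMPRESSIBLE = -127
};

/*
 * Hypothetical caller-side logic: fall back to software compression
 * only on a real hardware failure, never when the data simply cannot
 * be compressed (which previously caused it to be compressed twice).
 */
static size_t
handle_qat_result(int ret, size_t dstlen, size_t s_len, int *try_software)
{
	if (ret == CPA_STATUS_SUCCESS) {
		*try_software = 0;
		return (dstlen);	/* compressed size */
	}
	if (ret == CPA_STATUS_INCOMPRESSIBLE) {
		*try_software = 0;	/* don't compress twice */
		return (s_len);		/* store the block uncompressed */
	}
	*try_software = 1;		/* hardware failure: software retry */
	return (0);
}
```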