Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange test failures #92

Closed
WOnder93 opened this issue May 2, 2020 · 9 comments
Closed

Strange test failures #92

WOnder93 opened this issue May 2, 2020 · 9 comments

Comments

@WOnder93
Copy link
Contributor

WOnder93 commented May 2, 2020

When running the libkcapi testsuite, I sometimes (very rarely) notice failures in the "Symmetric synchronous multithreaded non-aligned test" instances with long (8k) messages. I discovered that this becomes much more often if I replace the sleep(1); in tests/kcapi-main.c with a shorter delay (say, usleep(100);).

To be honest, I don't see any logical reason for the sleep to be there. It seems it is only swiping a real bug under the carpet...

Here is an excerpt from strace over one such failed run:
Command:

strace -f ./../bin/kcapi  -x 1 -s -j  -e -c 'xts(aes)' -i 7fbc02ebf5b93322329df9bfccb635af -k 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -p $(for (( i = 0; i < 8*1024; i++)); do echo -n ee; done)

Output:

[...]
##################################################
# Setting up the cipher handles...
##################################################
socket(AF_NETLINK, SOCK_RAW, NETLINK_CRYPTO) = 4
bind(4, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 0
getsockname(4, {sa_family=AF_NETLINK, nl_pid=3374825, nl_groups=00000000}, [12]) = 0
sendmsg(4, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=224, type=CRYPTO_MSG_GETALG, flags=NLM_F_REQUEST, seq=1588419112, pid=0}, {cru_name="xts(aes)", cru_driver_name="", cru_module_name="", cru_type=0, cru_mask=0, cru_refcnt=0, cru_flags=0}}, iov_len=224}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 224
recvmsg(4, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=380, type=CRYPTO_MSG_GETALG, flags=0, seq=1588419112, pid=3374825}, {cru_name="xts(aes)", cru_driver_name="xts-aes-aesni", cru_module_name="kernel", cru_type=0, cru_mask=0, cru_refcnt=4, cru_flags=0x485}, [{{nla_len=8, nla_type=CRYPTOCFGA_PRIORITY_VAL}, 401}, {{nla_len=148, nla_type=CRYPTOCFGA_REPORT_BLKCIPHER}, {type="skcipher", geniv="<none>", blocksize=16, min_keysize=32, max_keysize=64, ivsize=16}}]}, iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 380
close(4)                                = 0
pipe([4, 5])                            = 0
setsockopt(3, SOL_ALG, ALG_SET_KEY, "\215}\331\260\27\f\340\265\362\370\341\252v\216\1\351\35\250\277\306\177\324\206\320\201\262\202T\311\236\264#", 32) = 0
pipe([6, 7])                            = 0
accept(3, NULL, NULL)                   = 8
sendmsg(8, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}, {cmsg_len=40, cmsg_level=SOL_ALG, cmsg_type=0x2}], msg_controllen=64, msg_flags=0}, MSG_MORE) = 0
accept(3, NULL, NULL)                   = 9
sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}, {cmsg_len=40, cmsg_level=SOL_ALG, cmsg_type=0x2}], msg_controllen=64, msg_flags=0}, MSG_MORE) = 0
##################################################
# First loop iteration - everything works OK
##################################################
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 3374839 attached
, child_tidptr=0x7f6f1c318a10) = 3374839
[pid 3374825] recvmsg(8,  <unfinished ...>
[pid 3374839] vmsplice(7, [{iov_base="\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356"..., iov_len=8192}], 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 8192
[pid 3374839] splice(6, NULL, 8, NULL, 8192, SPLICE_F_MORE) = 8192
[pid 3374825] <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="N\243(\315[\20\320\313;\274z\271-T\7-\343\213T\302p|$\240\243\353~_0~9\225"..., iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 8192
[pid 3374839] exit_group(0)             = ?
[pid 3374825] fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[pid 3374839] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3374839, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
write(1, "4ea328cd5b10d0cb3bbc7ab92d54072d"..., 16384) = 16384
##################################################
# Second loop iteration - something goes wrong!
##################################################
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 3374840 attached
, child_tidptr=0x7f6f1c318a10) = 3374840
[pid 3374825] recvmsg(9,  <unfinished ...>
[pid 3374840] vmsplice(5, [{iov_base="\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356\356"..., iov_len=8192}], 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 8192
[pid 3374840] splice(4, NULL, 9, NULL, 8192, SPLICE_F_MORE <unfinished ...>
[pid 3374825] <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="N\243(\315[\20\320\313;\274z\271-T\7-\343\213T\302p|$\240\243\353~_0~9\225"..., iov_len=8192}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 3344
[pid 3374840] <... splice resumed>)     = 8192
[pid 3374825] recvmsg(9,  <unfinished ...>
[pid 3374840] exit_group(0 <unfinished ...>
[pid 3374825] <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\321{~\336\177\312\223A\16k\311\356\226\313`$\277 .\312\371\211\35<\324\376V7\345\217\253o"..., iov_len=4848}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 4848
[pid 3374840] <... exit_group resumed>) = ?
[pid 3374825] write(1, "\nTwo concurrent opfd operations "..., 4096) = 4096
[pid 3374840] +++ exited with 0 +++

For some reason, the recvmsg() call is cut off early and returns only 3344 bytes (I believe it is always this number). The second recvmsg() then returns the remaining 4848 bytes, however the resulting message doesn't match the expected output. Any idea what's going on? This looks like a bug somewhere in the AF_ALG interface.

Kernel version: 5.5.15-200.fc31.x86_64

@smuellerDD
Copy link
Owner

I am unable to reproduce the issue with the current code.

Note, I have removed the sleep(1) that you rightfully pointed out. Could you test it again on your system whether you see that bug?

Thanks.

@WOnder93
Copy link
Contributor Author

WOnder93 commented May 6, 2020

Strange, I can't reproduce it either now, even when I revert to 2da4a5a and remove the sleep. I upgraded to kernel 5.6.8 in the meantime, so let me try again with the 5.5 kernel...

@WOnder93
Copy link
Contributor Author

WOnder93 commented May 6, 2020

Yes, it does reproduce on 5.5.10, so it must have been fixed by a kernel change somewhere between 5.5 and 5.6. I'll try to bisect it.

@WOnder93
Copy link
Contributor Author

WOnder93 commented May 6, 2020

OK, never mind, it turns out I just didn't wait long enough... Now I'm getting the bug also on 5.6.8-200.fc31.x86_64 and 5.7.0-0.rc4.1.fc33.x86_64 kernels (with current master and removed sleep). It usually takes a couple hundred to thousand invocations to trigger.

BTW, you reverted the sleep removal, but the commit lacks any explanation...

@smuellerDD
Copy link
Owner

smuellerDD commented May 6, 2020 via email

@WOnder93
Copy link
Contributor Author

WOnder93 commented May 6, 2020

Wait... isn't the problem that we are testing the stream mode with xts(aes), which is not a stream cipher? I mean "stream" in the sense that it updates the IV so that the encryption can continue with in a seprate operation with the returned IV. IIUC, in stream mode the socket input can be internally split into parts, which are processed in separate skcipher operations, which would produce a different output for XTS, since it would process each part with the same IV... Am I right?

@smuellerDD
Copy link
Owner

smuellerDD commented May 6, 2020 via email

@WOnder93
Copy link
Contributor Author

WOnder93 commented May 6, 2020

The key/IV is set with the init part. Thus, the multiple updates operate on
the existing cipher handle whose IV is internally updated as defined by the
chaining mode.

Right, I probably mixed up the terminology, but my point was that xts(aes) doesn't support chaining requests like that (at least as implemented in the kernel Crypto API). You can see it in crypto/testmgr.h - the ciphers that are expected to return IVs usable for chaining have the .iv_out entries filled in in their test vectors (and those that don't support it do not). You can see that only CTR and CBC test vectors have them, not XTS.

Looking again at how XTS is defined, it seems one could implement transparent chaining by simply decrypting the final tweak using the tweak key and return it as the output IV... but I believe this has never been mandated nor implemented in the Crypto API (likely because of the overhead of the final tweak decryption, which would be pointless if you're not going to use the output IV - and there is currently no way to signal to the driver that you are going to need it).

I could remember this wrong, so you should probably double-check with Herbert, but AFAIK this is how it is (and always has been).

WOnder93 added a commit to WOnder93/tests-beaker that referenced this issue May 12, 2020
There is a race condition when these are run with the long XTS test
vectors, so skip them for now.

See: smuellerDD/libkcapi#92
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
rasibley pushed a commit to cki-project/tests-beaker that referenced this issue May 12, 2020
There is a race condition when these are run with the long XTS test
vectors, so skip them for now.

See: smuellerDD/libkcapi#92
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
sahanaprasad07 added a commit to sahanaprasad07/libkcapi that referenced this issue May 15, 2020
                   (Note : removing this causes issue : smuellerDD/issues/92)

		   Revert commit d542b05

Signed-off-by: Sahana Prasad <sahana@redhat.com>
sahanaprasad07 added a commit to sahanaprasad07/libkcapi that referenced this issue May 15, 2020
…a race condition

              when run with long XTS test vectors.

              For more information, see:
              smuellerDD#92

Signed-off-by: Sahana Prasad <sahana@redhat.com>
sahanaprasad07 added a commit to sahanaprasad07/libkcapi that referenced this issue May 15, 2020
test/kcapi-main.c: Removes unneeded sleep.
		   Reverts commit d542b05

test/test.sh: Temporarily disables the multithreaded tests that cause
              a race condition when run with long XTS test vectors.

              For more information, see:
              smuellerDD#92

Signed-off-by: Sahana Prasad <sahana@redhat.com>
sahanaprasad07 added a commit to sahanaprasad07/libkcapi that referenced this issue May 15, 2020
test/kcapi-main.c: Removes unneeded sleep.
		   Reverts commit d542b05

test/test.sh: Temporarily disables the multithreaded tests that cause
              a race condition when run with long XTS test vectors.

              For more information, see:
              smuellerDD#92

Signed-off-by: Sahana Prasad <sahana@redhat.com>
@smuellerDD
Copy link
Owner

You are quite right. I slightly reformatted the patches. Thanks for your help.

smuellerDD added a commit that referenced this issue May 18, 2020
The full discussion is provided with
#92.

Reported-by: Ondrej Mosnáček
Signed-off-by: Sahana Prasad <sahana@redhat.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
bmwiedemann added a commit to bmwiedemann/openSUSE that referenced this issue Sep 6, 2020
https://build.opensuse.org/request/show/831601
by user msmeissn + dimstar_suse
- update to 1.2.0:
 * enhancement: kcapi-hasher: add madvise and 64 bit support by Brandur Simonsen
 * fix: fix clang warnding in KDF implementation by Khem Raj
 * fix: fix inverted logic in kcapi-main test logic reported by Ondrej Mosnáček
 * fix: return error when iteration count is zero for PBKDF as reported by
   Guido Vranken
 * enhancement: add function kcapi_cipher_stream_update_last to indicate the
   last block of a symmetric cipher stream operation
 * disable XTS multithreaded tests as it triggers a race discussed in
   smuellerDD/libkcapi#92. The conclusion is
   the following: xts(aes) doesn't support chaining requests like for other
   ciphers such as CBC (at least as implemented in the kernel Crypto API).
   That can be seen in `crypto/testmgr.h` - th
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants