Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs-8.1 corrupt kernel stack kernel crash on s390x #8992

Closed
ColinIanKing opened this issue Jul 4, 2019 · 17 comments
Closed

zfs-8.1 corrupt kernel stack kernel crash on s390x #8992

ColinIanKing opened this issue Jul 4, 2019 · 17 comments
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ColinIanKing
Copy link
Contributor

Summary: On s390x with Linux 5.0 and 5.2 ZFS 7.12 works fine when creating a snapshot, however with zfs 8.1 the snapshot crashes the s390 kernel with: "Corrupt kernel stack, can't continue" error and a stack dump (that I can't capture).

Type Version/Name
Distribution Name Ubuntu
Distribution Version Disco/Eoan
Linux Kernel 5.0 and 5.2
Architecture S390x
ZFS Version 8.1
SPL Version 8.1

Running the scrub test script below with ZFS 8.1 on a 5.0 or 5.2 kernel on a S390x instance the kernel will dump stack with the kernel error message "Corrupt kernel stack, can't continue". This does not occur with ZFS 7.12 . This test works fine with the 5.0/5.2 kernels on arm64, arm64, ppc64 but not s390x.

Script:

#!/bin/sh
#
# Copyright (C) 2016 Canonical
#   
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#
echo -n "kernel smoke test, ZFS snapshot: "
TMP=/tmp
VDEV0=${TMP}/pool0-$$.img
VDEV1=${TMP}/pool1-$$.img
POOL=pool-smoke-$$
MARKER=/${POOL}/test/marker

dd if=/dev/zero of=${VDEV0} bs=1M count=512 > /dev/null 2>&1
dd if=/dev/zero of=${VDEV1} bs=1M count=512 > /dev/null 2>&1

zpool create ${POOL} ${VDEV0} ${VDEV1}
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zpool create failed, exit code=$ret"
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

zfs create ${POOL}/test
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zfs create failed, exit code=$ret"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

#
# Populate with files..
#
cp -rp /etc /${POOL}/test
echo $$ > ${MARKER}

#
# Make a snapshot
#
zfs snapshot -r ${POOL}/test@snap
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zfs snapshot failed, exit code=$ret"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

#
# Remove files
#
rm -rf /${POOL}/test/etc ${MARKER}

#
# Roll back
#
zfs rollback ${POOL}/test@snap
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zfs rollback failed, exit code=$ret"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

#
# Check rolled back marker contains sane data
#
if [ ! -e ${MARKER} ]; then
	echo "FAILED: zfs rollback failed, ${MARKER} not restored"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi
data=$(cat ${MARKER})
if [ $data -ne $$ ]; then
	echo "FAILED: zfs rollback failed, ${MARKER} contained unexpected data"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

zfs destroy ${POOL}/test@snap
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zfs destroy snapshot failed, exit code=$ret"
	zpool destroy ${POOL}
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

zpool destroy ${POOL}
ret=$?
if [ $ret -ne 0 ]; then
	echo "FAILED: zpool destroy failed, exit code=$ret"
	#
	# destroy failed, try to clean up, but this
	# wil probably fail
	#
	rm ${VDEV0} ${VDEV1}
	exit 1
fi

rm ${VDEV0} ${VDEV1}
echo "PASSED"
exit 0

It's getting late here in the UK, so I will pick this up tomorrow and try and bisect the issue down.

@ColinIanKing
Copy link
Contributor Author

Bisected and double checked the bisect point, found the commit causing the issue:

d99a015 is the first bad commit
commit d99a015
Author: Chris Williamson chris.williamson@delphix.com
Date: Thu Feb 8 09:16:23 2018 -0700

@ColinIanKing
Copy link
Contributor Author

It concerns me that one of the commit messages states: " Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow". I'm speculating that s390 is being bitten by stack overflow issues that could have been found with these tests if they had not been disabled.

@ColinIanKing
Copy link
Contributor Author

So the chain of events is:

  1. an ZFS_IOC_DESTROY_SNAPS ioctl()
  2. dsl_destroy_snapshots_nvl gets called along the way
    -> zcp_eval()
    -> lua_newstate() (stack now has just a few slots left)
    -> luaD_rawrunprotected()
    <- stack corrupted

8K stacks on s390 is not big enough for the new lua handler. sigh

@ColinIanKing
Copy link
Contributor Author

zcp_eval() has a nice chunky 128 byte errmsg buffer in it. That's not helpful for small stacks

@ColinIanKing
Copy link
Contributor Author

Seems to be the lua setjmp/longjmp buff size is causing the stack corruption, this helps. Not sure what the correct size should be though

diff --git a/module/lua/ldo.c b/module/lua/ldo.c
index aca02b234..a28bfd82e 100644
--- a/module/lua/ldo.c
+++ b/module/lua/ldo.c
@@ -61,7 +61,7 @@
 #elif defined(__mips__)
 #define JMP_BUF_CNT    12
 #elif defined(__s390x__)
-#define JMP_BUF_CNT    9
+#define JMP_BUF_CNT    64
 #else
 #define        JMP_BUF_CNT     1
 #endif

@ColinIanKing
Copy link
Contributor Author

So, in conclusion, the stack size is fine, but the jmp buf size was root cause.

@ColinIanKing
Copy link
Contributor Author

I wonder if the same issue exists for the Sparc arch

@behlendorf behlendorf added Type: Architecture Indicates an issue is specific to a single processor architecture Type: Defect Incorrect behavior (e.g. crash, hang) labels Jul 5, 2019
@behlendorf
Copy link
Contributor

Thanks for running these tests on s390x. When this feature was merged all of the new ZTS tests were run on the supported platforms... with the exception of s390 since we didn't have access to a test system.

Do I understand correctly that increasing the JMP_BUF_CNT allowed all of the tests to pass on an s390x system? Or did you run in other problems?

It concerns me that one of the commit messages states: " Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow".

It's not quite as bad as the commit message implies. In fact, the two tests which are skipped both explicitly test the maximum stack depth. Though I completely agree it needs to be addressed,

https://github.com/zfsonlinux/zfs/blob/master/tests/zfs-tests/cmd/nvlist_to_lua/nvlist_to_lua.c#L265
https://github.com/zfsonlinux/zfs/blob/master/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.nested_neg.ksh#L23

@ColinIanKing
Copy link
Contributor Author

I haven't yet tested this on the full set of ZFS regression tests, nor do I know if the change I made is optimal. I think we need a s390x kernel specialist to dig into this to determine the correct optimal size; however, the allocation I provided it has plenty of slop to ensure it worked fine against the smaller set of regression tests we use when packaging ZFS for Ubuntu.

@ColinIanKing
Copy link
Contributor Author

BTW, I'm on vacation for 2 weeks from today, so I won't be able to provide much more info during this period.

@behlendorf
Copy link
Contributor

@ColinIanKing sounds good, thanks for letting me know. @don-brady do you recall how this value was determined for s390x?

@don-brady
Copy link
Contributor

@behlendorf
I think the original sources came from here (or possibly a fork): setjmp.h

typedef unsigned long __jmp_buf[18];

I'm guessing I may have assumed the unsigned long was 4 bytes and since we use long long (at 8 bytes each) I must have done the halving (bad!!).

So perhaps the fix is:

 #elif defined(__s390x__)
-#define JMP_BUF_CNT    9
+#define JMP_BUF_CNT    18
 #else

@behlendorf
Copy link
Contributor

Based on the documentation I could find it looks like it's 4 bytes for s390and 8 bytes for s390x. So your fix looks right to me, but we'll need @ColinIanKing to test it for us.

@ColinIanKing
Copy link
Contributor Author

I'll be back at work this coming week so I'll give it a test.

@ColinIanKing
Copy link
Contributor Author

I can confirm that setting JMP_BUF_CNT to 18 works fine and passes all the ubuntu zfs regression tests, so I'm confident that this is a good fix.

#elif defined(s390x)
-#define JMP_BUF_CNT 9
+#define JMP_BUF_CNT 18
#else

@ColinIanKing
Copy link
Contributor Author

Please add my Reported-by and Tested-by: Colin Ian King <canonical.com> sign-offs to the fix :-)

behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 25, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Issue openzfs#8992
behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 25, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#8992
@behlendorf
Copy link
Contributor

@ColinIanKing thanks for verifying the fix, I've opened PR #9080.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 26, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#8992
allanjude pushed a commit to allanjude/zfs that referenced this issue Aug 12, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Aug 13, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Aug 21, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Aug 22, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Aug 23, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 17, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 18, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 23, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
tonyhutter pushed a commit that referenced this issue Sep 26, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8992
Closes #9080
allanjude pushed a commit to allanjude/zfs that referenced this issue Oct 13, 2019
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#8992
Closes openzfs#9080
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants