ZTS: Fix mmp_interval failure #8906
The mmp_interval test case was failing on Fedora 30 due to the built-in 'echo' command terminating the script when it was unable to write to the sysfs module parameter. This change in behavior was observed with ksh-2020.0.0-alpha1. Resolve the issue by using the external cat command which fails gracefully as expected. Additionally, remove some incorrect quotes around the $? return values. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Codecov Report
@@ Coverage Diff @@
## master #8906 +/- ##
==========================================
+ Coverage 78.51% 78.76% +0.24%
==========================================
Files 382 382
Lines 117840 117818 -22
==========================================
+ Hits 92526 92797 +271
+ Misses 25314 25021 -293
Continue to review full report at Codecov.
|
LGTM
@behlendorf I'm one of the primary maintainers of ksh and am worried this issue represents a regression to ksh behavior. Any chance I can impose on you to instrument libtest.shlib to report the value of We recently removed the |
@krader1961 more than happy to. What I'm seeing after instrumenting Here's the small change I made, and the exact file contents for comparison. Let me know if there's something else you'd like me to instrument.
Linux)
typeset zfs_tunables="/sys/module/$module/parameters"
[[ -w "$zfs_tunables/$tunable" ]] || return 1
+ echo "$PATH" >/var/tmp/PATH.log
+ echo -n "$value" >/var/tmp/echo.log
echo -n "$value" > "$zfs_tunables/$tunable"
return "$?"
;;
$ hexdump -C /var/tmp/PATH.log
00000000 2f 68 6f 6d 65 2f 62 65 68 6c 65 6e 64 6f 2f 73 |/home/behlendo/s|
00000010 72 63 2f 67 69 74 2f 7a 66 73 2f 62 69 6e 0a |rc/git/zfs/bin.|
0000001f
$ hexdump -C /var/tmp/echo.log
00000000 30 |0|
00000001 |
Thanks, @behlendorf. That info shows it's not a regression we introduced recently. I suspect it was introduced by the AT&T team after the ksh93u+ release and before the project was moved to GitHub. I'll try to confirm that. Replacing the |
I can't reproduce this using ksh93u+, ksh93v-, or the current ksh beta release on OpenSuse when doing @behlendorf Can you provide more information regarding the nature of the failure when using |
@krader1961 I suspect what might be relevant here is that this test case is unusual in that it is testing that a module option cannot be set to an invalid value. I'm also unable to reproduce the issue when setting it to an allowed value. Here's the error message I'm seeing logged:
I tried installing the Fedora I also verified that replacing ksh's built-in
|
I think we have two options:
|
+1 for printf |
Okay, I can reproduce the write error by doing While my attempt to reproduce the problem on x86 Linux results in the write error it does not cause the ksh process to exit. Which is going to make debugging this more difficult if I can't reproduce the failure. And I don't have access to a Linux on Zseries system. @behlendorf I hate to continue imposing on you but any chance you can enable core dumps ( P.S., The ksh backtrace facility I added simply uses the libc |
Also, I'm more than happy to debug this myself if I can get access to a Linux on Zseries instance. |
@krader1961 happy to help. Using your example I was able to easily cause the failure now. This was using a clean install of Fedora 30 (x86_64) running under libvirt. Nothing fancy, so you should be able to reproduce it the same way.
$ sudo /bin/bash
$ ulimit -c unlimited
$ ksh -c 'echo >/sys/module/sysrq/parameters/reset_seq'
ksh: echo: write to 1 failed [Invalid argument]
### 18766 Function backtrace:
1 (null) + 94355638756193
2 (null) + 139689715015424
3 (null) + 94355639741216
Aborted (core dumped)
$ coredumpctl debug
PID: 18766 (ksh)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Fri 2019-06-21 15:21:08 PDT (17s ago)
Command Line: ksh -c echo >/sys/module/sysrq/parameters/reset_seq
Executable: /usr/bin/ksh93
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (behlendo)
Boot ID: 91d71c6c20cd4f48a9ed5a161d706a9e
Machine ID: c3f950cadcc34cbf8192b6dc221e7d33
Hostname: localhost.localdomain
Storage: /var/lib/systemd/coredump/core.ksh.0.91d71c6c20cd4f48a9ed5a161d706a9e.18766.1561155668000000.lz4
Message: Process 18766 (ksh) of user 0 dumped core.
Stack trace of thread 18766:
#0 0x00007f0c0bd79e75 __GI_raise (libc.so.6)
#1 0x00007f0c0bd64895 __GI_abort (libc.so.6)
#2 0x000055d0e254a366 handle_sigsegv (ksh93)
#3 0x00007f0c0bd79f00 __restore_rt (libc.so.6)
#4 0x000055d0e263ab20 _Sfstdout (ksh93)
GNU gdb (GDB) Fedora 8.3-3.fc30
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/ksh93...
Reading symbols from /usr/lib/debug/usr/bin/ksh93-2020.0.0-0.2.fc30.x86_64.debug...
warning: core file may not match specified executable file.
[New LWP 18766]
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
Core was generated by `ksh -c echo >/sys/module/sysrq/parameters/reset_seq'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 return ret;
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f0c0bd64895 in __GI_abort () at abort.c:79
#2 0x000055d0e254a366 in handle_sigsegv (signo=<optimized out>, info=<optimized out>, context=<optimized out>) at ../src/cmd/ksh93/sh/fault.c:73
#3 <signal handler called>
#4 0x000055d0e263ab20 in _Sfstdout ()
#5 0x000055d0e263ab20 in ?? ()
#6 0x0000000000000020 in ?? ()
#7 0x000055d0e263ab20 in ?? ()
#8 0x0000000000008000 in ?? ()
#9 0x000055d0e25c8e36 in _sfexcept (f=0x7f0c0bd3f6c0, f@entry=0x55d0e263ab20 <_Sfstdout>, type=type@entry=2, io=<optimized out>, io@entry=-1,
disc=disc@entry=0x55d0e2d77b60) at ../src/lib/libast/sfio/sfexcept.c:53
#10 0x000055d0e2591c86 in sfwr (f=f@entry=0x55d0e263ab20 <_Sfstdout>, buf=buf@entry=0x55d0e2d79870, n=n@entry=1, disc=0x55d0e2d77b60)
at ../src/lib/libast/sfio/sfwr.c:214
#11 0x000055d0e25826c7 in _sfflsbuf (f=f@entry=0x55d0e263ab20 <_Sfstdout>, c=<optimized out>, c@entry=-1) at ../src/lib/libast/sfio/sfflsbuf.c:90
#12 0x000055d0e258b3d8 in sfsync (f=f@entry=0x55d0e263ab20 <_Sfstdout>) at ../src/lib/libast/sfio/sfsync.c:124
#13 0x000055d0e25711f6 in b_print (argc=argc@entry=0, argv=0x55d0e2d698f8, argv@entry=0x55d0e2d698f0, context=context@entry=0x7ffd12fb10e0)
at ../src/cmd/ksh93/bltins/print.c:403
#14 0x000055d0e257171d in B_echo (argc=<optimized out>, argv=0x55d0e2d698f0, context=<optimized out>) at ../src/cmd/ksh93/bltins/print.c:142
#15 0x000055d0e252c8ae in sh_exec () at ../src/cmd/ksh93/sh/xec.c:1218
#16 0x000055d0e24ff478 in exfile (shp=shp@entry=0x55d0e263f400 <sh>, iop=iop@entry=0x55d0e2d76f80, fno=-1, fno@entry=0) at ../src/cmd/ksh93/sh/main.c:515
#17 0x000055d0e25001fb in sh_main (ac=<optimized out>, av=<optimized out>, userinit=<optimized out>) at ../src/cmd/ksh93/sh/main.c:313
#18 0x00007f0c0bd65f33 in __libc_start_main (main=0x55d0e24feef0 <main>, argc=3, argv=0x7ffd12fb1bf8, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7ffd12fb1be8) at ../csu/libc-start.c:308
#19 0x000055d0e24fef9e in _start ()
(gdb) |
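The reproduction above boils down to one question: does the shell keep running after a built-in `echo` fails to write? A minimal sketch of that check follows. The function name `survives_failed_write` is hypothetical, and `/dev/full` is used as a default stand-in for an unwritable target so the check does not need root or a sysfs parameter; it is an illustration, not part of the ZTS or of ksh.

```shell
# Hypothetical helper: report whether a shell survives a failed built-in write.
# $1: shell to test (e.g. ksh, bash, sh)
# $2: target whose write is expected to fail (assumed default: /dev/full)
survives_failed_write() {
    # Run two commands in the shell under test. The first echo is expected to
    # fail; the second only runs if the shell is still alive afterwards.
    # "|| :" ignores the child's exit status; only its output matters here.
    out=$("$1" -c 'echo x >"$0"; echo "survived:$?"' "${2:-/dev/full}" 2>/dev/null) || :
    case $out in
        survived:*) return 0 ;;   # shell continued past the failed write
        *)          return 1 ;;   # shell exited or aborted before the second echo
    esac
}
```

For example, `survives_failed_write ksh /sys/module/sysrq/parameters/reset_seq` (run as root) would be expected to return non-zero on the affected ksh build shown in the backtrace, since the process aborts before the second `echo` runs.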
@behlendorf Much thanks for bringing this to our attention. This bug was introduced by me nearly a year ago as a consequence of resolving a lot of lint warnings involving the FWIW, We would have jumped on this earlier, before you created this PR, had an issue been opened against the ksh project. The only reason I dug into this is because @siteshwar, who works for RedHat, noticed it and brought it to my attention. |
@behlendorf Can we find a way to continuously test upstream ksh with upstream ZFS on Linux? I have been doing package builds through copr for Fedora and CentOS; you should be able to use them in your CI. |
@krader1961 I'm glad you were able to identify it, and happier to see it fixed upstream. Thanks for the pointer to the ksh project. If we manage to uncover any other strange issues, I'll make sure to open a new issue. @siteshwar we could easily enough have the Fedora builder pull the latest packages from copr. But I'm not sure how much benefit you'd really be able to derive from our additional testing. |
It would help a lot. The problem is that ksh source code has evolved over more than 30 years and that makes it extremely hard to maintain. As we move forward, we would like to do more code refactoring to make it maintainable. Directly working with projects like zfs will help us ensure a smoother transition to new ksh, and we will be more confident that our changes do not introduce regressions. |
@siteshwar in that case, I think we can give it a try. I'll see about updating at least one of our CI bots to install your latest copr packages for testing. |
@behlendorf Awesome. Thanks! |
The mmp_interval test case was failing on Fedora 30 due to the built-in 'echo' command terminating the script when it was unable to write to the sysfs module parameter. This change in behavior was observed with ksh-2020.0.0-alpha1. Resolve the issue by using the external cat command which fails gracefully as expected. Additionally, remove some incorrect quotes around the $? return values. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#8906
Motivation and Context
After updating the Fedora CI builder from 29 to 30, the ZTS mmp_interval test case was observed to fail consistently. This caused the CI to mark all PRs as failed.
Description
The mmp_interval test case was failing on Fedora 30 due to the built-in echo command terminating the script when it was unable to write to the sysfs module parameter. This change in behavior was observed with ksh-2020.0.0-alpha1. Resolve the issue by using the external cat command which fails gracefully as expected.
Additionally, remove some incorrect quotes around the $? return values.
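As a rough illustration of the approach described above, the sketch below writes a value through an external `cat` instead of the shell's built-in `echo`. The function name `write_param` and the here-document form are assumptions made for this example; the actual change to libtest.shlib is not reproduced in this description and may differ.

```shell
# Hypothetical helper, not the actual libtest.shlib code.
# $1: file to write (e.g. a sysfs module parameter), $2: value to write.
write_param() {
    [ -w "$1" ] || return 1
    # The write is performed by the external cat process, so a rejected
    # value only makes cat exit non-zero; the calling shell keeps running.
    cat >"$1" <<EOF
$2
EOF
    # Unquoted $?, matching the quoting cleanup mentioned above.
    return $?
}
```

A caller can then test the return status, for example `write_param "$path" "$value" || echo "write rejected"`, where `$path` and `$value` are placeholders.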
How Has This Been Tested?
Locally tested on Fedora 30. Pending full run of the ZTS via the CI.