-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add longjmp support for Thumb-2 #9967
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9967 +/- ##
========================================
+ Coverage 79% 79% +<1%
========================================
Files 385 385
Lines 121938 121938
========================================
+ Hits 96679 96823 +144
+ Misses 25259 25115 -144
Continue to review full report at Codecov.
|
@behlendorf Tried on an old Beaglebone running 4.19.31-armv7-x31, it seems to exhibit the issue:
however after merging your branch issue-9957, zlua still won't load:
No idea what that means or how to fix it ... Cordially, |
I have tested this patch by rebuilding the To rebuild the zfs deb-packages I followed this guide: https://wiki.debian.org/BuildingTutorial Verified that the patch is working by:
[ +1.585094] ZFS: Loaded module v0.8.2-3~bpo10+1, ZFS pool version 5000, ZFS filesystem version 5
zfs create storage/test-1
zfs create storage/test-2
zfs create storage/test-3
# ...
# generate random data using dd
# ...
zfs destroy storage/test-1
zfs destroy storage/test-2
zfs destroy storage/test-3 |
I have tested this patch on zfs-0.8.3 on the Helios4 with kernel 4.14.135-mvebu. I am also able to destroy datasets and snapshots (created in a test zpool). I've also tried running the zfs-tests.sh, and it fails at alloc_class_011_neg, and dmesg reports a kernel NULL pointer dereference. Logs attached below. Is this a known problem? Any suggestions for further tests or debugging are welcome.
|
Since @rdolbeau actually ran into a different issue on a Beaglebone with what I assume to be the branch of this PR, I also tested that one on the Helios4 and it works here as well. What I did was: git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git remote add behlendorf https://github.com/behlendorf/zfs.git
git fetch behlendorf
git checkout --track behlendorf/issue-9957
./autogen.sh
./configure
make
make install I had to fix some of the lib path manually since this will install zfs to version: 0.8.0-577_g8268b7616 which is the git commit of the current branch. I cannot comment on the actual changes made in this PR as I only have a very limited understanding of this, but since the tests I did were all successful, this would be good to go from my end. |
Thanks for posting the test results. In addition to the create/destroy testing it would be great if someone could run the channel program test cases from the ZFS Test Suite. You can run just these tests by running: zfs-tests.sh -T channel_program @jsrlabs we haven't seen the specific crash you reported when testing. It looks like it may be related to failed memory allocation which would be unrelated to this specific issue. I'm not sure exactly what's causing the |
I ran
... and that is all that one produced. A second run with tst.large_prog disabled still causes a hang (unresponsive to ping):
Here is the kernel oops for the second run: kernel_oops2.txt |
You can run individual test cases with the $ zfs-tests.sh -t tests/functional/channel_program/lua_core/tst.large_prog
Test: .../functional/channel_program/lua_core/setup (run as root) [00:01] [PASS]
Test: .../functional/channel_program/lua_core/tst.large_prog (run as root) [00:00] [PASS]
Test: .../functional/channel_program/lua_core/cleanup (run as root) [00:00] [PASS] Or you can edit the |
The problem with that is that we cannot load the zfs module for version >= 0.8.2 prior to this fix (I assume this applies for all versions since 0.8.0). I tried running the Should the Any suggestions on how to approach this issue? Please let me know if there are any further information that I should provide to debug any of this further. |
No they should not. Channel programs were a feature introduced in the 0.8 release, this is why the 0.7.x tags are unaffected. That's something I probably should have mentioned earlier. Since all of the tests were tuned for an x86 system I'm not too surprised you ran in to some issues. A good place to start would be to run the master and its tests on a non-THUMB2 ARM system and see how it does. That should give us a baseline as to what to expect, and we can go from there. |
Of the channel_program tests, only the tst.lib_coroutine.lua part of tst.libraries.ksh is failing on the Helios4. This crash doesn't leave significant logs behind. I'm open to more suggestions. Out of time for today. For a non-THUMB2 baseline, I have a Raspberry Pi that I could try to run zfs-0.8.X (right now running 0.7.12), but I can't get to it for at maybe a week. |
I took a stab at running the tests on a Raspberry Pi 1 Model B with a ARMv6 CPU that only features the thumb instructions. All tests are done on a fully updated Raspbian Linux 10 (Debian buster) with kernel version 4.19.97+. Just to note right away, doing anything on that system is painfully slow, and the zfs compilation takes more than 2h - not the fastest CPU around. All tests are done so far with zfs master as of 17 Feb, git commit fb63fc0, and I installed zfs by doing: ./autogen.sh
./configure --prefix=/ \
--libdir=/lib \
--includedir=/usr/include \
--datarootdir=/usr/share
make -s -j1
sudo make install Running the tests wasn't all that successful:
Am I doing something wrong here or is this just expected behaviour for these CPUs? I would appreciate any hint on how to proceed. |
Just wanted to follow up to see if anyone would have any further input on this. I kinda hit a roadblock with the hardware I have access to. @jsrlabs did you get a chance to look into this? This bug prevents me (and possibly others) from upgrading to the 0.8-branch and given that many distributions (including Debian) come with the zfs 0.8, it would be good to get this sorted out somehow. Given the age (and thus suitability for ZFS) of non-THUMB2 ARM systems, I am after all not quite sure if we should really care about those systems at all. But then again, my experience with this is very limited. Maybe @behlendorf or @rdolbeau have some more insight? |
Apologies I lost track of this PR. It sounds like the limited manual testing which was done does verify this change is working properly (except on the beaglebone system). It's unfortunate we weren't able to run the ZFS Test Suite, but the logs do clearly show that's due to an entirely different issue. We've never really invested the effort in getting it working reliably on 32-bit system so that's not to surprising. @awehrfritz @jsrlabs if you can confirm that this PR resolves the issue with your hardware I'm open to merging it even though we couldn't get a full test run. |
When a Thumb-2 kernel is being used, then longjmp must be implemented using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S. Original-patch-by: @jsrlabs Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#9957
Yes, it does run successfully on my hardware: both the helios4 and a Raspberry Pi 2 are able to use zfs filesystems I had created previously. I also just ran the channel_tests on both, and the only ones that fail are the tst.lib_coroutine.lua part of tst.libraries.ksh (crash mentioned previously), and also tst.memory_limit. |
FWIW, the channel_program/lua_core/tst.memory_limit also fails on my Raspberry Pi 2 with vanilla zfs-0.8.3. Is there additional debugging I should do? |
No need. It sounds like it's easily reproducible in this environment. |
Codecov Report
@@ Coverage Diff @@
## master #9967 +/- ##
==========================================
+ Coverage 79.43% 79.57% +0.14%
==========================================
Files 389 389
Lines 123047 123047
==========================================
+ Hits 97739 97917 +178
+ Misses 25308 25130 -178
Continue to review full report at Codecov.
|
I'm trying again on the beaglebone just to see what happens (might take a while to compile...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that this worked on my helios4 and raspberry pi 1. I don't have the hardware handy at the moment to run more tests though.
Running 5.3.1-bone9 from http://repos.rcn-ee.com/debian/ (newer than before), and master + the issue-9957 branch merged in, I still get the 'unknown relocation' error:
Without the merge, I still get Weird as it works for everybody else... Unfortunately I don't have a newer Beaglebone (this is an A6 White, 720 MHz, almost as old as it gets - my A3 died a long time ago - with a Cortex-A8 r3p2) to test the same kernel on different hardware. |
Thanks for merging this @behlendorf! Now that 0.8.4 is out, would it be possible to include this in 0.8.5? (I’m not sure what the process/requirements are to nominate this for a patch release, thus I just thought to ask here.) |
You're welcome. Now that it's merged to master this is something we'll consider for 0.8.5. |
Great, thanks! |
When a Thumb-2 kernel is being used, then longjmp must be implemented using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S. Original-patch-by: @jsrlabs Reviewed-by: @awehrfritz Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7408 Closes openzfs#9957 Closes openzfs#9967
When a Thumb-2 kernel is being used, then longjmp must be implemented using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S. Original-patch-by: @jsrlabs Reviewed-by: @awehrfritz Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7408 Closes #9957 Closes #9967
When a Thumb-2 kernel is being used, then longjmp must be implemented using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S. Original-patch-by: @jsrlabs Reviewed-by: @awehrfritz Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7408 Closes openzfs#9957 Closes openzfs#9967
When a Thumb-2 kernel is being used, then longjmp must be implemented using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S. Original-patch-by: @jsrlabs Reviewed-by: @awehrfritz Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7408 Closes openzfs#9957 Closes openzfs#9967
Motivation and Context
Issue #9957. The longjmp() implementation is incompatible with Thumb-2
only kernels which prevents the use of ZFS on this hardware.
Description
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.
Original-patch-by: @jsrlabs
How Has This Been Tested?
The original version of this patch was tested by @jsrlabs. I don't have
access to the required hardware to properly test the updated patch.
@jsrlabs, @awehrfritz, @rdolbeau would you mind reviewing and
testing the proposed change.
Types of changes
Checklist:
Signed-off-by
.