
Add longjmp support for Thumb-2 #9967

Merged
1 commit merged into openzfs:master on Apr 30, 2020

Conversation

behlendorf
Contributor

Motivation and Context

Issue #9957. The longjmp() implementation is incompatible with Thumb-2-only
kernels, which prevents the use of ZFS on this hardware.

Description

When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
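
For background on why the ARM-mode assembly cannot simply be reused: Thumb-2 LDM/STM instructions may not include sp in their register list, and the setjmp/longjmp symbols need to be assembled as Thumb functions so callers do not require an ARM/Thumb interworking call (the "unsupported interworking call (Thumb -> ARM)" error from #9957). Below is a minimal illustrative sketch of a Thumb-2-friendly setjmp/longjmp, not the exact contents of this PR's diff:

    /* Illustrative sketch only -- not the actual patched setjmp_arm.S.
     * In ARM mode, setjmp/longjmp can save/restore r4-r14 with a single
     * STMIA/LDMIA.  In Thumb-2, sp may not appear in an LDM/STM register
     * list, so it is shuffled through ip instead, and .thumb_func marks
     * the symbols as Thumb code to avoid interworking errors.
     */
            .syntax unified
            .thumb
            .text

            .globl  setjmp
            .type   setjmp, %function
            .thumb_func
    setjmp:
            mov     ip, sp                  /* sp cannot go in the STM list     */
            stmia   r0, {r4-r12, lr}        /* save callee-saved regs and lr    */
            str     ip, [r0, #40]           /* stash sp after the 10 saved regs */
            mov     r0, #0
            bx      lr

            .globl  longjmp
            .type   longjmp, %function
            .thumb_func
    longjmp:
            ldmia   r0, {r4-r12, lr}        /* restore callee-saved regs and lr */
            ldr     ip, [r0, #40]
            mov     sp, ip                  /* restore sp via ip                */
            mov     r0, r1                  /* return longjmp's second argument */
            bx      lr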

How Has This Been Tested?

The original version of this patch was tested by @jsrlabs. I don't have
access to the required hardware to properly test the updated patch.
@jsrlabs, @awehrfritz, @rdolbeau, would you mind reviewing and
testing the proposed change?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@behlendorf added the Type: Architecture and Type: Building labels on Feb 7, 2020
@codecov

codecov bot commented Feb 8, 2020

Codecov Report

Merging #9967 into master will increase coverage by <1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #9967    +/-   ##
========================================
+ Coverage      79%      79%   +<1%     
========================================
  Files         385      385            
  Lines      121938   121938            
========================================
+ Hits        96679    96823   +144     
+ Misses      25259    25115   -144
Flag Coverage Δ
#kernel 80% <ø> (ø) ⬇️
#user 67% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 795699a...8268b76. Read the comment docs.

@rdolbeau
Contributor

rdolbeau commented Feb 8, 2020

@behlendorf Tried on an old Beaglebone running 4.19.31-armv7-x31, it seems to exhibit the issue:

zlua: section 3 reloc 34 sym 'longjmp': unsupported interworking call (Thumb -> ARM)

However, after merging your branch issue-9957, zlua still won't load:

zlua: unknown relocation: 102

No idea what that means or how to fix it ...

Cordially,

@awehrfritz
Contributor

awehrfritz commented Feb 8, 2020

I have tested this patch by rebuilding the zfs-dkms package version 0.8.2 from the Debian buster-backports repository (simply replacing the file setjmp_arm.S with the one from this branch) on my Helios4 using kernel 4.19.84-mvebu, and it fixes the issue.

To rebuild the zfs deb-packages I followed this guide: https://wiki.debian.org/BuildingTutorial
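
For reference, a rough sketch of that rebuild workflow, assuming deb-src entries for buster-backports are enabled and that zfs-linux is the source package name (the exact commands and version strings may differ from what was actually done):

    # Fetch build dependencies and the packaged sources from backports.
    sudo apt-get build-dep zfs-linux
    apt-get source -t buster-backports zfs-linux
    cd zfs-linux-0.8.2*
    # Drop in the patched file from this branch.
    cp /path/to/patched/setjmp_arm.S module/lua/setjmp/setjmp_arm.S
    # Build unsigned binary packages and install the rebuilt zfs-dkms.
    dpkg-buildpackage -us -uc -b
    sudo dpkg -i ../zfs-dkms_*.deb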

Verified that the patch is working by:

  1. Loading the kernel module without error. The dmesg output says:
[  +1.585094] ZFS: Loaded module v0.8.2-3~bpo10+1, ZFS pool version 5000, ZFS filesystem version 5
  2. Creating and destroying zfs datasets without any errors:
zfs create storage/test-1
zfs create storage/test-2
zfs create storage/test-3
# ...
# generate random data using dd
# ...
zfs destroy storage/test-1 
zfs destroy storage/test-2 
zfs destroy storage/test-3 

@jsrlabs

jsrlabs commented Feb 8, 2020

I have tested this patch on zfs-0.8.3 on the Helios4 with kernel 4.14.135-mvebu. I am also able to destroy datasets and snapshots (created in a test zpool).

I've also tried running zfs-tests.sh; it fails at alloc_class_011_neg, and dmesg reports a kernel NULL pointer dereference. Logs are attached below.

Is this a known problem? Any suggestions for further tests or debugging are welcome.

Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/acl/posix/setup (run as root) [00:01] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/acl/posix/posix_001_pos (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/acl/posix/posix_002_pos (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/acl/posix/posix_003_pos (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/acl/posix/cleanup (run as root) [00:01] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/setup (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_001_pos (run as root) [00:05] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_002_neg (run as root) [00:07] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_003_pos (run as root) [00:10] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_004_pos (run as root) [00:08] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_005_pos (run as root) [00:12] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_006_pos (run as root) [00:02] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_007_pos (run as root) [00:14] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_008_pos (run as root) [00:14] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_009_pos (run as root) [00:33] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_010_pos (run as root) [00:05] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/alloc_class/alloc_class_011_neg (run as root) [00:01] [FAIL]


[  591.164271] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[  591.164276] pgd = c0004000
[  591.164279] [00000030] *pgd=00000000
[  591.164285] Internal error: Oops: 5 [#1] SMP THUMB2
[  591.164797] Unable to handle kernel NULL pointer dereference at virtual address 00000030
[  591.169176] Modules linked in: zfs(PO) icp(PO) zlua(PO) zcommon(PO) zunicode(PO) znvpair(PO) zavl(PO) spl(O) lz4hc lz4hc_compress uas orion_wdt pwm_fan zram zsmalloc binfmt_misc
[  591.177306] pgd = c0004000
[  591.177307]  lm75 marvell_cesa
[  591.193175] [00000030] *pgd=00000000
[  591.195869]  nfsd ip_tables x_tables
[  591.195877] CPU: 0 PID: 1973 Comm: z_vdev_file Tainted: P           O    4.14.135-mvebu #205
[  591.202512] Hardware name: Marvell Armada 380/385 (Device Tree)
[  591.202514] task: ec0b0640 task.stack: eb436000
[  591.202531] PC is at spl_kmem_cache_alloc+0xb/0x664 [spl]
[  591.230631] LR is at zio_buf_alloc+0x20/0x58 [zfs]
[  591.235431] pc : [<bf8c34d8>]    lr : [<bfa899ed>]    psr: 000c0033
[  591.241710] sp : eb437e4c  ip : 913e9e13  fp : bf8cc58c
[  591.246945] r10: 600c0013  r9 : c0a03f88  r8 : 00000001
[  591.252180] r7 : 00000ff0  r6 : 00ff0000  r5 : daa9ede8  r4 : 00007f7f
[  591.258721] r3 : bfbdc220  r2 : 00000000  r1 : 00000004  r0 : 00000000
[  591.265262] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
[  591.272587] Control: 50c5387d  Table: 2287404a  DAC: 00000051
[  591.278345] Process z_vdev_file (pid: 1973, stack limit = 0xeb436220)
[  591.284798] Stack: (0xeb437e4c to 0xeb438000)
[  591.289165] 7e40:                            00007f7f daa9ede8 00ff0000 00000ff0 00000001
[  591.297362] 7e60: ec8434d0 600c0013 bf8cc58c bfa899ed 00000000 00000003 00000000 eb437e98
[  591.305560] 7e80: 00000ff0 bf9ccc73 00000001 dc301a78 c0a03f88 d9a48400 eb725ec0 bfa46533
[  591.313757] 7ea0: 00000000 eb437ea4 eb437ea4 913e9e13 00000001 ec843480 ed1c24c0 d9a48700
[  591.321953] 7ec0: ffffe000 913e9e13 ec8434d0 ec843480 ed1c24c0 d9a48400 ffffe000 00000001
[  591.330150] 7ee0: ec8434d0 bf8c64ef ed1c24c8 ec8434fc bf8ccf4c c0a03f88 00000001 ec0b0640
[  591.338347] 7f00: c013b8b9 00000100 00000200 00000000 00000000 00000000 00000000 00000000
[  591.346544] 7f20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  591.354741] 7f40: 00000000 00000000 00000000 00000000 00000000 ffffffff ffffffff 913e9e13
[  591.362938] 7f60: eb437f78 ed1c2c80 ed1c29c0 00000000 eb436000 ed1c24c0 bf8c6309 eb423c2c
[  591.371135] 7f80: ed1c2c9c c01353c1 ffffffff ed1c29c0 c01352cd 00000000 00000000 00000000
[  591.379331] 7fa0: 00000000 00000000 00000000 c01062f9 00000000 00000000 00000000 00000000
[  591.387528] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  591.395725] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  591.403939] [<bf8c34d8>] (spl_kmem_cache_alloc [spl]) from [<ed1c24c0>] (0xed1c24c0)
[  591.411702] Code: f643 7988 f2cc 09a0 (6b03) b099
[  591.416505] Internal error: Oops: 5 [#2] SMP THUMB2
[  591.416552] ---[ end trace d9496f7e5d2d7fcb ]---
[  591.421391] Modules linked in: zfs(PO) icp(PO) zlua(PO) zcommon(PO) zunicode(PO) znvpair(PO) zavl(PO) spl(O) lz4hc lz4hc_compress uas orion_wdt pwm_fan zram zsmalloc binfmt_misc lm75 marvell_cesa nfsd ip_tables x_tables
[  591.445554] CPU: 1 PID: 25019 Comm: z_vdev_file Tainted: P      D    O    4.14.135-mvebu #205
[  591.454097] Hardware name: Marvell Armada 380/385 (Device Tree)
[  591.460029] task: eb7cf6c0 task.stack: d4fe4000
[  591.464581] PC is at spl_kmem_cache_alloc+0xb/0x664 [spl]
[  591.470181] LR is at zio_buf_alloc+0x20/0x58 [zfs]
[  591.474982] pc : [<bf8c34d8>]    lr : [<bfa899ed>]    psr: 00000033
[  591.481263] sp : d4fe5e4c  ip : 913e9e13  fp : bf8cc58c
[  591.486497] r10: 60000013  r9 : c0a03f88  r8 : 00000002
[  591.491732] r7 : 00000ba2  r6 : 00ba1c00  r5 : daa9e5f0  r4 : 00005d0d
[  591.498273] r3 : bfbdc220  r2 : 00000000  r1 : 00000004  r0 : 00000000
[  591.504814] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
[  591.512139] Control: 50c5387d  Table: 2287404a  DAC: 00000051
[  591.517896] Process z_vdev_file (pid: 25019, stack limit = 0xd4fe4220)
[  591.524437] Stack: (0xd4fe5e4c to 0xd4fe6000)
[  591.528804] 5e40:                            00005d0d daa9e5f0 00ba1c00 00000ba2 00000002
[  591.537002] 5e60: ec8434d0 60000013 bf8cc58c bfa899ed 00000000 00000003 00000000 d4fe5e98
[  591.545199] 5e80: 00000ba2 bf9ccc73 00000001 dc303128 c0a03f88 d9a48a80 eb725ec0 bfa46533
[  591.553396] 5ea0: 00000000 d4fe5ea4 d4fe5ea4 913e9e13 ffffe000 ec843480 d9ced380 d9a48b80
[  591.561592] 5ec0: ffffe000 913e9e13 ec8434d0 ec843480 d9ced380 d9a48a80 ffffe000 00000002
[  591.569790] 5ee0: ec8434d0 bf8c64ef d9ced388 ec8434fc bf8ccf4c c0a03f88 00000000 eb7cf6c0
[  591.577985] 5f00: c013b8b9 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  591.586181] 5f20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  591.594377] 5f40: 00000000 00000000 00000000 00000000 00000000 ffffffff ffffffff 913e9e13
[  591.602574] 5f60: d4fe5f78 eadbf780 d9cedd00 00000000 d4fe4000 d9ced380 bf8c6309 eca61e14
[  591.610770] 5f80: eadbf79c c01353c1 ffffffff d9cedd00 c01352cd 00000000 00000000 00000000
[  591.618966] 5fa0: 00000000 00000000 00000000 c01062f9 00000000 00000000 00000000 00000000
[  591.627163] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  591.635359] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  591.643572] [<bf8c34d8>] (spl_kmem_cache_alloc [spl]) from [<d9ced380>] (0xd9ced380)
[  591.651336] Code: f643 7988 f2cc 09a0 (6b03) b099
[  591.656200] ---[ end trace d9496f7e5d2d7fcc ]---

@awehrfritz
Contributor

Since @rdolbeau actually ran into a different issue on a Beaglebone with what I assume to be the branch of this PR, I also tested that one on the Helios4 and it works here as well.

What I did was:

git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git remote add behlendorf https://github.com/behlendorf/zfs.git
git fetch behlendorf 
git checkout --track behlendorf/issue-9957 
./autogen.sh
./configure
make
make install

I had to fix some of the lib paths manually since this installs zfs to /usr/local/, but I got that sorted eventually and was able to load the zfs module, as well as create and destroy zfs datasets. modinfo zfs reports:

version:        0.8.0-577_g8268b7616

which is the git commit of the current branch.
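
For anyone reproducing this, the lib path fix-up presumably comes down to pointing the runtime linker at /usr/local/lib (an assumption about what was done, shown as a sketch):

    # Assumed fix-up: make the dynamic linker search /usr/local/lib, where a
    # default "make install" places the ZFS libraries.
    echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/usr-local.conf
    sudo ldconfig

Configuring with distribution paths instead (as in the Raspberry Pi test further down) avoids the need for any fix-up.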

I cannot comment on the actual changes made in this PR as I only have a very limited understanding of this, but since the tests I did were all successful, this would be good to go from my end.

@behlendorf
Contributor Author

Thanks for posting the test results. In addition to the create/destroy testing it would be great if someone could run the channel program test cases from the ZFS Test Suite. You can run just these tests by running:

zfs-tests.sh -T channel_program

@jsrlabs we haven't seen the specific crash you reported in our testing. It looks like it may be related to a failed memory allocation, which would be unrelated to this specific issue.

I'm not sure exactly what's causing the zlua: unknown relocation: 102 error either. I'd need to do some more reading up on ARM.

@jsrlabs

jsrlabs commented Feb 10, 2020

I ran
zfs-tests.sh -T channel_program
and I got some successful runs, but the computer hung with a kernel oops:

Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/setup (run as root) [00:01] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.exists (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.integer_illegal (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.integer_overflow (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.language_functions_neg (run as root) [00:02] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.language_functions_pos (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.large_prog (run as root) [00:00] [PASS]

[   82.556130] spl: loading out-of-tree module taints kernel.
[   82.577741] zavl: module license 'CDDL' taints kernel.
[   82.577745] Disabling lock debugging due to kernel taint
[   84.246759] ZFS: Loaded module v0.8.3-1, ZFS pool version 5000, ZFS filesystem version 5
[  143.913171] Internal error: Oops: 5 [#1] SMP THUMB2
[  143.918061] Modules linked in: zfs(PO) icp(PO) zlua(PO) zcommon(PO) zunicode(PO) znvpair(PO) zavl(PO) spl(O) lz4hc lz4hc_compress zram zsmalloc uas orion_wdt binfmt_misc pwm_fan nfsd lm75 marvell_cesa ip_tables x_tables

... and that is all that one produced. A second run with tst.large_prog disabled still caused a hang (unresponsive to ping):

Warning: Test 'tst.large_prog' removed from TestGroup '/home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core' because it failed verification.
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/setup (run as root) [00:01] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.exists (run as root) [00:01] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.integer_illegal (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.integer_overflow (run as root) [00:00] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.language_functions_neg (run as root) [00:02] [PASS]
Test: /home/jsr/src/zfs-0.8.3/tests/zfs-tests/tests/functional/channel_program/lua_core/tst.language_functions_pos (run as root) [00:00] [PASS]

Here is the kernel oops for the second run: kernel_oops2.txt
I can keep trying tests during the week. What is the best way to run tests individually?

@behlendorf
Contributor Author

What is the best way to run tests individually?

You can run individual test cases with the -t option. For example, to run the tst.large_prog test you can use:

$ zfs-tests.sh -t tests/functional/channel_program/lua_core/tst.large_prog
Test: .../functional/channel_program/lua_core/setup (run as root) [00:01] [PASS]
Test: .../functional/channel_program/lua_core/tst.large_prog (run as root) [00:00] [PASS]
Test: .../functional/channel_program/lua_core/cleanup (run as root) [00:00] [PASS]

Or you can edit the tests/runfiles/common.run file to only include the test cases you want to run. It would be great to determine if this is particular to a specific test case. The majority of our testing is done on x86 so it would be good to determine if this issue is related to this change.
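
For reference, the runfile is a plain ini-style list of test groups. A hypothetical, trimmed-down group (illustrating the shape of the format, not the real contents of common.run) looks roughly like this; removing names from the tests list skips those cases on the next zfs-tests.sh run:

    [tests/functional/channel_program/lua_core]
    tests = ['tst.args_to_lua', 'tst.divide_by_zero', 'tst.exists']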

@awehrfritz
Contributor

The majority of our testing is done on x86 so it would be good to determine if this issue is related to this change.

The problem with that is that we cannot load the zfs module for versions >= 0.8.2 prior to this fix (I assume this applies to all versions since 0.8.0).

I tried running the channel_program tests on zfs 0.7.12 (installed via the Debian packages), but it seems those tests don't exist there. Also, running the tests that are available leads to a crash in another test. Here are the results as far as it gets: zfs-test-0.7.12-crash.txt

Should the channel_program tests be available in zfs 0.7.12?

Any suggestions on how to approach this issue?

Please let me know if there is any further information I should provide to debug this further.

@behlendorf
Contributor Author

Should the channel_program tests be available in zfs 0.7.12?

No, they should not. Channel programs were a feature introduced in the 0.8 release, which is why the 0.7.x tags are unaffected. That's something I probably should have mentioned earlier.

Since all of the tests were tuned for an x86 system, I'm not too surprised you ran into some issues. A good place to start would be to run master and its tests on a non-Thumb-2 ARM system and see how it does. That should give us a baseline as to what to expect, and we can go from there.

@jsrlabs

jsrlabs commented Feb 11, 2020

Of the channel_program tests, only the tst.lib_coroutine.lua part of tst.libraries.ksh is failing on the Helios4. This crash doesn't leave significant logs behind. I'm open to more suggestions.

Out of time for today. For a non-Thumb-2 baseline, I have a Raspberry Pi on which I could try running zfs-0.8.x (it's currently running 0.7.12), but I can't get to it for maybe a week.

@awehrfritz
Contributor

I took a stab at running the tests on a Raspberry Pi 1 Model B with an ARMv6 CPU that only supports the original Thumb instruction set. All tests were done on a fully updated Raspbian Linux 10 (Debian buster) with kernel version 4.19.97+.

Just to note right away, doing anything on that system is painfully slow, and the zfs compilation takes more than 2h - not the fastest CPU around.

All tests are done so far with zfs master as of 17 Feb, git commit fb63fc0, and I installed zfs by doing:

    ./autogen.sh
    ./configure --prefix=/ \
                --libdir=/lib \
                --includedir=/usr/include \
                --datarootdir=/usr/share
    make -s -j1
    sudo make install

Running the tests wasn't all that successful:

  1. I naively ran /usr/share/zfs/zfs-tests.sh and test functional/alloc_class/alloc_class_011_neg failed. There appears to be a memory access error, but I cannot confirm that this is related to the failing test. The output and respective dmesg are here: https://gist.github.com/awehrfritz/061b18d73978a80078954336bb6905be
  2. Commenting out that test in runfiles/common.run leads to the same error, indicating that it is the following test that leads to the memory access error, though it does not print which test is failing. The output and respective dmesg are here: https://gist.github.com/awehrfritz/333691dfa179f8b8360bc94531f99a4f
  3. Running only the channel program tests didn't yield anything, i.e. no output from the tests and it seems nothing is happening (also dmesg doesn't show anything). I killed the process after more than 6h and here is what I get: https://gist.github.com/awehrfritz/5e8e8a66beb8e4c0bdc82c84d3a7cb26

Am I doing something wrong here or is this just expected behaviour for these CPUs? I would appreciate any hint on how to proceed.

@awehrfritz
Contributor

Just wanted to follow up to see if anyone would have any further input on this. I kinda hit a roadblock with the hardware I have access to.

@jsrlabs did you get a chance to look into this?

This bug prevents me (and possibly others) from upgrading to the 0.8 branch, and given that many distributions (including Debian) now ship zfs 0.8, it would be good to get this sorted out somehow.

Given the age (and thus questionable suitability for ZFS) of non-Thumb-2 ARM systems, I am not quite sure whether we should really care about those systems at all. But then again, my experience with this is very limited. Maybe @behlendorf or @rdolbeau have some more insight?

@behlendorf
Contributor Author

Apologies, I lost track of this PR. It sounds like the limited manual testing that was done does verify this change is working properly (except on the Beaglebone system). It's unfortunate we weren't able to run the ZFS Test Suite, but the logs clearly show that's due to an entirely different issue. We've never really invested the effort in getting it working reliably on 32-bit systems, so that's not too surprising.

@awehrfritz @jsrlabs if you can confirm that this PR resolves the issue with your hardware I'm open to merging it even though we couldn't get a full test run.

When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#9957
@jsrlabs

jsrlabs commented Apr 28, 2020

@awehrfritz @jsrlabs if you can confirm that this PR resolves the issue with your hardware I'm open to merging it even though we couldn't get a full test run.

Yes, it does run successfully on my hardware: both the Helios4 and a Raspberry Pi 2 are able to use zfs filesystems I had created previously. I also just ran the channel_program tests on both, and the only ones that fail are the tst.lib_coroutine.lua part of tst.libraries.ksh (the crash mentioned previously) and tst.memory_limit.

@jsrlabs

jsrlabs commented Apr 28, 2020

FWIW, the channel_program/lua_core/tst.memory_limit also fails on my Raspberry Pi 2 with vanilla zfs-0.8.3.

Is there additional debugging I should do?

@behlendorf
Contributor Author

behlendorf commented Apr 28, 2020

Is there additional debugging I should do?

No need. It sounds like it's easily reproducible in this environment.

@behlendorf added the Status: Accepted label on Apr 28, 2020
@codecov-io

codecov-io commented Apr 28, 2020

Codecov Report

Merging #9967 into master will increase coverage by 0.14%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9967      +/-   ##
==========================================
+ Coverage   79.43%   79.57%   +0.14%     
==========================================
  Files         389      389              
  Lines      123047   123047              
==========================================
+ Hits        97739    97917     +178     
+ Misses      25308    25130     -178     
Flag Coverage Δ
#kernel 80.03% <ø> (+0.10%) ⬆️
#user 66.21% <ø> (+0.51%) ⬆️
Impacted Files Coverage Δ
module/os/linux/spl/spl-kmem-cache.c 75.22% <0.00%> (-9.59%) ⬇️
cmd/zvol_id/zvol_id_main.c 76.31% <0.00%> (-5.27%) ⬇️
module/zfs/vdev_raidz_math.c 76.57% <0.00%> (-2.26%) ⬇️
module/zfs/zil.c 91.90% <0.00%> (-1.22%) ⬇️
cmd/zdb/zdb.c 80.69% <0.00%> (-1.18%) ⬇️
module/zfs/bptree.c 92.70% <0.00%> (-1.05%) ⬇️
module/zcommon/zfs_uio.c 87.75% <0.00%> (-1.03%) ⬇️
cmd/zed/agents/zfs_mod.c 77.55% <0.00%> (-0.67%) ⬇️
module/zfs/dsl_pool.c 94.51% <0.00%> (-0.64%) ⬇️
module/zfs/bpobj.c 90.61% <0.00%> (-0.54%) ⬇️
... and 49 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89a6610...1efaa86. Read the comment docs.

@rdolbeau
Contributor

I'm trying again on the beaglebone just to see what happens (might take a while to compile...)

@awehrfritz
Contributor

awehrfritz left a comment

I can confirm that this worked on my Helios4 and Raspberry Pi 1. I don't have the hardware handy at the moment to run more tests though.

@rdolbeau
Contributor

Running 5.3.1-bone9 from http://repos.rcn-ee.com/debian/ (newer than before), with master plus the issue-9957 branch merged in, I still get the 'unknown relocation' error:

dolbeau@beaglebone:~$ sudo dmesg |grep -C3 lua | head -5
[ 8199.298769] spl: loading out-of-tree module taints kernel.
[ 8199.475453] zavl: module license 'CDDL' taints kernel.
[ 8199.475470] Disabling lock debugging due to kernel taint
[ 8199.861836] zlua: unknown relocation: 102
[ 8201.049190] zfs: Unknown symbol lua_isstring (err -2)

Without the merge, I still get zlua: section 3 reloc 34 sym 'longjmp': unsupported interworking call (Thumb -> ARM)

Weird, as it works for everybody else... Unfortunately I don't have a newer Beaglebone (this is an A6 White, 720 MHz, almost as old as it gets - my A3 died a long time ago - with a Cortex-A8 r3p2) to test the same kernel on different hardware.
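
If it helps narrow this down, one generic way to see which relocation types the failing module actually contains (standard binutils; the in-tree path to zlua.ko is an assumption):

    # Tally the R_ARM_* relocation types present in the built zlua module; the
    # numeric type the kernel rejects (102) should show up here by name.
    readelf -r module/lua/zlua.ko | awk '/R_ARM/ {print $3}' | sort | uniq -c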

@behlendorf merged commit ab16c87 into openzfs:master on Apr 30, 2020
@awehrfritz
Contributor

Thanks for merging this @behlendorf!

Now that 0.8.4 is out, would it be possible to include this in 0.8.5? (I’m not sure what the process/requirements are to nominate this for a patch release, thus I just thought to ask here.)

@behlendorf
Contributor Author

You're welcome. Now that it's merged to master this is something we'll consider for 0.8.5.

@awehrfritz
Contributor

Great, thanks!

awehrfritz pushed a commit to awehrfritz/zfs that referenced this pull request Aug 18, 2020
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7408 
Closes openzfs#9957 
Closes openzfs#9967
@awehrfritz mentioned this pull request on Aug 18, 2020
tonyhutter pushed a commit that referenced this pull request Aug 19, 2020
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7408 
Closes #9957 
Closes #9967
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2020
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7408 
Closes openzfs#9957 
Closes openzfs#9967
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7408 
Closes openzfs#9957 
Closes openzfs#9967
@behlendorf deleted the issue-9957 branch on April 19, 2021 at 19:22