Can't use Vere 3.0 on Raspberry Pi #620

Open
bazfum opened this issue Mar 12, 2024 · 8 comments

Comments

@bazfum

bazfum commented Mar 12, 2024

I have two piers on a Pi 4 4GB, a planet and a comet. The planet had issues with chop previously and would not upgrade until I moved it to my Mac. It ran on the Mac before I moved it back to the Pi. The comet did upgrade fine on the Pi. However, neither will run now on the Pi, with the same error:

urbit 3.0
boot: home is redacted
disk: loaded epoch 0i795477796
loom: mapped 8192MB
boot: protected loom
live: mapped: GB/1.378.074.624
live: loaded: KB/16.384
boot: installed 967 jets
loom: external fault: 0x10 (0x200000000 : 0x400000000)

Assertion '0' failed in pkg/noun/manage.c:1791

bail: oops
home: bailing out
Aborted (core dumped)

@joemfb
Member

joemfb commented Mar 12, 2024

@bazfum this looks like a null pointer dereference somewhere in startup (address 0x10 is a 16-byte offset from NULL). The best way to track this down would be to reproduce it in a debugger. Can you try to start this pier inside of gdb and capture a backtrace?

gdb --args ....normal command to restart urbit....
handle SIGSEGV nostop noprint
b manage.c:1791
continue
... wait for crash ...
bt
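
To make the "16-byte offset from NULL" reading concrete, here is a minimal sketch in C. The struct `demo_t` is hypothetical and not taken from Vere; it only shows how reading a field through a NULL pointer yields a small address such as 0x10. The bounds check assumes that the two values in parentheses in the log line are the start and end of the loom mapping, which would also match the "loom: mapped 8192MB" line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, not taken from Vere: with two 8-byte fields in
   front, the third field sits 0x10 bytes from the start of the struct. */
typedef struct {
  uint64_t first;
  uint64_t second;
  uint64_t third;
} demo_t;

static_assert(offsetof(demo_t, third) == 0x10, "third is 16 bytes in");

/* Address that a read of ptr->third would touch, given the struct's base
   address. Nothing is dereferenced here. */
uintptr_t field_addr(uintptr_t base) {
  return base + offsetof(demo_t, third);
}

/* Assumption: the two values in the log's parentheses are the inclusive
   start and exclusive end of the loom mapping. */
#define LOOM_BASE UINT64_C(0x200000000)
#define LOOM_END  UINT64_C(0x400000000)

/* True if addr falls inside the assumed loom range. */
bool in_loom(uint64_t addr) {
  return addr >= LOOM_BASE && addr < LOOM_END;
}

/* Size of the assumed loom range in MiB. */
uint64_t loom_size_mib(void) {
  return (LOOM_END - LOOM_BASE) >> 20;
}
```

Under these assumptions, `field_addr(0)` is 0x10, `in_loom(0x10)` is false (hence an "external" fault), and `loom_size_mib()` is 8192.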

@midden-fabler
Contributor

I wasn't able to replicate on a RPi 4 8GB. Chop and upgrade worked fine.

@bazfum
Author

bazfum commented Mar 12, 2024

This is on 64-bit Bullseye if that makes a difference.

When I ran it in GDB, I got this:

Program received signal SIGILL, Illegal instruction.
0x0000000000608a88 in _armv8_pmull_probe ()

The backtrace gives:
#0 0x0000000000608a88 in _armv8_pmull_probe ()
#1 0x00000000004039ac in OPENSSL_cpuid_setup ()
#2 0x0000000000749084 in __libc_start_init ()
#3 0x00000000007490ac in libc_start_main_stage2 ()

It then repeats that same line #3 until I kill it.

@joemfb
Member

joemfb commented Mar 12, 2024

Apparently SIGILL is normal during OpenSSL setup on ARM; see https://stackoverflow.com/questions/25708907/ssl-library-init-cause-sigill-when-running-under-gdb.

Can you try again, first setting handle SIGILL nostop noprint to let the library generate and catch that exception?
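
For context on why a library would "generate and catch" SIGILL, here is a minimal sketch of the general probing technique in C. This is not OpenSSL's code: `probe_instruction`, `supported_op`, and `unsupported_op` are hypothetical names, and `raise(SIGILL)` stands in for an opcode the CPU lacks so the sketch stays portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

static sigjmp_buf probe_env;

/* On SIGILL, jump back to the probe instead of letting the process die. */
static void on_sigill(int sig) {
  (void)sig;
  siglongjmp(probe_env, 1);
}

/* Run try_fn with a temporary SIGILL handler installed.
   Returns true if try_fn returned normally, false if it raised SIGILL
   (or if the handler could not be installed). */
bool probe_instruction(void (*try_fn)(void)) {
  assert(try_fn != NULL);

  struct sigaction act = {0};
  struct sigaction old;
  act.sa_handler = on_sigill;
  sigemptyset(&act.sa_mask);
  if (sigaction(SIGILL, &act, &old) != 0) {
    return false;
  }

  volatile bool ok = false;
  /* Second argument 1: save the signal mask so siglongjmp restores it. */
  if (sigsetjmp(probe_env, 1) == 0) {
    try_fn();
    ok = true;
  }

  /* Put the previous SIGILL disposition back. */
  sigaction(SIGILL, &old, NULL);
  return ok;
}

/* Stand-ins for "instruction the CPU supports" and "instruction it lacks". */
void supported_op(void) {}
void unsupported_op(void) { raise(SIGILL); }
```

Here `probe_instruction(supported_op)` returns true and `probe_instruction(unsupported_op)` returns false. A debugger sees the signal before the program's own handler does, which is why gdb stops on it unless told to pass it through.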

@bazfum
Author

bazfum commented Mar 12, 2024

loom: external fault: 0x10 (0x200000000 : 0x400000000)

Breakpoint 1, u3m_fault (ser_i=<optimized out>, adr_v=<optimized out>)
at pkg/noun/manage.c:1791
1791 pkg/noun/manage.c: No such file or directory.

(gdb) bt
#0 u3m_fault (ser_i=<optimized out>, adr_v=<optimized out>)
at pkg/noun/manage.c:1791
#1 u3m_fault (adr_v=<optimized out>, ser_i=<optimized out>)
at pkg/noun/manage.c:1776
#2 0x0000000000748530 in sigsegv_handler ()
#3 <signal handler called>
#4 0x000000000074ae70 in get_meta ()
#5 0x000000000074b27c in __libc_free ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@joemfb
Member

joemfb commented Mar 15, 2024

@bazfum sorry for the delay. I'm not sure what to make of this trace. It looks like it might be a double free, or trying to free a pointer into the stack. But it also might just be arbitrary heap corruption -- all bets are off. We have not been able to reproduce this crash.

I think this will require interactive debugging. I'd be happy to join a video call next week and try to find the root cause. We can coordinate on urbit (I'm ~master-morzod) or over email (joe at urbit.org). Alternatively, I could send you a binary with lots of extra printfs during initialization, which might help narrow it down.

@bazfum
Author

bazfum commented Mar 16, 2024

I replied via email. Let me know if you didn't get it.

@bazfum
Author

bazfum commented Mar 19, 2024

FWIW I ended up grabbing a used mini PC and moving my pier over, and everything is happy on the new system. I'd been debating moving off the Pi before all this, so no worries if it's not worth anyone's time to troubleshoot on the Pi.
