Node segfault (error 139) in container release v11.x.x #697
Comments
Please provide the logs from the container with verbose logging enabled. If you have a docker-compose file that would be helpful as well. |
Did you report this yesterday by any chance on the FoundryVTT discord? I'm seeing someone with the same error here: https://discord.com/channels/170995199584108546/486930822465716249/1111373639321911427 The issue is most likely not a container bug, as Foundry was installed and successfully launched. Are you sure there are no other processes / containers running with that same |
Negative, I'm not on the FoundryVTT server - I'd be uberDoward on there as well, though... This folder is dedicated to only FoundryVTT data, and this is the only container with access to that folder :) edit - Opened foundryvtt/foundryvtt#9482 as well. I do agree this is likely an internal bug to FoundryVTT, but I wanted to check in here, first. |
Ok then if that's not you... good news then if you believe that misery loves company.
I saw your issue that you opened to the foundryvtt project. Just be aware that there is a very strong container phobia with the support team. To the point that they will refuse to support any containerized foundry debugging. 🤷 I can't think of any solutions or tests that you haven't already run. Can you think of anything novel about your setup or configuration? |
Nothing novel, beyond the basic binding of the volume for the host storage. I'll see if they reply, and I'll go ahead and see if I can re-create it outside the containerization just to quell that discussion ahead of time... Appreciate the heads up! |
Ok, so I noticed something potentially of interest. Node's dying with exit code 139 (SIGSEGV) - and when I went to launch using the same data config from the straight node FoundryVTT install, all worked fine. As part of trying to get it to run, though, I had to get GLIBCXX 3.4.29 - and since I'm running on Bullseye (Raspbian) I had to upgrade to unstable (Bookworm) - then I could run FoundryVTT from node non-containerized perfectly fine. What GLIBCXX is available within the container? I'm trying to hit it with /bin/sh so I can check, but now that I migrated the default world, I'm just spinning and crashing the container lol. I'll keep digging... Edit - Here's the actual log where I go to start the world (now post-migration):
BTW, can I get an invite to the discord channel? Edit Part Deux - My concentration on the lock file was just a symptom. What's happening here, is that upon launch of a world, Node is crashing with exit code 139 (SIGSEGV), then the container restarts. Upon startup, the lock file still exists from the prior crash and that is what I was seeing originally. So the "digging" really needs to be understanding the first 139 exit code from Node... Note - Node 18.16.0 is what I used from the non-containerized test, and that matches what's claimed in the container as well. Really thinking this may be a glibc issue... |
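For context on the exit code discussed above: shells conventionally report a process killed by a signal as 128 plus the signal number, which is why 139 reads as signal 11 (SIGSEGV on Linux). A minimal sketch of that arithmetic follows; the helper name `signal_from_exit` is made up for illustration and is not part of the container or Foundry.

```shell
# Decode an exit status using the common shell convention:
# a status above 128 means "terminated by signal (status - 128)".
# signal_from_exit is a hypothetical helper name used only in this sketch.
signal_from_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    echo $((status - 128))
  else
    # Not a signal-style status; report 0 to mean "no signal".
    echo 0
  fi
}

sig=$(signal_from_exit 139)
echo "exit 139 -> signal $sig"   # signal 11 is SIGSEGV on Linux
```

Seen this way, the leftover lock is a consequence of the crash, and the first 139 is the event worth chasing.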
Attached the latest log - created a completely fresh image. Running on a raspberry pi4, btw - so this is ARM, rather than x86 based. Completely new container won't launch any world when running in a container, but works fine running directly from my RPi4 host. I'm really leaning towards a container issue |
do you have any issues installing/updating modules too? |
FWIW Here is the info for the two machines I normally run Foundry on:
➤ uname -a
Linux 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
➤ cat /etc/issue.net
Debian GNU/Linux 11
➤ grep Model /proc/cpuinfo
Model : Raspberry Pi 4 Model B Rev 1.4

➤ uname -a
Darwin 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64
➤ sw_vers
ProductName: macOS
ProductVersion: 13.4
BuildVersion: 22F66
➤ sysctl -n machdep.cpu.brand_string
Apple M1 Ultra

And all continuous integration tests are run on |
I haven't been able to reproduce this yet. I do have a few ideas that could help diagnose the cause.
Something like this:
docker run --rm -it node:18-alpine3.18 /bin/sh
cd /tmp
npm install diskdb
cat <<EOF > test.js
const db = require('diskdb');
db.connect('/tmp', ['test']);
const data = [];
for(let i = 0; i < 1000000; i++) {
data.push({ number: i });
}
db.test.save(data);
console.log(db.test.find());
EOF
node test.js

cat <<EOF > test-memory.js
let array = [];
setInterval(() => {
for(let i = 0; i < 1000000; i++) {
array.push("This is some text.");
}
const memoryUsage = process.memoryUsage();
console.log("Memory usage: " + JSON.stringify(memoryUsage, null, 2) + " bytes");
}, 1000);
EOF
node test-memory.js |
I just published the In addition to the previous tests I suggested above, please try pulling See: |
I wanted to chime in that I'm having the exact same issue as @DasOcko on my Raspberry Pi 4, with a completely fresh install using 11.300.0. The container crashes with a segmentation fault and exit code 139 when trying to load a just created world. @felddy I've also tried running the example code above to try and segfault node in the default alpine container using the disk and memory test. The disk test did not fail, but the memory test ended up crashing with this error:
|
@mxvs That is the "good" outcome of the memory test. It ran out of memory and got a |
I just built a version of the image using Node 16 instead of 18. When you get a chance please pull See: |
I just ran the modified container and it unfortunately still segfaults as soon as I try to load the world :-( |
Ok. That gives us more data. Thank you for helping run this down. I wish I could reproduce it myself. I've got another idea. Let's start up the last version of the image that was working, but request Foundry version So that would mean starting Normally this would work fine. You'd just have a mismatch between the container and Foundry which means you might not have access to specifying some options via environment variables. You will see a warning like this in the logs:
foundryvtt-mine-foundry-1 | Entrypoint | 2023-06-08 21:21:10 | [warn] FOUNDRY_VERSION has been manually set and does not match the container's version.
foundryvtt-mine-foundry-1 | Entrypoint | 2023-06-08 21:21:10 | [warn] Expected 10.291 but found 11.300
foundryvtt-mine-foundry-1 | Entrypoint | 2023-06-08 21:21:10 | [warn] The container may not function properly with this version mismatch.
I can confirm that this test works on my side. If you still see the segfault then we know it's something specific to Foundry v11. Then the debugging fun really begins. ;) |
Alright, I just tried the above both with 11.300 in the 10.291.1 container and the just released 11.301 and in both instances the segfault is still thrown when trying to open the world. So it's something specific to v11 and not the container. |
Thanks for testing. I think these are the most telling test results so far.
I'll keep working on this. Thank you again for the test help. |
So, to be absolutely certain, I wanted to try running things in a barebones container, install Foundry V11 manually and see if it would still segfault. The short version is that it works just fine this way. This might indicate it might not be a Foundry bug after all, or at least there is some weird interaction going on with the way the container is set-up as I got it to work in another container configuration on the same hardware. So here is what I did:
So it's definitely possible to run Foundry V11 in a container on a Raspberry Pi 4. Not sure if this makes it easier or harder now, but I feel this takes us a step closer to a solution. |
@felddy I tried two more things (tell me when to stop):
So on the Raspberry PI 4 hardware, running Foundry v11 works with an Ubuntu based image but not with Alpine. Is there a way we could get a release of your container setup based on another distro than Alpine for this case or would it be possible for me to modify and build a version myself? Thanks |
Do you have a 32-bit version of Raspbian installed on your 64-bit Raspberry Pi? I'm trying to think of why you would be pulling an |
Hey that's a great point @mcdonnnj - My setup was running 64bit raspbian on 64bit raspberry pi 4 and refused to work. |
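One way to check for the 32-bit versus 64-bit question raised above is to compare the machine type the kernel reports with the word size of the userland. This is only a sketch: the helper name `arch_bits` is made up for illustration, the list of machine strings is not exhaustive, and `getconf LONG_BIT` is used only when `getconf` happens to be installed.

```shell
# Map a `uname -m` machine string to a word size.
# arch_bits is a hypothetical helper; the patterns cover common values only.
arch_bits() {
  case "$1" in
    aarch64|arm64|x86_64) echo 64 ;;
    armv7l|armv6l|i686|i386) echo 32 ;;
    *) echo unknown ;;
  esac
}

machine=$(uname -m)
echo "kernel machine type: $machine ($(arch_bits "$machine")-bit)"

# The userland can be 32-bit even when the kernel is 64-bit, so check it too.
if command -v getconf >/dev/null 2>&1; then
  echo "userland word size: $(getconf LONG_BIT)-bit"
fi
```

If the two lines disagree, or both say 32-bit on a Pi 4, that matches the 32-bit OS setup described in this thread.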
@uberDoward Have you tried the image tag that @felddy created for working on this issue ( |
@mcdonnnj - not yet. Work has been killing me, and I had a bit of a rough zfs pool migration that JUST finished on the home lab. I've been utterly exhausted and why I haven't participated after I booted FoundryVTT-docker from the ARM64 RPi4 over to the x86 server LOL - I will try to try it tonight, but not making promises*, my wife just went back into the hospital this morning and I'm running the house as well... |
I gotta cook dinner BUT I am fundamentally incapable of leaving a problem alone. It's a weakness. Here's what I got:
Using
|
I think we've found the culprits: |
That jibes with my gut instinct of this being a GLIBC issue. If I get any free time (HAHAHA), I'll try rebuilding the dependencies from source. My home lab is Gentoo, and my day job is a senior software engineer. I'm familiar with building code, lol |
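To answer the earlier "What GLIBCXX is available?" question on a given system, one option is to pull the `GLIBCXX_*` version tags out of the C++ runtime library and pick the newest. This is a rough sketch under assumptions: `highest_glibcxx` is a made-up helper name, and the library path in `LIB` is a guess that varies by distro and architecture.

```shell
# Read raw bytes on stdin and print the highest GLIBCXX_x.y.z tag found.
# highest_glibcxx is a hypothetical helper used only in this sketch.
highest_glibcxx() {
  # Turn non-printable bytes into newlines so grep sees plain text,
  # then sort numerically on the 2nd and 3rd dot-separated fields.
  tr -c '[:print:]' '\n' \
    | grep -o 'GLIBCXX_[0-9][0-9.]*' \
    | sort -t. -k2,2n -k3,3n \
    | tail -n 1
}

# Assumed location; adjust for your distro (this path is not from the thread).
LIB=${LIB:-/usr/lib/libstdc++.so.6}
if [ -r "$LIB" ]; then
  echo "newest tag in $LIB: $(highest_glibcxx < "$LIB")"
else
  echo "no readable library at $LIB; set LIB to the right path"
fi
```

Running it both on the host and inside the container (with `LIB` set appropriately in each) would show whether the two environments expose the same C++ runtime version.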
Yes, I'm running the 32 bit OS, when I first set-up the PI4, the 64 bit version was still in beta and even today on their website the 32 bit version is mentioned as "Our recommended operating system for most users." |
As a workaround for this issue you can try the following:
In your container logs you will see lines similar to this after Foundry is installed but before it starts:
...
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:51:50 | [info] Using CONTAINER_PATCHES: /data/container_patches
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:51:50 | [info] Container patches directory detected. Starting patch application...
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:51:50 | [info] Sourcing patch from file: /data/container_patches/issue-697.sh
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:51:50 | [info] Applying "Fix for issue #697"
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:51:50 | [info] See: https://github.com/felddy/foundryvtt-docker/issues/697
foundryvtt-foundry-1 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/aarch64/APKINDEX.tar.gz
foundryvtt-foundry-1 | fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/aarch64/APKINDEX.tar.gz
foundryvtt-foundry-1 | (1/29) Installing libstdc++-dev (12.2.1_git20220924-r10)
foundryvtt-foundry-1 | (2/29) Installing zstd-libs (1.5.5-r4)
foundryvtt-foundry-1 | (3/29) Installing binutils (2.40-r7)
foundryvtt-foundry-1 | (4/29) Installing libgomp (12.2.1_git20220924-r10)
foundryvtt-foundry-1 | (5/29) Installing libatomic (12.2.1_git20220924-r10)
...
foundryvtt-foundry-1 | Entrypoint | 2023-06-19 17:52:11 | [info] Completed file patching.
...
This should apply the steps as documented in the associated issue here: Please let me know how this works. If it succeeds I can add this file to the repo's patch library for others to use. |
It doesn't run, you get:
I think it's because the container behind |
Sorry I wasn't clear. You should use the standard releases. e.g., |
This fixed the segfault I was getting on a pi4, thank you! (to be clear, I applied the changes to alpine, not the debian branch.) |
I too can confirm that this works on my Raspberry Pi 4 and solves the segfault 🥳 I will note that this increases the startup time of the container to a little over two minutes on the Pi 4 hardware due to the compilation step, perhaps a useful disclaimer to mention when publishing the patch. Thanks for all the efforts to resolve this issue! |
This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed. |
Apologies for the delayed response - I'll test as well this weekend on my rpi4 |
I was having this problem and your solution worked for my system.
Problem
When upgrading or launching a world, foundry crashed with the following logs:
Segmentation fault (core dumped)
Launcher | 2023-05-27 02:44:37 | [error] Node process exited with code 139
Solution
Exactly as described here: #697 (comment)
My system
$ uname -a
Linux raspberrypi 5.15.84-v7l+ #1613 SMP Thu Jan 5 12:01:26 GMT 2023 armv7l GNU/Linux
$ cat /etc/issue.net
Raspbian GNU/Linux 11
$ grep Model /proc/cpuinfo
Model : Raspberry Pi 4 Model B Rev 1.5 |
Exact same issue for me, except here was my System:
Thank you @felddy! |
This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed. |
This issue has been automatically closed due to inactivity. If you are still experiencing problems, please open a new issue. |
Add hotfix for issue #697 - v11 database glibc workaround
I've added a patch file for this issue. This allows you to specify the patch using an environment variable instead of creating a patch file manually. See: e.g.;
---
version: "3.8"

services:
  foundry:
    image: felddy/foundryvtt:release
    hostname: my_foundry_host
    volumes:
      - type: bind
        source: <your_data_dir>
        target: /data
    environment:
      - CONTAINER_PATCH_URLS=https://raw.githubusercontent.com/felddy/foundryvtt-docker/develop/patches/hotfix_issue_697.sh
      - FOUNDRY_PASSWORD=<your_password>
      - FOUNDRY_USERNAME=<your_username>
      - FOUNDRY_ADMIN_KEY=atropos
    ports:
      - target: 30000
        published: 30000
        protocol: tcp |
Bug description
I can neither migrate old data nor can I start a new world. I always get the same error:
The data is stored on the host system as a bind volume under a directory. All was well with V10.
I attempted to 'reset' all the data as well, by stopping the container, moving the data to a new folder, re-creating the old folder completely empty, and creating a brand new world. The error logs above are from that last attempt.
I have tried removing the options.json.lock directory (isn't this supposed to be a file, not a directory, btw?), recycling the container, to no avail. I suspect this may be a bug in foundry itself, but I'm not certain, and figured anyone else running into this issue from this container may find their way here...
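For anyone checking the same symptom, a small sketch of testing for a leftover lock before starting the container is below. The helper name `has_leftover_lock` is made up, and the idea that the lock sits next to `options.json` in a single config directory is an assumption for illustration; the demo runs against a throwaway temp directory, not real Foundry data.

```shell
# Return success if a leftover lock entry exists in the given directory.
# has_leftover_lock is a hypothetical helper; the directory layout is assumed.
has_leftover_lock() {
  [ -e "$1/options.json.lock" ]
}

# Demonstrate against a temporary directory rather than real data.
demo=$(mktemp -d)
mkdir "$demo/options.json.lock"
if has_leftover_lock "$demo"; then echo "lock present"; else echo "no lock"; fi

rmdir "$demo/options.json.lock"
if has_leftover_lock "$demo"; then echo "lock present"; else echo "no lock"; fi
rmdir "$demo"
```

A lock that reappears after every restart suggests the process is dying before it can clean up, rather than the lock itself being the cause.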
Steps to reproduce
Expected behavior
I expect the world to launch without error.
Container metadata
Relevant log output
No response
Code of Conduct