kotlin.IllegalStateException: Process descriptor file table is full after many connection attempts #118

ekzyis opened this issue Oct 11, 2024 · 3 comments

ekzyis commented Oct 11, 2024

Hello, I have been running phoenixd on a VPS for a while now, and I noticed that it keeps crashing after some time with the following error (assuming I remember the error from last time correctly):

Uncaught Kotlin exception: kotlin.IllegalStateException: Process descriptor file table is full
    at 0   phoenixd                            0x55e680           kfun:kotlin.Throwable#<init>(kotlin.String?){} + 32
    at 1   phoenixd                            0x55bece           kfun:kotlin.Exception#<init>(kotlin.String?){} + 30
    at 2   phoenixd                            0x55bf8e           kfun:kotlin.RuntimeException#<init>(kotlin.String?){} + 30
    at 3   phoenixd                            0x55c1ae           kfun:kotlin.IllegalStateException#<init>(kotlin.String?){} + 30
    at 4   phoenixd                            0x4b05bb           kfun:io.ktor.network.sockets.TCPServerSocketNative.$acceptCOROUTINE$0.invokeSuspend#internal + 1371
    at 5   phoenixd                            0x4b09ae           kfun:io.ktor.network.sockets.TCPServerSocketNative#accept#suspend(kotlin.coroutines.Continuation<io.ktor.network.sockets.Socket>){}kotlin.Any + 126
    at 6   phoenixd                            0x4b3ced           kfun:io.ktor.network.sockets.ServerSocket#accept#suspend(kotlin.coroutines.Continuation<io.ktor.network.sockets.Socket>){}kotlin.Any-trampoline + 93
    at 7   phoenixd                            0xb524a4           kfun:io.ktor.server.cio.backend.$httpServer$lambda$3COROUTINE$0.invokeSuspend#internal + 2820
    at 8   phoenixd                            0x5dfc69           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 25
    at 9   phoenixd                            0x560300           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 224
    at 10  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
    at 11  phoenixd                            0x63a5ea           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 634
    at 12  phoenixd                            0x64be9d           kfun:kotlinx.coroutines.Runnable#run(){}-trampoline + 77
    at 13  phoenixd                            0x649149           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$workerRunLoop$lambda$2COROUTINE$0.invokeSuspend#internal + 777
    at 14  phoenixd                            0x5dfc69           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 25
    at 15  phoenixd                            0x560300           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 224
    at 16  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
    at 17  phoenixd                            0x63a5ea           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 634
    at 18  phoenixd                            0x64be9d           kfun:kotlinx.coroutines.Runnable#run(){}-trampoline + 77
    at 19  phoenixd                            0x60f4a8           kfun:kotlinx.coroutines.EventLoopImplBase#processNextEvent(){}kotlin.Long + 168
    at 20  phoenixd                            0x64bd53           kfun:kotlinx.coroutines.EventLoop#processNextEvent(){}kotlin.Long-trampoline + 19
    at 21  phoenixd                            0x645f07           kfun:kotlinx.coroutines.BlockingCoroutine.joinBlocking#internal + 167
    at 22  phoenixd                            0x645676           kfun:kotlinx.coroutines#runBlocking(kotlin.coroutines.CoroutineContext;kotlin.coroutines.SuspendFunction1<kotlinx.coroutines.CoroutineScope,0:0>){0§<kotlin.Any?>}0:0 + 598
    at 23  phoenixd                            0x64576e           kfun:kotlinx.coroutines#runBlocking$default(kotlin.coroutines.CoroutineContext?;kotlin.coroutines.SuspendFunction1<kotlinx.coroutines.CoroutineScope,0:0>;kotlin.Int){0§<kotlin.Any?>}0:0 + 94
    at 24  phoenixd                            0x64854d           kfun:kotlinx.coroutines.MultiWorkerDispatcher.workerRunLoop#internal + 109
    at 25  phoenixd                            0x648cc5           kfun:kotlinx.coroutines.MultiWorkerDispatcher.<init>$lambda$1$lambda$0#internal + 21
    at 26  phoenixd                            0x649526           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$<init>$lambda$1$lambda$0$FUNCTION_REFERENCE$5.invoke#internal + 22
    at 27  phoenixd                            0x649579           kfun:kotlinx.coroutines.MultiWorkerDispatcher.$<init>$lambda$1$lambda$0$FUNCTION_REFERENCE$5.$<bridge-UNN>invoke(){}#internal + 25
    at 28  phoenixd                            0x5de1bb           kfun:kotlin.Function0#invoke(){}1:0-trampoline + 75
    at 29  phoenixd                            0x565379           WorkerLaunchpad + 57
    at 30  phoenixd                            0x5f979f           _ZN6Worker19processQueueElementEb + 927
    at 31  phoenixd                            0x5f93bc           _ZN12_GLOBAL__N_113workerRoutineEPv + 76
    at 32  libc.so.6                           0x7f55b70c69ea     0x0 + 140006119991786
    at 33  libc.so.6                           0x7f55b714a7cb     0x0 + 140006120531915
    Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [LazyStandaloneCoroutine{Cancelling}@b2abb600, DefaultDispatcher@b422fce0]
        at 0   phoenixd                            0x55e680           kfun:kotlin.Throwable#<init>(kotlin.String?){} + 32
        at 1   phoenixd                            0x55bece           kfun:kotlin.Exception#<init>(kotlin.String?){} + 30
        at 2   phoenixd                            0x55bf8e           kfun:kotlin.RuntimeException#<init>(kotlin.String?){} + 30
        at 3   phoenixd                            0x649da5           kfun:kotlinx.coroutines.internal.DiagnosticCoroutineContextException#<init>(kotlin.coroutines.CoroutineContext){} + 85
        at 4   phoenixd                            0x638ada           kfun:kotlinx.coroutines.internal#handleUncaughtCoroutineException(kotlin.coroutines.CoroutineContext;kotlin.Throwable){} + 314
        at 5   phoenixd                            0x60d2dc           kfun:kotlinx.coroutines#handleCoroutineException(kotlin.coroutines.CoroutineContext;kotlin.Throwable){} + 124
        at 6   phoenixd                            0x607af5           kfun:kotlinx.coroutines.StandaloneCoroutine.handleJobException#internal + 85
        at 7   phoenixd                            0x64ca63           kfun:kotlinx.coroutines.JobSupport#handleJobException(kotlin.Throwable){}kotlin.Boolean-trampoline + 19
        at 8   phoenixd                            0x611e7c           kfun:kotlinx.coroutines.JobSupport.finalizeFinishingState#internal + 316
        at 9   phoenixd                            0x615be7           kfun:kotlinx.coroutines.JobSupport.continueCompleting#internal + 119
        at 10  phoenixd                            0x6170ed           kfun:kotlinx.coroutines.JobSupport.ChildCompletion.invoke#internal + 141
        at 11  phoenixd                            0x64caf3           kfun:kotlinx.coroutines.JobNode#invoke(kotlin.Throwable?){}-trampoline + 19
        at 12  phoenixd                            0x612cb7           kfun:kotlinx.coroutines.JobSupport.notifyCompletion#internal + 519
        at 13  phoenixd                            0x612577           kfun:kotlinx.coroutines.JobSupport.completeStateFinalization#internal + 295
        at 14  phoenixd                            0x611f3f           kfun:kotlinx.coroutines.JobSupport.finalizeFinishingState#internal + 511
        at 15  phoenixd                            0x615743           kfun:kotlinx.coroutines.JobSupport.tryMakeCompletingSlowPath#internal + 659
        at 16  phoenixd                            0x615460           kfun:kotlinx.coroutines.JobSupport.tryMakeCompleting#internal + 192
        at 17  phoenixd                            0x6152a4           kfun:kotlinx.coroutines.JobSupport#makeCompletingOnce(kotlin.Any?){}kotlin.Any? + 148
        at 18  phoenixd                            0x606f7a           kfun:kotlinx.coroutines.AbstractCoroutine#resumeWith(kotlin.Result<1:0>){} + 106
        at 19  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
        at 20  phoenixd                            0x5603ed           kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#resumeWith(kotlin.Result<kotlin.Any?>){} + 461
        at 21  phoenixd                            0x5dfcf5           kfun:kotlin.coroutines.Continuation#resumeWith(kotlin.Result<1:0>){}-trampoline + 85
        at 22  phoenixd                            0x63a54a           kfun:kotlinx.coroutines.DispatchedTask#run(){} + 474
        ... and 22 more common stack frames skipped
Aborted (core dumped)

Before that, I see a lot of these messages:

2024-10-10 14:42:00 connecting to lightning peer...
2024-10-10 14:42:00 disconnected from lightning peer
2024-10-10 14:42:03 connecting to lightning peer...
2024-10-10 14:42:03 disconnected from lightning peer
2024-10-10 14:42:06 disconnected from lightning peer
2024-10-10 14:42:09 connecting to lightning peer...
2024-10-10 14:42:09 disconnected from lightning peer
2024-10-10 14:42:12 connecting to lightning peer...
2024-10-10 14:42:12 disconnected from lightning peer
2024-10-10 14:42:15 connecting to lightning peer...
2024-10-10 14:42:15 disconnected from lightning peer
2024-10-10 14:42:18 connecting to lightning peer...
2024-10-10 14:42:18 disconnected from lightning peer

In fact, this pair of messages has been appearing every 3 seconds for at least an hour. That's as far back as I could scroll in my logs, so it might have been going on for much longer.

It could be that my VPS lost internet access at some point, but it definitely had internet access between 1 and 3 pm, so I'm not sure why it wasn't able to connect to the LSP.

But anyway, shouldn't phoenixd retry forever? It could also use exponential backoff.
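For illustration, I mean something like the sketch below: just an example of the retry-with-backoff pattern, not phoenixd's actual reconnection code. connectAndRun() and the delay values are hypothetical placeholders.

import kotlinx.coroutines.delay
import kotlin.math.min

// Sketch only: retry forever, doubling the wait between failed attempts
// up to a cap, and resetting it after a successful connection.
// connectAndRun() is assumed to suspend while the connection is alive
// and to return (or throw) once the peer disconnects.
suspend fun reconnectForever(connectAndRun: suspend () -> Unit) {
    var backoffMillis = 1_000L              // first retry after 1 second
    val maxBackoffMillis = 60_000L          // never wait more than 1 minute
    while (true) {
        try {
            connectAndRun()
            backoffMillis = 1_000L          // connection worked, reset backoff
        } catch (e: Exception) {
            // connection attempt failed; fall through to the delay below
        }
        delay(backoffMillis)
        backoffMillis = min(backoffMillis * 2, maxBackoffMillis)
    }
}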

I have restarted it now and it connected. I'll report back when it goes down again.

ekzyis commented Oct 11, 2024

Oh, no, actually, the error last time was "Could not open /dev/urandom" when trying to pay an invoice, and then it crashed.

pm47 commented Oct 15, 2024

Hello,

What kind of host is it, and does it run anything other than phoenixd? What's the result of ulimit -a?

But anyway, shouldn't phoenixd retry forever? It could also use exponential backoff.

We fixed a bug in 0.4.0 that caused the retry loop to hang. Can you reproduce this with phoenixd 0.4.0?

error last time was Could not open /dev/urandom when trying to pay an invoice and then it crashed.

Looks like a different problem. Phoenixd must be able to access a source of randomness.

ekzyis commented Oct 15, 2024

What kind of host is it? and does it run something else than phoenixd?

It's a Nanode 1 GB from https://cloud.linode.com/. It also runs nginx and gitea.

What's the result of ulimit -a ?

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127233
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127233
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
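(For reference, that "open files 1024" value is presumably the per-process descriptor table that the "Process descriptor file table is full" error refers to. Purely as an illustration, and assuming a Linux Kotlin/Native target, the same limit can be read programmatically:)

@file:OptIn(kotlinx.cinterop.ExperimentalForeignApi::class)

import kotlinx.cinterop.alloc
import kotlinx.cinterop.memScoped
import kotlinx.cinterop.ptr
import platform.posix.RLIMIT_NOFILE
import platform.posix.getrlimit
import platform.posix.rlimit

// Illustrative sketch: print the soft and hard limits on open file
// descriptors, i.e. the same numbers shown by `ulimit -n` and `ulimit -Hn`.
fun main() = memScoped {
    val lim = alloc<rlimit>()
    if (getrlimit(RLIMIT_NOFILE, lim.ptr) == 0) {
        println("open files (soft limit): ${lim.rlim_cur}")
        println("open files (hard limit): ${lim.rlim_max}")
    }
}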

We fixed a bug in 0.4.0 that caused the retry loop to hang. Do you reproduce with phoenixd 0.4.0 ?

Thanks, I will try out 0.4.0. I am currently running version 0.3.2-97025b7.

Looks like a different problem. Phoenixd must be able to access a source of randomness

Yes, but I wanted to mention it anyway.
