-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"threads.c" error when running binaries or docker image for amd64 devel version #3569
Comments
There seems to be a mistake above,
Edit: Corrected the issue description. |
So, essentially you are saying that the static build fails to run, right? Can't reproduce, yet:
Can't reproduce by downloading the binary either. |
Can reproduce with the downloaded binary and a local PG (non nix tooled)
And with the fresh build of
This should allow bisecting. |
Same for me, it doesn't happen when using the nix |
Error comes from kerberos lib: https://github.com/krb5/krb5//blob/5f0023d5f05e95021a7caa1193f76f86871222ce/src/util/support/threads.c#L558-L565 Maybe the error starts from #3169? (GSSAPI) |
Interesting: The error doesn't seem to happen every time. It fails sometimes and sometimes doesn't - on the same commit. |
|
Ah damn, it likely is the exception throwing and catching I did postgrest/src/PostgREST/Listener.hs Lines 45 to 53 in a46bea1
Somehow it must be affecting some state on the C lib. I will try a different approach. Why this doesn't happen on the local nix postgres though? Because it connects through unix sockets? |
I don't think so. Using |
I think it needs the combination of static executable + TCP to break. |
So it wasn't the exception catching/throwing, I just removed that and it fails the same. Another change on 3cf5656 is that now the listener and the connection worker start in parallel. Before the listener waited for the connection worker, here 3cf5656#diff-f6074642dc32a94e9e1997b21b0df631b2d3f098e0a81673edc22b89a86f49edL534-L538. But that was only done to reuse the exponential backoff retrying from the connection worker, I didn't know it would have this effect. I guess the kerberos lib doesn't work with parallel connections? |
Hm, not sure whether this is about kerberos (but yes, could be). Could also be about libpq itself: https://www.postgresql.org/docs/current/libpq-threading.html
Maybe something like that is happening accidentally? |
Oh, on the same page later on:
Ah, that doesn't seem relevant, though, because we are not using kerberos additionally. But it could explain why libpq throws up in that area when maybe the thing from above happens? |
Clears the limitation mentioned on PostgREST#3536 The Listener no longer uses the https://hackage.haskell.org/package/retry package and instead uses a much simpler IORef in AppState for the delays. Additionally it no longer uses exception throwing/catching, which is rather messy and brings some concerns(PostgREST#3569 (comment)).
Clears the limitation mentioned on #3536 The Listener no longer uses the https://hackage.haskell.org/package/retry package and instead uses a much simpler IORef in AppState for the delays. Additionally it no longer uses exception throwing/catching, which is rather messy and brings some concerns(#3569 (comment)).
I tested the current main branch with disabled gssapi / krb5. This makes the error go away, too. The question is whether we can find a way / configuration where we can have your changes and gssapi-support? |
For now, I pushed a revert of the GSSAPI commit. I will try this again, once we get a truly static |
Oh wait.. I have a fix ready now that brings back waiting between the listener and connection worker.. |
Happy to have that in, if it improves the code. But I would still not want to release the GSSAPI change right now. I'm not 100% convinced that this can not cause problems elsewhere, maybe when two pool connections start up at the same time or something like that. I actually kind of like the fact that we have a reproducer for this case, so I'm fine with keeping it as is. |
Brings back the the signaling/waiting between the connection pool and the Listener. Prevents the GSSAPI error shown on PostgREST#3569
Yeah, I think it's worth adding it back. If anything it reduces wastefully retrying both listener and the connection worker when postgres is completely down. Doing it on #3573.
Right, that could still be trouble. |
Brings back the the signaling/waiting between the connection pool and the Listener. Prevents the GSSAPI error shown on #3569
Environment
Description of issue
Executing the binary or docker release for the
devel
branch , returns an error when trying to connect to the database.Local binary:
Docker:
Running the
main
branch from a Nix environment does not return this error:The text was updated successfully, but these errors were encountered: