-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use gen_statem in Postgrex.SimpleConnection #643
Conversation
LGTM too. |
lib/postgrex/simple_connection.ex
Outdated
# We have to do a hack here to carry the actual caller PID over to the handle_call/3 | ||
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity. | ||
# This results in the caller PID not being the same as the PID in the "from" tuple, | ||
# so things like Postgrex.Notifications cannot use that "from"'s PID to register | ||
# notification handlers. This approach is paired with reconstructing the proper | ||
# "from" tuple in the reply/2 function in this module. | ||
{from_pid, from_ref} = from | ||
callback_from = {from_pid, {from_ref, caller_pid}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, the second element of from
must be opaque and I think we are going to be in trouble if we ask users to look into it. I think we should rather tell users of this library to not rely on from
and instead explicitly pass the caller too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, I don't think the from
should be passed on the caller here. Rather the .listen
function in notifications
should explicitly pass the pid
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sent a similar PR to Oban which may also be affected by this: oban-bg/oban#843
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josevalim yes, the second element of from is opaque here outside of SimpleConnection
. This is to be backwards compatible for folks using the SimpleConnection
behaviour, so that they're unlikely to have to change their callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the point. Your fix has handle_call
looking at the second element of from
, but there was no guarantee that would be of any shape whatsoever before this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you meant to write this:
# We have to do a hack here to carry the actual caller PID over to the handle_call/3 | |
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity. | |
# This results in the caller PID not being the same as the PID in the "from" tuple, | |
# so things like Postgrex.Notifications cannot use that "from"'s PID to register | |
# notification handlers. This approach is paired with reconstructing the proper | |
# "from" tuple in the reply/2 function in this module. | |
{from_pid, from_ref} = from | |
callback_from = {from_pid, {from_ref, caller_pid}} | |
# We have to do a hack here to carry the actual caller PID over to the handle_call/3 | |
# callback, because gen_statem uses a proxy process to do calls with timeout != :infinity. | |
# This results in the caller PID not being the same as the PID in the "from" tuple, | |
# so things like Postgrex.Notifications cannot use that "from"'s PID to register | |
# notification handlers. This approach is paired with reconstructing the proper | |
# "from" tuple in the reply/2 function in this module. | |
{from_pid, from_ref} = from | |
callback_from = {caller_pid, from} |
That would be a bit more okayish (and backwards compatible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah jeez, that's exactly what I wanted to do. I mixed things up 🙈 Thanks for catching that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and fixed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this change after erlang/otp#7081 ?
Aaaand all tests pass 🎉 |
lib/postgrex/notifications.ex
Outdated
@@ -146,7 +146,7 @@ defmodule Postgrex.Notifications do | |||
* `:timeout` - Call timeout (default: `#{@timeout}`) | |||
""" | |||
@spec listen(GenServer.server(), String.t(), Keyword.t()) :: | |||
@spec listen(:gen_statem.server_ref(), String.t(), Keyword.t()) :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@type server :: :gen_statem.server_ref()
at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done ✅
See current discussion in elixir-ecto/postgrex#643.
@whatyouhide it seems we have some heisen-failures now in the test suite. :( |
You can try |
@josevalim some race conditions in the tests themselves I think, not in the gen_statem code. I sprinkled some tracing in there, and it seems like the Heisenfailingtest is gone. |
test/simple_connection_test.exs
Outdated
# Simulate a small delay, because it used to cause flaky test failures | ||
# in the past. | ||
# https://github.com/elixir-ecto/postgrex/actions/runs/4151824827/jobs/7182370012 | ||
Process.sleep(10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to sleep if we have the trace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes as a regression. I added a comment for it too 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am skeptical about relying on a sleep. Perhaps tracing SC.call
is the wrong thing, especially if SC.call uses another process? Perhaps it is best to assert that the connection has received a message with "SELECT 123" (picking 123 so it is distinct enough?), this way we guarantee that request has come first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my tests tracing SC.call
was enough. I triggered CI with just that and if it passes I think we're ok to merge that, otherwise yeah I can trace the query getting to the connection.
aca5030
to
cdc9958
Compare
I rebased on #644 and remove the direct dependency on |
💚 💙 💜 💛 ❤️ |
See current discussion in elixir-ecto/postgrex#643.
See current discussion in elixir-ecto/postgrex#643.
First pass, but seems to be quite promising. 🙃