Rely on ActiveSupport::Notifications and ActiveSupport::LogSubscriber for logging and instrumentation #208
Conversation
Force-pushed from 2fead34 to 4f89ebb
… subscribers And instrument a bunch of actions.
Fix a bug found in the async execution mode where a worker didn't correctly finish shutting down. Also, always instrument retry_all and release_many_blocked, even if no jobs are retried or released, so we still get that information together with the number of jobs that were actually acted on.
Also fix an issue when discarding in batches, where the discarded count was always zero when a foreign key with cascade deletion exists from jobs to ready executions: discarding all jobs would delete all ready executions via the cascade, so the subsequent deletion of ready executions (the path that handles the case where there are no foreign keys) would report 0 jobs discarded.
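A minimal sketch of that instrumentation pattern, emitting the event even when zero jobs are acted on; the event name, payload keys, and retry logic below are assumptions for illustration, not Solid Queue's actual code:

```ruby
require "active_support/notifications"

# Hypothetical sketch: always emit the event, and record how many jobs were
# actually retried in the payload, even when that number is zero.
def retry_all(jobs)
  ActiveSupport::Notifications.instrument("retry_all.solid_queue", jobs_size: jobs.size) do |payload|
    retried = jobs.count { |job| job[:failed] } # stand-in for the real retry logic
    payload[:size] = retried
  end
end

ActiveSupport::Notifications.subscribe("retry_all.solid_queue") do |event|
  puts "retry_all (#{event.duration.round(1)}ms) #{event.payload.inspect}"
end

retry_all([]) # still emits the event, with jobs_size: 0 and size: 0
```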
This was sending two events per error instead of one.
And set the log subscriber's logger to be Solid Queue's logger. Also: refactor a bit how (OS) process attributes are exposed so they can be logged more easily.
With Active Support notifications.
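A rough sketch of those two changes, with hypothetical class and method names (not the actual Solid Queue ones): expose a process's OS attributes as a plain hash, and point the log subscriber at the gem's own logger.

```ruby
require "active_support"
require "active_support/log_subscriber"
require "socket"
require "logger"

# Hypothetical process wrapper: OS attributes exposed as a hash that a log
# subscriber can splice straight into payloads or log lines.
class WorkerProcess
  def metadata
    { pid: ::Process.pid, hostname: Socket.gethostname, kind: "Worker" }
  end
end

# Hypothetical log subscriber whose output goes to the gem's own logger
# instead of the Rails default.
class QueueLogSubscriber < ActiveSupport::LogSubscriber
  def logger
    @logger ||= Logger.new($stdout) # stand-in for Solid Queue's configured logger
  end
end
```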
Force-pushed from e8debee to c0b52a5
      # If the terminated PID belongs to a supervised fork, record it in the
      # event payload and start a replacement fork in its place.
      if supervised_fork = forks.delete(pid)
        payload[:fork] = supervised_fork
        start_fork(supervised_fork)
      end
    end
  end
@rosa This "Restarting fork" log line was a key clue in helping us debug, and I know just how common it is for servers to auto-restart Rails processes because of memory. What if we future-proof this for all the poor souls in the years ahead who are going to be diagnosing auto-killed workers? :)
In our situation the line previously read as:
[SolidQueue] Restarting fork[67] (status: )
You're already correcting the "restarting fork" language and I think the new log output for this situation will read as:
SolidQueue-#{ver} replace_fork (#{ms}) supervisor_pid: 12345, pid: 67, status:
That's slightly more helpful but for someone like me who isn't super familiar with SolidQueue it doesn't lead me much towards a possible root cause. I probably would have still posted an Issue to get help. What if you:
- Notice if status.pid is nil and display SIGKILL for that.
- Make it clear that status is the status of the previous worker which is being replaced; I would not have inferred that based on the current log output.
- Include an additional msg key in the hash.
Here's my suggested improvement which incorporates these:
SolidQueue-#{ver} replace_fork (#{ms}) supervisor_pid: 12345, pid: 67, status: SIGKILL, msg: "Detected a terminated worker so it's being replaced with a new one. Exit status of previous worker was #{status || 'SIGKILL'}"
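A sketch of how a log subscriber could implement these suggestions, mapping a missing exit status to the signal name; the event, payload keys, and class below are assumptions rather than Solid Queue's actual code:

```ruby
require "active_support"
require "active_support/log_subscriber"

class SupervisorLogSubscriber < ActiveSupport::LogSubscriber
  def replace_fork(event)
    status = event.payload[:status] # Process::Status of the replaced fork, possibly nil

    exit_info =
      if status.nil? || status.exitstatus.nil?
        # No exit status means the fork was killed by a signal; name it.
        signal = status&.termsig ? "SIG#{Signal.signame(status.termsig)}" : "SIGKILL"
        "status: #{signal}, msg: \"Replacing terminated worker; previous worker exited via #{signal}\""
      else
        "status: #{status.exitstatus}"
      end

    info "replace_fork (#{event.duration.round(1)}ms) pid: #{event.payload[:pid]}, #{exit_info}"
  end
end

SupervisorLogSubscriber.attach_to :solid_queue
```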
Thanks for taking the time to look into this one! I've added some more info in e5790d0, which I hope is useful in these cases.
👍 That looks great! Now there are definitely some clear clues as to what’s happening. I’m excited about all the investment in SolidQueue and am leaning in with a new product I’m building (rather than my old Sidekiq + Redis setup). Thanks for all your work on this!
Force-pushed from ef5023c to e5790d0
…vent So we can at least tell when it received a SIGKILL or when it exited normally.
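For context, a minimal standalone Ruby sketch (not Solid Queue's code) of how Process::Status distinguishes a SIGKILL from a normal exit:

```ruby
# Fork a child, kill it, and inspect its exit status (POSIX platforms only).
pid = fork { sleep 60 }
Process.kill("KILL", pid)
_, status = Process.wait2(pid)

if status.signaled?
  puts "terminated by SIG#{Signal.signame(status.termsig)}"   # => terminated by SIGKILL
elsif status.exited?
  puts "exited normally with status #{status.exitstatus}"
end
```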
Force-pushed from e5790d0 to b9db22d
This includes the basics to make it work, plus instrumentation for the most relevant actions.
Closes #103.
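For readers unfamiliar with the pattern, here is a generic end-to-end sketch of ActiveSupport::Notifications plus ActiveSupport::LogSubscriber; the event name, class, and payload are illustrative, not the ones this PR adds:

```ruby
require "active_support"
require "active_support/notifications"
require "active_support/log_subscriber"
require "logger"

# A log subscriber attached to the "my_queue" namespace: each public method
# handles the event of the same name in that namespace.
class MyQueueLogSubscriber < ActiveSupport::LogSubscriber
  def enqueue(event)
    info "enqueue (#{event.duration.round(1)}ms) job_class: #{event.payload[:job_class]}"
  end
end

ActiveSupport::LogSubscriber.logger = Logger.new($stdout)
MyQueueLogSubscriber.attach_to :my_queue

# Instrumented code publishes the event; the subscriber turns it into a log line.
ActiveSupport::Notifications.instrument("enqueue.my_queue", job_class: "MyJob") do
  # work being measured
end
```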