Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rely on ActiveSupport::Notifications and ActiveSupport::LogSubscriber for logging and instrumentation #208

Merged
merged 12 commits into from
May 8, 2024

Conversation

rosa
Copy link
Member

@rosa rosa commented Apr 12, 2024

This includes the basics to make it work, plus instrumentation for the most relevant actions.

Closes #103.

rosa added 11 commits April 30, 2024 11:54
… subscribers

And instrument a bunch of actions.
Fix bug found for async execution mode when worker didn't correctly finish
shutting down.

Also, always instrument retry_all and release_many_blocked even if none is
retried/released, so we have that information as well, with the number of jobs
that actually were acted on.
And fix issue when discarding in batches, where the discarded count
was always being zero when the foreign key with cascade deletion is
present from jobs to ready executions. Discarding all jobs would delete
all ready executions in cascade, and then deleting them afterwards, for
the case where there aren't any foreign keys, would return 0 jobs
discarded.
This was sending two events per error instead of one.
And set log subscriber's logger to be Solid Queue's logger.

Also: refactor a bit how (OS) process attributes are exposed so they can
be logged more easily.
With Active Support notifications.
if supervised_fork = forks.delete(pid)
payload[:fork] = supervised_fork
start_fork(supervised_fork)
end
end
end
Copy link

@krschacht krschacht Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rosa This "Restarting fork" log line was a key clue in helping us debug and I know just how common it is for servers to auto-restart rails processes because of memory. What if we future-proof this for all the poor souls in the years ahead who are going to be diagnosing auto-killed workers. :)

In our situation the line previously read as:

[SolidQueue] Restarting fork[67] (status: )

You're already correcting the "restarting fork" language and I think the new log output for this situation will read as:

SolidQueue-#{ver} replace_fork (#{ms}) supervisor_pid: 12345, pid: 67, status:

That's slightly more helpful but for someone like me who isn't super familiar with SolidQueue it doesn't lead me much towards a possible root cause. I probably would have still posted an Issue to get help. What if you:

  1. Notice if status.pid is nil and display SIGKILL for that
  2. Make it clear that status is the status of the previous worker which is being replaced; I would not have inferred that based on the current log output.
  3. Include an additional msg key in the hash.

Here's my suggested improvement which incorporates these:

SolidQueue-#{ver} replace_fork (#{ms}) supervisor_pid: 12345, pid: 67, status: SIGKILL, msg: "Detected a terminated worker so it's being replaced with a new one. Exit status of previous worker was #{status || 'SIGKILL'}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to look into this one! I've added some more info in e5790d0, which I hope it's useful in these cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That looks great! Now there are definitely some clear clues as to what’s happening. I’m excited about all the investment in SolidQueue and am leaning in with a new product I’m building (rather than my old Sidekiq + Redis setup). Thanks for all your work on this!

@rosa rosa force-pushed the log-and-instrumentation branch from ef5023c to e5790d0 Compare May 1, 2024 15:17
…vent

So we at least can tell when it received a SIGKILL or when it exited normally.
@rosa rosa force-pushed the log-and-instrumentation branch from e5790d0 to b9db22d Compare May 1, 2024 15:32
@rosa rosa merged commit 1d41b53 into main May 8, 2024
8 checks passed
@rosa rosa deleted the log-and-instrumentation branch May 8, 2024 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to ActiveSupport::LogSubscriber
2 participants