Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In the MessageBuffer, detect if we've just been called by a fork child. #199

Closed
wants to merge 3 commits into from

Conversation

remeh
Copy link
Contributor

@remeh remeh commented Jul 13, 2021

In order to clean the MessageBuffer in this case, because it can contains data from the parent process instead.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes!

One thing that occurred to me is that another possible approach would be to move the fork checking to the Sender implementaiton. Doing so would allow a trivial optimization -- the default Sender can avoid all of this checking, because naturally its inner thread dies on fork, and so, while that thread lives, we're sure that no fork happened.

So only the SingleThreadSender would need to have the fork-checking behavior.

Also -- probably useful to add some tests and a microbenchmark to validate that the cost of calling Process.pid is reasonable :)

lib/datadog/statsd/forwarder.rb Outdated Show resolved Hide resolved
lib/datadog/statsd/message_buffer.rb Show resolved Hide resolved
lib/datadog/statsd/message_buffer.rb Outdated Show resolved Hide resolved
@ivoanjo ivoanjo mentioned this pull request Aug 13, 2021
@remeh
Copy link
Contributor Author

remeh commented Sep 1, 2021

One thing that occurred to me is that another possible approach would be to move the fork checking to the Sender implementation. Doing so would allow a trivial optimization -- the default Sender can avoid all of this checking, because naturally its inner thread dies on fork, and so, while that thread lives, we're sure that no fork happened.

@ivoanjo Please correct me if I'm wrong in my thinking but I think that what you are saying isn't correct: in both mode (with or without companion thread), we have to detect that we are now running in a different PID, because it means that we are in a new process (it can happen if an user is forking directly in their code or within a framework). Because of that fork, we have to clean the buffer of its existing data in the new process, to avoid having both processes sending the same data which was originally in the buffer the moment the fork happened. Does it make sense?

@ivoanjo
Copy link
Member

ivoanjo commented Sep 1, 2021

One thing that occurred to me is that another possible approach would be to move the fork checking to the Sender implementation. Doing so would allow a trivial optimization -- the default Sender can avoid all of this checking, because naturally its inner thread dies on fork, and so, while that thread lives, we're sure that no fork happened.

@ivoanjo Please correct me if I'm wrong in my thinking but I think that what you are saying isn't correct: in both mode (with or without companion thread), we have to detect that we are now running in a different PID, because it means that we are in a new process (it can happen if an user is forking directly in their code or within a framework).

Yup!

Because of that fork, we have to clean the buffer of its existing data in the new process, to avoid having both processes sending the same data which was originally in the buffer the moment the fork happened. Does it make sense?

Not necessarily -- which was what prompted my suggestion (but it was more of a "may be interesting" and not a "I really think this needs to change" suggestion). In the current approach in this PR, the fork detection has been added to the MessageBuffer, and then every time we try to interact with it, we check the pid.

What I was saying is, that in the Sender class, there's another way we could detect a fork -- by checking that the background thread died. This is because threads do not survive forks:

[1] pry(main)> background_thread = Thread.new { sleep }
=> #<Thread:0x00007fc41a84c430 (pry):1 sleep>
[2] pry(main)> fork do
[2] pry(main)*   puts "background thread status: #{background_thread.inspect}"
[2] pry(main)* end
background thread status: #<Thread:0x00007fc41a84c430 (pry):1 dead>
=> 22780
[3] pry(main)> background_thread.inspect
=> "#<Thread:0x00007fc41a84c430 (pry):1 sleep>"
[4] pry(main)>

So an alternative in the regular Sender to checking the pid every time is to check if the thread is still alive. If it is, you're sure that a fork did not occur. If it is not, then it's possible that the checking process is a child process in a fork; so an option here is, rather than cleaning the MessageBuffer, just create a new MessageBuffer + background thread combo entirely.

But, as I said above, in the SingleThreadedSender this is not an option -- that one would still need the pid check.

@remeh
Copy link
Contributor Author

remeh commented Sep 1, 2021

Got it, thanks for the detailed explanation. Indeed performance-wise that may be interesting to do that instead in the original Sender class! This is where benchmarks will also be useful 👍 I'll try that.

@remeh
Copy link
Contributor Author

remeh commented Sep 3, 2021

Replaced by #203.

@remeh remeh deleted the remeh/fork-detect branch September 3, 2021 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants