Skip to content
This repository has been archived by the owner on Feb 7, 2018. It is now read-only.

Deadlock when spawned process writes to stderr #96

Closed
kilink opened this issue May 17, 2017 · 4 comments
Closed

Deadlock when spawned process writes to stderr #96

kilink opened this issue May 17, 2017 · 4 comments

Comments

@kilink
Copy link

kilink commented May 17, 2017

When a process spawned by Cocaine writes sufficient data to stderr, both the processes will hang indefinitely. This is because Cocaine is blocked reading from stdin, while the process eventually blocks writing to stderr when the internal buffer for the pipe becomes full.

The issue can be reproduced with the following trivial script:

#!/usr/bin/env ruby

1000.times do
  STDERR.puts "*" * 1000
end

Called via Cocaine, it blocks:

Cocaine::CommandLine.new("./hang.rb").run # hangs forever

Using strace, you'll be able to see that the invoked process is blocked in a write syscall on the STDERR file descriptor. The other giveaway is that this does not block:

Cocaine::CommandLine.new("./hang.rb", "", swallow_stderr: true).run

Obviously that's not ideal, since we may want to capture both stdout and stderr.

This Zendesk blog post touches on the issue briefly in the section entitled "Bonus round: avoiding deadlocks." The solution is to avoid making blocking calls like read, and using IO::select in a loop to figure out which file descriptors are ready to be read from. Here is how it is implemented in the subprocess gem.

I see potentially fixing this in Multipipe#read, or by adding a new runner that uses the subprocess gem. Thoughts?

@mmangino
Copy link

mmangino commented Jun 2, 2017

Did you ever create this runner? If not, I'm going to give it a shot. We're having the same issue.

@joshuapinter
Copy link

Did this ever get fixed? Running into the same problem transcoding medium sized files (26mbs+).

Thanks!

@joshuapinter
Copy link

FYI, we tried @mmangino's branch and it works great in production for larger files. No more deadlocks. You can see more of it here: #97

@jyurek
Copy link
Contributor

jyurek commented Feb 7, 2018

The PR that fixes this has been moved to thoughtbot/terrapin#5. I'll close this and any necessary discussion on this point can be found there. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants