Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodejs process seems to hang on coffee-resque #40

Open
carlost opened this issue Jun 13, 2012 · 6 comments
Open

nodejs process seems to hang on coffee-resque #40

carlost opened this issue Jun 13, 2012 · 6 comments
Labels

Comments

@carlost
Copy link
Collaborator

carlost commented Jun 13, 2012

I am not certain what is causing this but mimeo seems to hang at some point after coffee-resque has rewritten the process title.

I can't get the complete process title (even when i cat /proc//stat or /proc//cmdline). The tuncated title i get top & /proc/*/stat is "resque-0.1.4: S". This appears to correspond with the following line in coffee-resque:

@procline "Sleeping for #{@conn.timeout/1000}s"

Here is a snapshot of the nodejs process from top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29356 mimeogra 20 0 1034m 156m 5416 R 99.7 4.1 251:03.85 resque-0.1.4: S

I need to determine what tools i can use to check node at runtime and or add additional logging output.

@steelThread
Copy link
Owner

So is it you can't get the complete process title or is the node process hanging? The distinction is important. Don't rely on the process title from coffee-resque for monitoring purposes.

@carlost
Copy link
Collaborator Author

carlost commented Jun 14, 2012

I am not certain what is causing node to hang. The process hangs sometime after the coffee-resque sets the process title to "resque-0.1.4: S" and before something else resets it. However, that covers a lot of possible code.

I am going to try to get the node 0.4.8 debugger working and see if i can get a a list/trace of what code nodejs is stuck on. Then reevaluate. Something to consider is that we are still on an ancient version of nodejs. This problem may be caused by something that has long since been resolved by V8 or node ... although my money is on something silly i did.

Carlos

@carlost
Copy link
Collaborator Author

carlost commented Jun 14, 2012

Adding this here because until 20 minutes ago i didn't know anything about debugging a nodejs app. I had some difficulty connecting with & getting good info from the stock nodejs debugger. However, node-inspector was pretty easy to use. I will attach this to mimeo the next time it hangs ... see if i can get a better sense of what the hell it is locked up on.

@carlost
Copy link
Collaborator Author

carlost commented Jun 14, 2012

OK. So this just happened again.

the call stack that node-inspector goes back 5 frames. it looks like nodejs is waiting on IO

the call stack is as follows:

Socket._onBufferChange net.js:505
Socket._writeOut net.js:444
Socket._flush net.js:529
Socket._onWriteable net.js:609
anonymous function net.js:188

I suspect it is attempting to write to redis ... however, when i went to inspect the socket obj to confirm this mimeo crashed. i suspect that redis didn't like waiting around while i had the process paused. I also have no idea what is being written. It is resque writing to redis, mimeo writing a giant doc etc

I will take another look at this the next time it happens.

Steps to connect debugger:

1 get the pid for mimeo
1 kill -s USR1
1 node-inspector &

then connect a webkit browser on localhost:8080

Carlos

@carlost
Copy link
Collaborator Author

carlost commented Jun 14, 2012

some observations

==spot node is locking up on==
the app is spending its time inside of onWritable() in net.js. this is the callback attached to the IOWatcher assigned to the sockets _writeWatcher property. the socket associated to this is connecting to redis. so ... it seems to be taking an unbelievable amount of time and CPU to transfer data to redis.

==slow data transfer with redis==
using iptraf between various hosts running mimeo and the redis host it appears that when this condition occurs mimeo is able to transfer very little data to/from the redis host. during this same time other mimeo hosts are able to transfer data very quickly with redis.

example

  • node that is spending a lot of time in IOWatcher.onWriteable: incoming/outgoing rates in the tens of kbits p/sec
  • node that is flying through jobs: incoming/outgoing rates consistently in the 100s of kbits p/sec with bursts into tens-of -thousands of kbits p/sec

during this time the host with the hanging mimeo is able to rapid transfer files to/from other hosts using scp & wget. restarting mimeo (which appears to require a kill -9) returns performance and data transfer rates to normal.

i have a bunch of data - just no answer to this issue. i probably need to get a better understanding on the nodejs event loop and how IOWatcher plays into that.

@carlost
Copy link
Collaborator Author

carlost commented Jun 19, 2012

anecdotal evidence indicates that this occurs when setting a giant string into redis. This can occur when writing the final stitched pdf back into redis. In some cases the source PDFs are 100s of pages long and approach 25MB in size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants