nsqd: mystery bug #346
@visionmedia Curious what OS you're running on? Also, what …
Hmm, interesting: nsqd is sitting on a futex() call and appears to be locked up, with low CPU and memory usage.
Which version of Ubuntu is that? Do you mean it locks up and never recovers, or that it locks up, eventually recovers, and you believe that's contributing to the timeouts? While you're experiencing this issue, if you can capture a CPU profile from the HTTP endpoint … Thanks!
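A minimal sketch of capturing that profile, assuming nsqd's HTTP interface is on its default address (127.0.0.1:4151) and exposes Go's standard net/http/pprof handlers under /debug/pprof/. The helper names `profile_url` and `capture_cpu_profile` and the output filename are hypothetical, not part of nsq.

```python
import urllib.request

# Assumption: nsqd's HTTP interface is listening on its default address.
NSQD_HTTP = "http://127.0.0.1:4151"


def profile_url(base: str, seconds: int = 30) -> str:
    """Build the pprof CPU-profile URL; the server samples for `seconds` before replying."""
    return f"{base.rstrip('/')}/debug/pprof/profile?seconds={seconds}"


def capture_cpu_profile(base: str = NSQD_HTTP, seconds: int = 30,
                        path: str = "nsqd.cpu.pprof") -> int:
    """Download a CPU profile to `path` (hypothetical helper); returns bytes written."""
    # The response only arrives after the sampling window ends, so the client
    # timeout must exceed `seconds` or the request gives up before any data arrives.
    with urllib.request.urlopen(profile_url(base, seconds), timeout=seconds + 15) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Run it while the lock-up is happening, then open the saved file with `go tool pprof`, passing the nsqd binary and the profile path.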
Super weird: the only activity in our worker seems to be Redis, and the rest might just be a side effect of that. I should probably stop trying to shove 1M events into one process, but we need the throughput, haha. I didn't see this Redis issue earlier, so this could be it.
I think I'm going to close this, actually. I finally got a consistent failure going: CPU is pegged in our worker, but I still can't really imagine how it's stuck for 1m, hmm.
Well, if you do suspect … Good luck 🔥 🚒
On a side note, I think nsqadmin is timing out requests a little too eagerly as well. Making the /stats request locally works fine, but the UI sometimes gives me 0, like I mentioned there. I thought that was a result of blocking, but it seems OK.
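One way a poller can keep a slow or timed-out stats request from looking like an empty topic is to return "unknown" (None) instead of 0 when the fetch fails. This is a sketch under assumptions: nsqd is taken to serve JSON at /stats?format=json, and because some nsqd releases wrap the payload in a `data` envelope while others may not, the parser accepts both. `fetch_stats` and `topic_depth` are hypothetical helpers, not nsqadmin's actual code.

```python
import json
import urllib.request
from typing import Optional


def fetch_stats(base: str = "http://127.0.0.1:4151", timeout: float = 5.0) -> Optional[dict]:
    """Return parsed /stats JSON, or None if the request fails or times out."""
    url = f"{base.rstrip('/')}/stats?format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers a body that is not valid JSON.
        return None


def topic_depth(stats: Optional[dict], topic: str) -> Optional[int]:
    """Depth of `topic`, or None when unknown, so a failed fetch is never shown as 0."""
    if stats is None:
        return None
    payload = stats.get("data", stats)  # tolerate both enveloped and bare responses
    for t in payload.get("topics", []):
        if t.get("topic_name") == topic:
            return t.get("depth")
    return None
```

Rendering None as "n/a" rather than 0 makes a dashboard timeout visibly different from a drained topic.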
Yeah, …
goroutine:
And this is what I get back from /debug/pprof/profile:
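For a process that appears stuck on futex() with low CPU, a full goroutine dump can be more telling than a CPU profile. The sketch below assumes the same default address and Go's standard /debug/pprof/goroutine?debug=2 handler, then tallies goroutines by the state shown in each stack header (for example `chan receive`, or `semacquire`, which is how older Go runtimes label goroutines blocked on a sync primitive). `fetch_goroutine_dump` and `goroutine_states` are hypothetical names.

```python
import re
import urllib.request
from collections import Counter

# Matches stack headers such as "goroutine 18 [chan receive, 5 minutes]:"
# and captures only the state ("chan receive"), dropping the wait duration.
_HEADER = re.compile(r"^goroutine \d+ \[([^,\]]+)", re.MULTILINE)


def fetch_goroutine_dump(base: str = "http://127.0.0.1:4151", timeout: float = 10.0) -> str:
    """Fetch full goroutine stacks (debug=2 text format) from a Go pprof endpoint."""
    url = f"{base.rstrip('/')}/debug/pprof/goroutine?debug=2"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def goroutine_states(dump: str) -> Counter:
    """Count goroutines per state; a pile-up in one blocking state hints at contention."""
    return Counter(_HEADER.findall(dump))
```

Calling `goroutine_states(fetch_goroutine_dump())` a few times during the lock-up shows whether the same blocked states persist.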
And the process just blew up with an OOM, haha. I'll try toning down the V8 GC abuse and see how that goes.
Wait, the … That …
Nope, just our Node process. I'm going to run it for a while capped at 100k messages and see if this issue pops up at all.
Ah, sorry for the noise: I forgot V8 has a heap limit of ~1.9 GB or so, and the GC was just struggling.
Not much info yet. Basically, it only happens when one of our uploads to S3 fails, but we're not even REQing much in this case (300 or so), so it's not overwhelming nsqd like I had previously thought.
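For scale, each requeue is a single short line on the nsqd TCP connection. A minimal sketch of framing it, assuming the NSQ TCP protocol forms `REQ <message_id> <timeout>\n` and `FIN <message_id>\n` with a 16-byte message ID and a timeout in milliseconds. The one-hour upper bound is an assumption standing in for nsqd's configurable maximum requeue timeout; the function names are hypothetical.

```python
# Assumed upper bound mirroring nsqd's configurable maximum requeue timeout;
# verify against your nsqd's --max-req-timeout setting.
MAX_REQ_TIMEOUT_MS = 60 * 60 * 1000


def req_command(message_id: bytes, timeout_ms: int) -> bytes:
    """Frame a REQ (requeue) command: REQ <message_id> <timeout_ms> plus newline."""
    if len(message_id) != 16:
        raise ValueError("NSQ message IDs are 16 bytes")
    if not 0 <= timeout_ms <= MAX_REQ_TIMEOUT_MS:
        raise ValueError(f"timeout_ms must be in [0, {MAX_REQ_TIMEOUT_MS}]")
    return b"REQ %s %d\n" % (message_id, timeout_ms)


def fin_command(message_id: bytes) -> bytes:
    """Frame a FIN (finish) command for a successfully processed message."""
    if len(message_id) != 16:
        raise ValueError("NSQ message IDs are 16 bytes")
    return b"FIN %s\n" % message_id
```

At roughly 30 bytes per command, a few hundred REQs add up to only a few kilobytes, which fits the conclusion that requeue traffic alone is not what overwhelms nsqd.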
Followed by ~50:
Pretty positive it's something I'm doing, but I'll keep updating.
Ahh, just got an EPIPE on the client end. nsqadmin does seem to have periods where it displays 0 (instead of ~3,000,000) for the topic/channel, but there's no sign of nsqd going down, and CPU usage is minimal.
Followed by: