WebSocketSend CPU overload #67
Ok, in trying to reproduce a minimum working example I found the following fix, which seems to resolve the problem, albeit in a way that I'm not entirely comfortable with. On the receive side (in the webpage) I was doing all the data processing directly in the ws.onmessage handler. By deferring that processing instead:

```javascript
ws.onmessage = function(event) {
    setTimeout( function(){ handleMessage(event); }, 100 );
};
```

the problem seems to have been solved. Anyway, the moral of the story seems to be: do as little as possible in ws.onmessage.
There appears to be an issue with the TupleConsumer base class, which can lead to one of the threads spinning and taking up 100% of a CPU. Moving the WebSocketSend operator to derive from AbstractOperator avoids this issue.
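For reference, here is a minimal sketch of what that change looks like, assuming the standard com.ibm.streams.operator Java API; the WebSocket plumbing is reduced to a hypothetical sendOverWebSocket() helper and this is not the actual operator code:

```java
import com.ibm.streams.operator.AbstractOperator;
import com.ibm.streams.operator.StreamingInput;
import com.ibm.streams.operator.Tuple;

// Sketch only: deriving from AbstractOperator means the runtime delivers each
// tuple synchronously to process(), so no TupleConsumer completer thread is involved.
public class WebSocketSendSketch extends AbstractOperator {

    @Override
    public void process(StreamingInput<Tuple> stream, Tuple tuple) throws Exception {
        // Serialize and push the tuple out immediately, instead of queueing it
        // for a batched processBatch() call.
        sendOverWebSocket(tuple.toString());
    }

    // Hypothetical stand-in for the Java-WebSocket broadcast in the real operator.
    private void sendOverWebSocket(String payload) {
    }
}
```

The trade-off is that batching is lost, which matters little at the low tuple rates described in this issue.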
So I've looked into this in some detail lately, as the issue started raising itself again when we moved from a large server to a smaller one on a temporary basis. I've established that the high CPU usage is coming from a single thread whose stack trace shows it is running the completer logic. At a certain point a glitch occurs: the sleeping thread wakes up, does not go back to sleep, and runs continuously, eating up the CPU. I used a debugger to inspect the offending thread. If my reasoning above is correct, this is a bug in the interaction between the scheduler threads (which seem to be coming from the Streams runtime) and the completer thread.
Thanks for the investigation, that sounds like an issue I fixed in Streams (not yet released), but I wouldn't expect it to occur in this operator. I'll take a look soon.
Ok, I think I can see why it would occur in the WebSocketSend operator. You might be able to work around the problem by removing these lines in WebSocketSend.java (lines 134-135). The issue is that too many pending invocations of processBatch() cause the completer thread to have CPU issues. Another alternative might be to just increase the batch size using the batchSize parameter.
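To illustrate why a larger batch size eases the load on the background completer thread, here is a rough, self-contained sketch (deliberately not the Streams API, all names are illustrative): with a batch size of 1 every tuple schedules its own processBatch()-style task, whereas a larger batch size queues roughly one task per N tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchSizeDemo {

    public static void main(String[] args) throws InterruptedException {
        System.out.println("batchSize=1  -> batch tasks submitted: " + run(1));
        System.out.println("batchSize=10 -> batch tasks submitted: " + run(10));
    }

    // Feed 100 "tuples" through a batching layer and count how many background
    // tasks (one per flushed batch) get handed to the single completer thread.
    static int run(int batchSize) throws InterruptedException {
        ExecutorService completer = Executors.newSingleThreadExecutor();
        AtomicInteger invocations = new AtomicInteger();
        List<String> batch = new ArrayList<>();
        for (int tuple = 0; tuple < 100; tuple++) {
            batch.add("tuple-" + tuple);
            if (batch.size() >= batchSize) {
                List<String> toProcess = new ArrayList<>(batch);
                batch.clear();
                completer.submit(() -> {
                    invocations.incrementAndGet(); // one completed batch per flush
                    send(toProcess);               // stand-in for the websocket send
                });
            }
        }
        completer.shutdown();
        completer.awaitTermination(5, TimeUnit.SECONDS);
        return invocations.get();
    }

    static void send(List<String> batch) {
        // placeholder: the real operator would serialize and broadcast the batch here
    }
}
```

Whether that trade-off is acceptable depends on how much latency the application can tolerate while tuples wait for a batch to fill.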
Thanks Dan, I'll give that a shot in the coming days. I don't want to increase the batchSize parameter as my data arrives relatively infrequently (two metrics every 4 seconds for each input stream, typically 1 to 4 input streams), so batching could introduce unwanted latency. As I mentioned in my previous comment, I have been testing the fix odrisci/streamsx.inet@63e4c34 and this has been running just fine for the last two days. Note that this has been applied to the streams_3.2.1 branch as we haven't upgraded to 4.0.0 as yet.
Is this what you saw in the debugger? Because the completer thread explicitly sets completerNotify to false when it wakes up, so I can't see how completerNotify would remain true.
I haven't been able to capture the moment of transition from normal operation to high CPU usage, but I have obtained strace outputs. This makes it relatively difficult to determine what's going on, since I'm not looking at the Java source but at the system calls. One thing is clear: when the high CPU usage occurs, the notifier thread no longer calls the "wait" function. My assumption is that the completerNotify flag is somehow remaining set, so the thread never goes back to waiting. I don't have the source code here, so I can't really be sure that this is what's happening, but the strace outputs suggest that this is so.
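To make the scenario being debated concrete, here is a simplified, hypothetical reconstruction of a completer-style wait/notify loop; the field name completerNotify is taken from the comments above, but everything else is illustrative and not the actual Streams source:

```java
public class CompleterSpinSketch {
    private final Object lock = new Object();
    private boolean completerNotify = false;   // name taken from the discussion above

    // Called by scheduler threads whenever a new pending batch needs completing.
    public void notifyCompleter() {
        synchronized (lock) {
            completerNotify = true;
            lock.notifyAll();
        }
    }

    // Body of the completer thread.
    public void completerLoop() throws InterruptedException {
        while (true) {
            synchronized (lock) {
                while (!completerNotify) {
                    lock.wait();               // normal case: block here until notified
                }
                completerNotify = false;       // clear the flag on wake-up
            }
            completePendingBatches();          // work is done outside the lock
        }
    }

    private void completePendingBatches() {
        // stand-in for draining whatever work is pending
    }
}
```

In a loop shaped like this, the strace pattern described above (wait() never being reached again) would mean completerNotify is being observed as true on every pass, which is exactly the point of disagreement in the previous two comments.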
Perfect, thanks, I see the issue now.
I confirmed the issue is in Streams, and have fixed it for the next release. In the meantime, I'll see if there's a work-around that still allows the operator to extend from TupleConsumer.
That's great Dan, I appreciate your looking into this. Meanwhile the fix I mentioned above is still running fine.

Regards,
Cillian
Hi,
I have an issue which I think is related to TooTallNate/Java-WebSocket#225.
Whenever I run the WebSocketSend operator for an extended period of time, the CPU usage of the Streams PEC in which the operator is running goes into overload - usually exhibiting 90 to 100% processor usage.
I haven't been able to produce a minimum working example as it seems to be somewhat random, and usually only crops up after a few hours of continuous use. My application generates a relatively low rate of one tuple every 1 to 4 seconds.
I was wondering if anyone else is seeing something similar?
Cillian