Buffer full warning while under load #4715
Comments
I wonder if the issue is the underlying MQTT server; perhaps the buffer size needs to be larger/configurable to handle higher packet volumes or high latency between the AS and the subscribed client.
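For illustration, here is a minimal Go sketch of the non-blocking publish pattern that produces this kind of warning: each subscriber gets a bounded buffer, and a slow or high-latency consumer causes messages to be dropped once it fills. This is a generic sketch under assumed names and sizes, not the actual mystique or TTS code.

```go
package main

import (
	"fmt"
	"time"
)

// subscriber models a downstream MQTT client with a bounded delivery buffer.
// When the buffer is full, publish drops the message instead of blocking the
// publisher - the point at which a real server would log a "buffer full" warning.
type subscriber struct {
	buf     chan []byte
	dropped int
}

func newSubscriber(bufferSize int) *subscriber {
	return &subscriber{buf: make(chan []byte, bufferSize)}
}

// publish attempts a non-blocking send; a slow or high-latency client that
// does not drain buf fast enough causes drops once the buffer is full.
func (s *subscriber) publish(msg []byte) {
	select {
	case s.buf <- msg:
	default:
		s.dropped++
	}
}

func main() {
	s := newSubscriber(8) // deliberately small buffer for demonstration

	// Simulate a slow consumer: drain one message every 10ms.
	go func() {
		for range s.buf {
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Burst of 100 uplinks arriving much faster than the consumer drains.
	for i := 0; i < 100; i++ {
		s.publish([]byte("uplink"))
		time.Sleep(time.Millisecond)
	}

	fmt.Printf("dropped %d of 100 messages\n", s.dropped)
}
```

With a larger buffer the same burst is absorbed instead of dropped, which is why raising the buffer size helps with bursty traffic or a high-latency subscriber.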
This is still occurring on 3.15.1. I have manually set the buffer size on mystique to 1024 and that resolved the majority of the buffer full messages. I'd appreciate it if you have time to look at our metrics (attached), as we still get 'work dropped'. I can ship you a bunch of logs via email if needed.

We are running two dedicated subscribers on Fargate in the same region as the EC2 instance, so it should be able to keep up. What would you recommend in terms of worker counts? We did have some issues with the HTTP integration locking up on us some time ago, hence the move to MQTT.
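For rough worker sizing, a Little's-law estimate can help: the number of concurrently busy workers is approximately the uplink arrival rate times the per-message publish latency. The rate and latency in this sketch are illustrative assumptions, not measured values from this deployment.

```go
package main

import "fmt"

// Back-of-envelope sizing check (Little's law): busy workers ≈ arrival rate x
// per-message service time. Both numbers below are assumptions for illustration.
func main() {
	uplinksPerSecond := 500.0 // assumed peak uplink rate hitting the MQTT frontend
	publishLatency := 0.05    // assumed seconds per publish (network RTT + ack)

	needed := uplinksPerSecond * publishLatency
	fmt.Printf("roughly %.0f busy workers at peak; leave headroom above that\n", needed)
}
```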
I've checked our internal metrics and we don't really see these drops on our side.

Regarding webhook deadlocks - from what we know, it is this infamous beauty: golang/go#32388, which needs to be backported (golang/go#48650, golang/go#48649).

#4790 will allow you to be more aggressive with how traffic is dropped for different subscribers (such as MQTT), but I consider it rather abnormal that you see worker pool drops - we've never experienced work being dropped there.

Edit: Can you plot the dropped-work metric over time as well?
@adriansmares things have definitely been running better, but we hit performance issues again yesterday and found that I had set the UDP handlers to 32 (double the old default but significantly less than the new 1024). That resolved the work dropped by the UDP handler, but there is a new spike on upstream_handlers. It looks like it's fixed at 32 workers; based on the attached chart, should we try bumping that to 1024 too?
The 32 worker limit is on a per-gateway basis - each gateway has its own worker pool of 32 workers for submission to the upstream (which in this case is the Network Server - this is what the upstream_handlers pool refers to).

Do you have an actual gateway that could see so many packets? Is this some bridge that actually backs multiple gateways at once, but is registered in TTS as a single gateway? Is it always the same gateway?

For reference, we don't see any drops whatsoever on that pool in any of our deployments - they occur when we restart the Network Server and the peer is not available, but we don't see them at steady state in any case.

You may increase the 32 limit, but it is very abnormal that one gateway can produce so much traffic - this may be a sign that the Network Server cannot keep up with the traffic and perhaps should be scaled up.
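To make the per-gateway limit concrete, here is a hedged Go sketch of a fixed-size worker pool that forwards traffic upstream and counts drops when all workers are busy. The type name, the queue size, and the 5 ms handler delay are hypothetical illustrations, not the actual Gateway Server implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// gatewayPool sketches the per-gateway upstream submission pattern: a fixed
// number of workers forward traffic to the Network Server, and work is
// dropped when every worker is busy and the queue is full.
type gatewayPool struct {
	queue   chan string
	dropped atomic.Int64
}

func newGatewayPool(workers, queueSize int, handle func(string)) *gatewayPool {
	p := &gatewayPool{queue: make(chan string, queueSize)}
	for i := 0; i < workers; i++ {
		go func() {
			for msg := range p.queue {
				handle(msg) // e.g. submit the uplink to the Network Server
			}
		}()
	}
	return p
}

// submit never blocks the receive path; if the pool cannot keep up, the
// message is counted as dropped - the analogue of the "work dropped" metric.
func (p *gatewayPool) submit(msg string) {
	select {
	case p.queue <- msg:
	default:
		p.dropped.Add(1)
	}
}

func main() {
	// 32 workers for one gateway, each submission taking ~5ms upstream.
	pool := newGatewayPool(32, 32, func(string) { time.Sleep(5 * time.Millisecond) })

	// A single gateway bursting 10,000 uplinks as fast as possible.
	for i := 0; i < 10000; i++ {
		pool.submit("uplink")
	}
	time.Sleep(time.Second)
	fmt.Printf("dropped %d uplinks\n", pool.dropped.Load())
}
```

Raising the worker count raises the pool's throughput, but if the upstream handler itself is slow (an overloaded Network Server in this analogy), drops reappear at any pool size, which is why scaling the upstream can matter more than the limit.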
Summary
When under significant load, the Application Server reports dropping uplink packets because the buffer is full.
Steps to Reproduce
What do you see now?
What do you want to see instead?
No failed publish logs
...
Environment
TTS v3.14.2
How do you propose to implement this?
Is there any ability to tune or increase the buffers on MQTT?
...
How do you propose to test this?
Happy to run test branches on our load simulator
Can you do this yourself and submit a Pull Request?
Need guidance