under Docker version 17.05.0-ce, build 9f07f0e-synology
on Synology Diskstation
We have had mixed reports of both version 1.1.0 (used in Telegraf 1.7.x) and 1.1.1 (Telegraf 1.8.x) not working properly. What would help most is for someone to volunteer to do a deep dive on the plugin and open some issues upstream.
I also think we should look into removing the AutoReconnect feature of mqtt and handling reconnection ourselves; this may allow us to sidestep some of the bugs in the library.
I'm going to do some more diagnosis, and will probably still create an issue upstream for this crash -- if nothing else, that error message isn't helpful. ;-)
My experience suggests that it's the ping that's the problem. In most of my failure cases, Mosquitto has justifiably stopped publishing messages because it's not been recently pinged by Telegraf. I'll see if I can narrow that down.
@rgitzel I made some changes in 1.8.2 that should resolve this issue. There are also some fairly large changes for 1.9.0 to support the decoupling of inputs and outputs (#4938), which could impact this plugin. Do you think you could test with the latest release candidate (1.9.0-rc2 currently)?
Since I upgraded from 1.7.1 to 1.8.0 on Friday I've been having all manner of stability issues, with big gaps in my graphs.
Definitely I'm seeing issues similar to #4594. But I am also occasionally seeing outright crashes. Logs of one of them are below.
Relevant telegraf.conf:
System info:
Steps to reproduce:
Not deterministic.
Start Telegraf. Sometimes it runs for a couple minutes and stops receiving messages. Sometimes it runs for hours. And three times now it's crashed on a null pointer.
Expected behavior:
Don't crash. Handle the error gracefully.
Actual behavior:
Additional info:
All sorts of interesting things in that log:
%!s(<nil>) error... that's being created by Paho; I haven't been able to isolate it just by looking at the code, but here's the error class: https://github.com/eclipse/paho.mqtt.golang/blob/master/packets/packets.go#L93

As mentioned elsewhere, Paho is probably the problem -- but Telegraf should still handle the error gracefully.