Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU Usage 100% with MQTTClient async, depending on mosquitto version. #781

Closed
kollokollo opened this issue Dec 31, 2019 · 17 comments
Closed

Comments

@kollokollo
Copy link

kollokollo commented Dec 31, 2019

Describe the bug
The CPU usage of a paho async-client (subscribing to multiple topics on a remote mosquitto broker) is 100% (from beginning on, after connecting), but only on certain mosquitto broker versions. If the connection is lost, the CPU-load goes down to about 0. After reconnection, it is up again.

So, the same client code works fine when connecting to Machine A with mosquitto broker A, but connecting to another Machine B with another mosquitto broker B, the identical client takes 100% CPU (for one core), but it works.

To Reproduce
I am devellopping "MQTT-Hyperdash" (see https://github.com/kollokollo/MQTT-Hyperdash ), running this on linux machines using different mosquitto brokers on multiple raspberry pis but maybe of different Raspian versions and so different mosquitto versions.
But the problem is on the client side, always reproducible, and this should not be the case. Paho should work identical with any mosquitto version.

Expected behavior
The CPU usage of the async client should be negligible, connectiong to whatever version of the mosquitto broker is used on whatever remote machine.

** Environment (please complete the following information):**

  • Client OS: Ubuntu Linux

** Context
Machines A B and C running on the same local network. A and B are Raspberry Pi/Raspbian, Machine C is Ubuntu Linux. mosquitto broker running on all machines, The client (with the problem) is running on machine C. The problem occurs when the client connects to A but not when it connects to B. When starting two clients the cpu usage can go up to 200% (2 cores) and so on.

@kollokollo
Copy link
Author

I use this paho.mqtt.c commit:
9cb048c

@icraggs
Copy link
Contributor

icraggs commented Dec 31, 2019

Different versions of Mosquitto can also have bugs and behave differently! Then again we shouldn't get 100% CPU being used.

We'll need more information. Such as a trace of the client lib when it first happens (see the readme for how). Or a test program plus the version number of Mosquitto which causes a problem for you.

These are the sort of questions that I'm wondering about:

  • Is the subscribing to multiple topics in multiple subscribe commands, or one?
  • When does the 100% CPU usage start, after connect or a subscribe, or the receipt of a message?
  • Are there retained messages on any of the brokers which are sent to the client?
  • How frequently are you expecting messages to be sent to the client applications?

@kollokollo
Copy link
Author

OK, I understand. Ill do my best.
Mosquitto-Versions found by $SYS/broker/version:

-- 0.15 (this causes the problem)
-- 1.3.4 (and mybe others) seem to work fine.

Subscriptions: done one by one in a loop. I need to check, if even the first subscription causes the problem already. All topics subscribed to have retained values. Which my application gets in a callback after subscription (which is what is expected). QoS is always 0.

I expect messages sent with 1/minute to 1/second. However, I have tested, that I can also have 10/second or even more, which does not change the bahavior. (Wich is great!)

Using mosquitto_sub everything seems OK, CPU-Usage is near 0.
mosquitto_sub -h sonnenschein -t "HOME/SOLAR/AGM_VOLT"

@kollokollo
Copy link
Author

I have put together the subscriber.c from your examples to produce 100% CPU-Load when connected to the problematic broker.
a.zip

@kollokollo
Copy link
Author

And this is the output directly after start and now it waits for the next message...

`Subscribing to topic HOME/SOLAR/AGM_VOLT
for client ExampleClientSub using QoS1

Press Q to quit

Message arrived
topic: HOME/SOLAR/AGM_VOLT
message: 11.65 V
`

@kollokollo
Copy link
Author

Dont be confused: This example obviously uses QoS1, I forgot to change this, but with QoS0 it is the same... (just confirmed).
`Subscribing to topic HOME/SOLAR/AGM_VOLT
for client ExampleClientSub using QoS0

Press Q to quit

Message arrived
topic: HOME/SOLAR/AGM_VOLT
message: 11.69 V
`

@kollokollo
Copy link
Author

Tracing: I have set:

setenv MQTT_C_CLIENT_TRACE ON and
setenv MQTT_C_CLIENT_TRACE_LEVEL PROTOCOL

And now the output looks like this: (do we have block-quote here?)

`/c/mqtt>./subscriber =========================================================
Trace Output
Product name: Eclipse Paho Synchronous MQTT C Client Library
Version: 1.3.1
Build level: Fr 13. Dez 22:39:57 CET 2019
/proc/version: Linux version 4.13.0-45-generic (buildd@lgw01-amd64-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu116.04.9)) #5016.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018

=========================================================
20200101 111738.830 3 ExampleClientSub -> CONNECT version 4 clean: 1 (0)
20200101 111738.840 waitfor unexpectedly is NULL for client ExampleClientSub, packet_type 2, timeout 29862
20200101 111738.863 4 ExampleClientSub -> CONNECT version 3 clean: 1 (0)
20200101 111738.873 4 ExampleClientSub <- CONNACK rc: 0
Subscribing to topic HOME/SOLAR/AGM_VOLT
for client ExampleClientSub using QoS0

Press Q to quit

20200101 111738.873 4 ExampleClientSub -> SUBSCRIBE msgid: 1 (0)
20200101 111738.884 4 ExampleClientSub <- SUBACK msgid: 1
20200101 111738.922 4 ExampleClientSub <- PUBLISH msgid: 0 qos: 0 retained: 1 payload: 11.69 V
Message arrived
topic: HOME/SOLAR/AGM_VOLT
message: 11.69 V
20200101 111747.407 4 ExampleClientSub <- PUBLISH msgid: 0 qos: 0 retained: 0 payload: 11.71 V
Message arrived
topic: HOME/SOLAR/AGM_VOLT
message: 11.71 V
`

@kollokollo
Copy link
Author

I have now commented the subscriptions, and still CPU-Load is 100%. So the problem starts after connecting already. I now see only the PINGs in the trace.

// MQTTClient_subscribe(client, TOPIC, QOS);

@icraggs
Copy link
Contributor

icraggs commented Jan 1, 2020

I don't get a 100% CPU usage with the program you attached and Mosquitto 0.15. That Mosquitto version is really old (2012) - I had to download it to try it out. A full trace might give me something more to go on.

However, one difference between that version of Mosquitto and the recent versions is that it supports MQTT 3.1 only, not 3.1.1 the standardized version which didn't exist then. The Paho client library tries to connect with 3.1.1 and if that fails, falls back to 3.1. In the trace, that is shown by:

20200101 111738.840 waitfor unexpectedly is NULL for client ExampleClientSub, packet_type 2, timeout 29862

When connecting to a recent version, the first connect will succeed, using 3.1.1. It would be best not to use a broker which doesn't support 3.1.1.

Another thing you can try is to use the MQTTVersion connect option. Set it to MQTTVERSION_3_1 to only use MQTT 3.1 and MQTTVERSION_3_1_1 to only use 3.1.1 with no fallback.

@kollokollo
Copy link
Author

OK, maybe I can try. BTW. The 100% cpu load shown by top or xload is mainly SYS load (following the xosview classification USR/SYS/WIO/IDLE), but I dont know what that means. Maybe the load is produced in some sys function handling the socket. Maybe of importance: The troubling broker is remote and the client is connected via WLAN to the network. But I doubt that it matters, because as said, a connection to another Raspberry Pi running a more recent broker works fine.
I am going to find out how to use the MQTTVersion connect option and then see how it works...

@kollokollo
Copy link
Author

kollokollo commented Jan 1, 2020

Success:
conn_opts.MQTTVersion = MQTTVERSION_3_1;

makes a difference. It now works, not showing "waitfor unexpectedly is NULL", and now the CPU-Load is down/normal.

It looks like that fixes it. I have found no issues using this old protocol even together with more recent broker version. But will there be disadvantages in doing so?

@icraggs
Copy link
Contributor

icraggs commented Jan 9, 2020

MQTT 3.1.1 has been published for >5 years now so it is the most prevalent version by far. There is no real significant difference between 3.1.1 and 3.1 so the most likely effect is that you try to connect to a broker which doesn't support 3.1 any more.

If you can take a trace (normal level) as described in the readme of the connect situation when you get 100% CPU usage that might help me work it out.

@kollokollo
Copy link
Author

OK, I have made now a full trace. Interestingly, I get connection losses with the debug output on, which I do not get with quiet debug. It runs for 5 seconds at 100%cpu, then connection loss and it tries to reconnect. But however, here is a huge log.
a.zip

@kollokollo
Copy link
Author

Maybe since this was a slightly different example code used for the log above
commander.zip
, I should post it here:

@icraggs
Copy link
Contributor

icraggs commented Jan 11, 2020

So the title of this issue says 'async client' whereas the example you give uses the synchronous client library (MQTTClient_). Now I assume we are talking about MQTTClient not MQTTAsync?

@kollokollo
Copy link
Author

Hm, if you are referring the library used, yes. (Looks like I have not understood the difference.) I thought, because I use the callbacks, that was asyncronous. I see, that the concept doesnt allow for polling. I am sorry for that confusion.

@icraggs icraggs added this to the 1.3.3 milestone Apr 23, 2020
@icraggs icraggs changed the title CPU Usage 100% with async client, depending on mosquitto version. CPU Usage 100% with MQTTClient async, depending on mosquitto version. Apr 23, 2020
@icraggs icraggs modified the milestones: 1.3.5, 1.3.6 Jun 8, 2020
@icraggs icraggs removed this from the 1.3.7 milestone Nov 2, 2020
@icraggs
Copy link
Contributor

icraggs commented Nov 2, 2020

I don't get this with the latest code, with paho_cs_sub or your example.

@icraggs icraggs closed this as completed Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants