
paho Rust slower than paho Python? #63

Closed
tobdub-snce opened this issue Jan 8, 2020 · 13 comments

@tobdub-snce

tobdub-snce commented Jan 8, 2020

I have an application implemented in both Rust and Python using the paho MQTT libraries for each language.
The app receives around 800 MQTT messages per second and triggers HTTP calls for a few of the messages based on some simple parsing.
The Rust version uses the futures API with tokio 0.2. The Python version runs on PyPy3.6 v7.3.0.
For some reason the Rust version uses 50% more CPU than the Python version (running on an AWS T3 instance). This was a bit surprising to me, as I expected the Rust version to consume fewer resources.
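For context, the receive path looks roughly like this (a trimmed-down sketch, not the real code; the broker URI, topic, and parsing logic are placeholders):

```rust
use futures::stream::StreamExt;
use paho_mqtt as mqtt;

#[tokio::main]
async fn main() -> mqtt::Result<()> {
    // Create the client and open a stream of incoming messages.
    let mut client = mqtt::AsyncClient::new("tcp://broker.example:1883")?;
    let mut stream = client.get_stream(100);

    client.connect(mqtt::ConnectOptions::new()).await?;
    client.subscribe("sensors/#", mqtt::QOS_0).await?;

    // ~800 msgs/sec arrive here; only a few trigger an HTTP call.
    while let Some(msg_opt) = stream.next().await {
        if let Some(msg) = msg_opt {
            if msg.payload_str().contains("alarm") {
                // fire the HTTP request (omitted)
            }
        }
    }
    Ok(())
}
```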

flamegraph.svg.gz

@fpagliughi
Contributor

Ouch! Yes, I'm with you. The Rust version should be more efficient. I really do want to put together a set of standard measurements for the Paho libraries so that we can get side-by-side comparisons of the performance and requirements of each: messages per second, memory use, CPU use, etc.

The one performance issue that I'm aware of is that there is more memory copying than might be necessary at the boundary between the Rust and underlying C library. Sometimes a buffer is copied in order to ensure Rust lifetime guarantees, and there might be places to improve on this. But I wouldn't imagine that it degrades performance to the degree you report.

The only thing I can think of is that some bugs have been filed recently against the C library reporting that it is "spinning" and using up a lot of CPU in some instances:
eclipse-paho/paho.mqtt.c#781

That could be related.

@tobdub-snce
Author

According to the flamegraph, a lot of time is spent in WebSocket_getch. It seems to be reading single chars from the TCP stream? But I guess that is on the C library side.

@fpagliughi
Contributor

Ah. (Sorry, I didn't have much time this morning to dig into the graph).
The WebSocket implementation is a fairly recent addition to the C lib, from a contribution about a year ago. On the Rust side it was awesome in that it came completely for free; it just worked. But if the performance is lagging, that would be worth looking into. Perhaps it's worth cross-posting this information on the Paho C repo as well.

@icraggs

icraggs commented Jan 9, 2020

I need to look at the WebSocket implementation, or someone does. It works to the extent that basic functionality operates, but there are issues that need addressing.

Also remember that the Python implementation has no disk persistence. You can turn that off in the C library if you want.

@tobdub-snce
Author

The code is not actually using WebSockets. It looks like WebSocket_getch just calls SSLSocket_getch. I also tried disabling SSL, but that was actually slightly slower...
Disk persistence is disabled (mqtt::PersistenceType::None) and the subscriptions are made with QoS 0.
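For reference, the client setup is roughly the following (a sketch; the URI and topic are placeholders):

```rust
use paho_mqtt as mqtt;

// Sketch of the client setup with disk persistence disabled and a QoS 0 subscription.
async fn setup_client() -> mqtt::Result<mqtt::AsyncClient> {
    let create_opts = mqtt::CreateOptionsBuilder::new()
        .server_uri("tcp://broker.example:1883")
        .persistence(mqtt::PersistenceType::None) // no disk persistence
        .finalize();

    let client = mqtt::AsyncClient::new(create_opts)?;
    client.connect(mqtt::ConnectOptions::new()).await?;

    // Subscription at QoS 0.
    client.subscribe("sensors/#", mqtt::QOS_0).await?;
    Ok(client)
}
```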

@icraggs

icraggs commented Jan 9, 2020

Ok.

WebSocket_getch() is where we wait for the next incoming packet to be delivered (the first byte of the MQTT packet). WebSocket_getdata() is where the rest of the packet will be read in. So I'd be surprised if the getch() call is using a lot of CPU time. Elapsed time?

@tobdub-snce
Author

tobdub-snce commented Jan 9, 2020

It should be CPU time; the graph was created using cargo-flamegraph.
The message payloads are around 100 bytes.
I created a new flamegraph in the cloud (the first was from my laptop) using an example based on https://github.com/eclipse/paho.mqtt.rust/blob/master/examples/futures_consume.rs
and the results look a bit different: WebSocket_getch is smaller, but still larger than WebSocket_getdata. Syscall overhead? Could the I/O be made buffered?
A fair amount of time seems to be spent in Rust futures as well; I will remeasure tomorrow using the non-futures version.
The Python version is still faster (with PyPy).
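To illustrate what I mean by buffered I/O (plain Rust std, nothing paho-specific): reading one byte at a time from a raw socket costs a syscall per byte, whereas a buffer amortizes it:

```rust
use std::io::{BufReader, Read};
use std::net::TcpStream;

fn read_first_byte(stream: TcpStream) -> std::io::Result<u8> {
    // BufReader pulls in a larger chunk with one read() syscall and
    // serves subsequent single-byte reads from memory.
    let mut reader = BufReader::with_capacity(4096, stream);
    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?;
    Ok(byte[0])
}
```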

flamegraph.svg.gz

@tobdub-snce
Author

tobdub-snce commented Jan 10, 2020

I remeasured using the https://github.com/eclipse/paho.mqtt.rust/blob/master/examples/async_subscribe.rs example, and it is faster, almost as fast as the Python PyPy version.
The C lib's StackTrace seems to cause noticeable overhead (37.6% of CPU time); maybe that could be disabled for release builds or added as a feature flag?
flamegraph.svg.gz

@tobdub-snce
Author

tobdub-snce commented Jan 14, 2020

There is also some logging triggered by the C lib that it may be possible to disable (21% of CPU time with StackTrace enabled).

@tobdub-snce
Author

A PAHO_HIGH_PERFORMANCE CMake flag is now available in the C lib. It doubles the performance for my use case and makes the Rust version faster than the Python version. It would be great if that flag could be enabled by default, or exposed as a feature in the Rust lib.
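For anyone building the bundled C library through a build.rs that uses the cmake crate, passing the flag could look roughly like this (a sketch; the source path and extra defines are illustrative, not the crate's actual build script):

```rust
// build.rs sketch: forward PAHO_HIGH_PERFORMANCE to the C library's CMake build.
fn main() {
    let dst = cmake::Config::new("paho.mqtt.c")
        .define("PAHO_HIGH_PERFORMANCE", "on")
        .define("PAHO_BUILD_STATIC", "on")
        .build();
    println!("cargo:rustc-link-search=native={}/lib", dst.display());
    println!("cargo:rustc-link-lib=static=paho-mqtt3as");
}
```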

@fpagliughi
Contributor

Agreed. I pushed out v0.7 based on what had been sitting in the repo for months waiting on the upstream bug fixes. But I'm immediately jumping on the next release and will start testing this.

I was assuming I would just enable this in the build. I didn't imagine anyone not wanting to use it, but I suppose I can add an inverted feature to turn it off, just in case.

@fpagliughi added this to the v0.8 milestone Apr 28, 2020
@fpagliughi
Contributor

This is in the develop branch.

@fpagliughi added the "fix added" label (a fix was added to an unreleased branch) May 27, 2020
@fpagliughi
Contributor

Released in v0.8
