refactor: fix flaky unit tests #3338
Conversation
```diff
  let mut servers = HashMap::new();
  servers.insert("1".to_string(), server_config);

  rumqttd::Config {
      id: 0,
      router: router_config,
      cluster: None,
-     console: Some(console_settings),
+     console: None,
```
I don't know why this was ever enabled; I haven't observed us using it anywhere.
It appears to be a defensive diagnostic mechanism, kept in place to debug the server when some rare flaky failure occurs, since re-running the test with it turned on may not reproduce the failure.
We can safely ignore these console settings.
```rust
// The messages might get processed out of order, we don't care about the ordering of the messages
requests.sort();
```
This probably isn't technically necessary, as I imagine the requests are always sent in the order of the original c8y message, but I think the point still stands that the ordering is irrelevant.
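For illustration, the order-insensitive assertion pattern being discussed looks like this; the values here are invented, not taken from the mapper test:

```rust
// A minimal sketch of the order-insensitive assertion pattern: sort both
// the actual and expected values before comparing, so the test no longer
// depends on delivery order. The values are made up for illustration.
#[test]
fn requests_match_regardless_of_delivery_order() {
    let mut requests = vec!["102,update-b", "101,update-a"];
    let mut expected = vec!["101,update-a", "102,update-b"];
    requests.sort();
    expected.sort();
    assert_eq!(requests, expected);
}
```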
The main fix, ensuring that all the messages to be published are properly exhausted with acks, looks fine. But the test failures look concerning. Just some queries/comments on the other bits as well.
```rust
Ok(Ok(())) => {
    // I don't know why it happened, but I have observed this once while testing
    // So just log the error and retry starting the broker on a new port
    eprintln!("MQTT-TEST ERROR: `broker.start()` should not terminate until after `spawn_broker` returns")
```
But when this happens and we retry the broker start on the next iteration, the client thread that was started (line 137) in the last iteration also must be aborted somehow, right? Because once that client thread enters the loop, I don't see how it can break out, even on a connection error, because of the very specific `if let` check.
The client thread is not part of the loop; it's only started once we have a healthy broker.
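To make that control flow concrete, here is a minimal sketch of the shape being described; all helper names (`free_port`, `run_broker`, `broker_is_healthy`, `client_loop`) are invented stand-ins, not the PR's actual code:

```rust
// Hypothetical sketch: the broker start is retried inside the loop, while
// the client thread is spawned exactly once, after a healthy broker is
// confirmed. All helpers below are illustrative stubs.
fn free_port() -> u16 { 1883 }                    // stub
fn run_broker(_port: u16) {}                      // stub
fn broker_is_healthy(_port: u16) -> bool { true } // stub
fn client_loop(_port: u16) {}                     // stub

fn spawn_broker() -> u16 {
    loop {
        let port = free_port();
        std::thread::spawn(move || run_broker(port));
        if broker_is_healthy(port) {
            // The client thread is started only after the broker is
            // confirmed healthy, so a failed iteration never leaves a
            // stray client behind.
            std::thread::spawn(move || client_loop(port));
            return port;
        }
        // Broker terminated early: log and retry on a fresh port.
        eprintln!("MQTT-TEST ERROR: broker exited during startup, retrying");
    }
}
```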
Thank you for finding and fixing this bug on closing the connection a bit too soon. However, one has to do the same for QoS 2.
Co-authored-by: Didier Wenzek <didier.wenzek@free.fr>
This should fix thin-edge#3021
Force-pushed from f4bc3fd to 51408a0
Using a semaphore with a unique permit is a nice idea to notify the receiver loop from the sender loop that the connection has to be closed. However, I don't fully get how the receiver loop closes when there is a stream of input messages.
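For readers unfamiliar with the pattern, here is a minimal sketch of a single-permit semaphore used as a close signal; the task bodies are illustrative, not the PR's actual code:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// A minimal sketch of the single-permit close signal described above.
#[tokio::main]
async fn main() {
    let closed = Arc::new(Semaphore::new(0)); // no permit until the sender is done
    let signal = closed.clone();

    let sender = tokio::spawn(async move {
        // ... publish the remaining messages and wait for their acks ...
        // Releasing the unique permit tells the receiver it may close.
        signal.add_permits(1);
    });

    let receiver = tokio::spawn(async move {
        loop {
            // Non-blocking check: while the sender still holds the permit
            // back, keep processing incoming packets.
            if closed.try_acquire().is_ok() {
                break; // permit released: safe to close the connection
            }
            // ... process the next incoming packet ...
            tokio::task::yield_now().await;
        }
    });

    let _ = tokio::join!(sender, receiver);
}
```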
```diff
@@ -17,14 +17,13 @@ log = { workspace = true }
  rumqttc = { workspace = true }
  serde = { workspace = true }
  thiserror = { workspace = true }
- tokio = { workspace = true, features = ["rt", "time"] }
+ tokio = { workspace = true, features = ["rt", "time", "rt-multi-thread"] }
```
This is not used.
Suggested change:

```diff
- tokio = { workspace = true, features = ["rt", "time", "rt-multi-thread"] }
+ tokio = { workspace = true, features = ["rt", "time"] }
```
It is used when running doctests, as we use `#[tokio::main]` there. Trying to test in isolation without it breaks, and that made checking for flakiness hard.
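For context: `#[tokio::main]` without a `flavor` argument builds the multi-threaded runtime, which is what requires the `rt-multi-thread` feature. A doctest of this shape (illustrative, not the crate's actual doctest) is enough to trigger the requirement:

```rust
// Illustrative doctest shape, not the crate's actual example. Without the
// `rt-multi-thread` feature this fails to compile, because `#[tokio::main]`
// defaults to the multi-threaded runtime.
#[tokio::main]
async fn main() {
    // ... open a connection and exercise the API under test ...
}
```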
```diff
  zeroize = { workspace = true }

  [dev-dependencies]
  anyhow = { workspace = true }
  mqtt_tests = { workspace = true }
  serde_json = { workspace = true }
- serial_test = { workspace = true }
```
Nice!
```rust
/// Prefixes a topic/session name with a module path and line number
///
/// This allows multiple tests to share an MQTT broker, allowing them to
/// run concurrently within a single test process.
macro_rules! uniquify {
```
Nice idea!
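As a sketch, a macro like this could expand along these lines; the actual implementation in the PR may differ:

```rust
// A possible shape for such a macro, not necessarily the PR's exact code.
// The built-in `module_path!` and `line!` macros expand eagerly inside
// `concat!` and report the invocation site, so each call site gets a
// distinct &'static str.
macro_rules! uniquify {
    ($name:literal) => {
        concat!(module_path!(), "-", line!(), "-", $name)
    };
}

fn main() {
    // Two tests calling `uniquify!("topic")` on different lines get
    // different topic names, so they can safely share one broker.
    println!("{}", uniquify!("topic")); // e.g. "my_crate-17-topic"
}
```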
```diff
- broker.publish(topic, "msg 1").await?;
- broker.publish(topic, "msg 2").await?;
- broker.publish(topic, "msg 3").await?;
+ broker.publish(topic, "msg 1").await.unwrap();
```
Curious: why do you prefer to `unwrap()` errors in tests? On my side, I prefer to raise such errors, to keep the test closer to real code.
Because there's no panic, `?` doesn't give any indication of which line failed, and that makes cases like this impossible to debug (if I don't know which message didn't publish, I've got almost no clue as to the nature of the bug).
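A contrived illustration of the difference; the `publish` helper here is a made-up stub that always fails, so both tests below fail when run:

```rust
// Sketch of the two failure modes, not the PR's actual tests.
async fn publish(_msg: &str) -> Result<(), String> {
    Err("connection reset".into())
}

#[tokio::test]
async fn with_question_mark() -> Result<(), String> {
    publish("msg 1").await?; // failure reports only `Err("connection reset")`
    publish("msg 2").await?; // ...with no hint of which publish failed
    Ok(())
}

#[tokio::test]
async fn with_unwrap() {
    publish("msg 1").await.unwrap(); // failure panics with this file and
    publish("msg 2").await.unwrap(); // line number pointing at the culprit
}
```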
Force-pushed from d7a6f2e to 529e71b
Approved. Thank you for your perseverance in finding a more appropriate solution to properly detect the end of the connection.
Force-pushed from 849f313 to 2483809
I confirm my approval after this commit using …
Proposed changes
This fixes some flaky unit tests I observed locally. One was the `mqtt_channel` tests, which were failing due to a genuine bug where the connection was closed before all messages were published. The other was a mapper test, which implicitly assumed messages would be delivered in a fixed order.

A few other things have changed in this PR:

- Added `--status-level fail` to `cargo nextest run` in `just test`. This stops `cargo nextest` listing every passing test name, which was previously obscuring the error output in my terminal.
- Removed `serial-test` from `mqtt_channel` in favour of using distinct topic/session names against a single broker in each test.
- Replaced `std::env::var` calls with `env!` and some relative file paths with absolute paths, so the test processes can be called without running them under `cargo test`. This is important for using tools like `cargo stress`.
Types of changes
Paste Link to the issue
I don't believe either of these had associated issues.
Checklist

- `cargo fmt` as mentioned in CODING_GUIDELINES
- `cargo clippy` as mentioned in CODING_GUIDELINES

Further comments