Investigate and fix sidechain integration test failures #1087
Fix calculation of remaining slot time for log output
(cherry picked from commit 03b8616)
(cherry picked from commit 1b52a2e)
(cherry picked from commit 185030d8178bafee0fd04b0760d7006cc6bd857b)
(cherry picked from commit 4299c20351ef891f1a1173333a3da74051de5971)
(cherry picked from commit c814f99fd647670281c38b7aa21d3769c27f77c0)
Since getters are executed immediately and are no longer put into the TOP pool (where ordering is guaranteed), they can return wrong results depending on timing.
Also increased the slot fraction for calls back to 0.7 (from 0.4).
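To make the timing issue concrete, here is a deliberately simplified model. It assumes (this is not the worker's real design) that calls wait in a pool and only change state when a sidechain block is produced, while getters read the state immediately. The Worker type and its methods are hypothetical.

```rust
// Hypothetical, simplified model of the race: calls are queued and only
// applied when a block is produced, while getters read state right away.
struct Worker {
    state: u64,        // stand-in for the state a getter would query
    pending: Vec<u64>, // queued calls (here: amounts to add)
}

impl Worker {
    fn new() -> Self {
        Worker { state: 0, pending: Vec::new() }
    }

    /// Queue a call; it has no effect until the next block is produced.
    fn submit_call(&mut self, amount: u64) {
        self.pending.push(amount);
    }

    /// Getters bypass the pool and read the current state immediately.
    fn getter(&self) -> u64 {
        self.state
    }

    /// Producing a block applies all queued calls in submission order.
    fn produce_block(&mut self) {
        let total: u64 = self.pending.drain(..).sum();
        self.state += total;
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.submit_call(10);
    // Querying right after the call still sees the old state.
    println!("before block: {}", worker.getter());
    worker.produce_block();
    println!("after block: {}", worker.getter());
}
```

In this model, any check that runs a getter directly after a call races against block production, which is why the test scripts need to wait between the two.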
- restart: always
+ restart: "no"
Just a minor change, discovered while running the tests in Docker. In a test setting, it's confusing when the worker automatically restarts after a crash.
if slot.duration_remaining().is_none() {
    warn!("No time remaining in slot, skipping AURA execution");
    return Ok(())
}

log_remaining_slot_duration(&slot, "Before AURA");
I added some log messages and timers to help diagnose the timing and phases of the sidechain slot.
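In the same spirit, per-phase timings can be collected with a monotonic clock. This is a minimal sketch: the PhaseTimer type is hypothetical and not part of the PR, and the sleeps in main only stand in for the real AURA execution and block broadcast phases.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from the PR) that records how long each named
/// phase of a slot took, measured with the monotonic `Instant` clock.
struct PhaseTimer {
    last: Instant,
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    fn start() -> Self {
        PhaseTimer { last: Instant::now(), phases: Vec::new() }
    }

    /// Close the current phase under `name` and start timing the next one.
    fn mark(&mut self, name: &'static str) {
        let now = Instant::now();
        self.phases.push((name, now.duration_since(self.last)));
        self.last = now;
    }

    /// Sum of all recorded phase durations.
    fn total(&self) -> Duration {
        self.phases.iter().map(|(_, d)| *d).sum()
    }
}

fn main() {
    let mut timer = PhaseTimer::start();
    std::thread::sleep(Duration::from_millis(5)); // stand-in for AURA execution
    timer.mark("aura");
    std::thread::sleep(Duration::from_millis(2)); // stand-in for block broadcast
    timer.mark("broadcast");
    for (name, d) in &timer.phases {
        println!("{name}: {} ms", d.as_millis());
    }
    println!("total: {} ms", timer.total().as_millis());
}
```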
- pub const BLOCK_PROPOSAL_SLOT_PORTION: f32 = 0.8;
+ pub const BLOCK_PROPOSAL_SLOT_PORTION: f32 = 0.7;
Small reduction in the fraction of the slot time we use for executing calls. When running on CI, I found that we often didn't have enough time to broadcast a sidechain block before the next slot started, resulting in duplicate block numbers and discarded blocks.
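A quick back-of-the-envelope check of what this change buys, assuming the proposal phase ends at BLOCK_PROPOSAL_SLOT_PORTION × slot duration and the rest of the slot is left for broadcasting. The 300 ms slot length and the broadcast_budget helper are illustrative assumptions, not taken from the PR. Under this model, going from 0.8 to 0.7 raises the broadcast budget from about 20% to about 30% of the slot.

```rust
use std::time::Duration;

/// Time left in the slot once the proposal phase is over, i.e. the budget
/// for broadcasting the sidechain block before the next slot starts.
/// Hypothetical helper; saturates at zero if `proposal_portion` exceeds 1.0.
fn broadcast_budget(slot_duration: Duration, proposal_portion: f32) -> Duration {
    slot_duration.saturating_sub(slot_duration.mul_f32(proposal_portion))
}

fn main() {
    // Illustrative slot length; not taken from the PR.
    let slot = Duration::from_millis(300);
    for portion in [0.8_f32, 0.7] {
        println!(
            "portion {portion}: ~{} ms left for broadcasting",
            broadcast_budget(slot, portion).as_millis()
        );
    }
}
```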
- proposing_remaining_duration(&slot_info, duration_now()) > SLOT_DURATION / 2
-     && proposing_remaining_duration(&slot_info, duration_now())
-         < SLOT_DURATION.mul_f32(BLOCK_PROPOSAL_SLOT_PORTION + 0.01)
+ proposing_remaining_duration(&slot_info, duration_now())
+     < SLOT_DURATION.mul_f32(BLOCK_PROPOSAL_SLOT_PORTION + 0.01)
The first condition of this assertion, > SLOT_DURATION / 2, is only true if BLOCK_PROPOSAL_SLOT_PORTION is > 0.5, which is not guaranteed (this test failed when I lowered the value to 0.4 for testing).
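The failure can be reproduced with a simplified model, assuming that at the very start of a slot the remaining proposing time equals BLOCK_PROPOSAL_SLOT_PORTION × SLOT_DURATION and only shrinks afterwards. The functions below are hypothetical stand-ins, not the test's real helpers. With a portion of 0.4, the remaining time never exceeds SLOT_DURATION / 2, so the old two-sided check fails while the new one passes.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `proposing_remaining_duration`: the proposal
/// phase ends `portion * slot` after the slot starts, and `elapsed` is the
/// time already spent in the slot.
fn proposing_remaining(slot: Duration, portion: f32, elapsed: Duration) -> Duration {
    slot.mul_f32(portion).saturating_sub(elapsed)
}

/// The original two-sided assertion.
fn old_check(slot: Duration, portion: f32, remaining: Duration) -> bool {
    remaining > slot / 2 && remaining < slot.mul_f32(portion + 0.01)
}

/// The assertion after dropping the `> SLOT_DURATION / 2` condition.
fn new_check(slot: Duration, portion: f32, remaining: Duration) -> bool {
    remaining < slot.mul_f32(portion + 0.01)
}

fn main() {
    let slot = Duration::from_millis(300); // illustrative value
    for portion in [0.4_f32, 0.7] {
        // Even at the very start of the slot, at most `portion * slot` remains.
        let remaining = proposing_remaining(slot, portion, Duration::ZERO);
        println!(
            "portion {portion}: old = {}, new = {}",
            old_check(slot, portion, remaining),
            new_check(slot, portion, remaining)
        );
    }
}
```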
- tokio = "*"
+ tokio = { version = "1.6.1", features = ["full"] }
Necessary change to allow running cargo test in this crate alone.
pub fn duration_remaining(&self) -> Option<Duration> {
    let duration_now = duration_now();
    if self.ends_at <= duration_now {
        return None
    }
    Some(self.ends_at - duration_now)
}
Added a convenience function to the SlotInfo struct to get the remaining duration in this slot.
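A self-contained version of this helper, for experimenting outside the worker. Here SlotInfo is reduced to its ends_at field, duration_now is assumed to return wall-clock time since the Unix epoch, and duration_remaining_at is a hypothetical addition that takes the current time as a parameter so it can be tested with fixed values. It keeps the same semantics (None once the slot has ended) but expresses them with checked_sub.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Assumed stand-in for the worker's `duration_now`: time since the Unix epoch.
fn duration_now() -> Duration {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
}

/// Minimal, hypothetical `SlotInfo` with only the field the helper needs.
struct SlotInfo {
    ends_at: Duration,
}

impl SlotInfo {
    /// `None` if the slot has already ended (`ends_at <= now`), otherwise the
    /// time left. `checked_sub` yields `None` when `now > ends_at`, and the
    /// filter turns the `ends_at == now` case (zero remaining) into `None`.
    fn duration_remaining_at(&self, now: Duration) -> Option<Duration> {
        self.ends_at.checked_sub(now).filter(|d| !d.is_zero())
    }

    /// Same as above, reading the clock once.
    fn duration_remaining(&self) -> Option<Duration> {
        self.duration_remaining_at(duration_now())
    }
}

fn main() {
    let slot = SlotInfo { ends_at: duration_now() + Duration::from_millis(300) };
    match slot.duration_remaining() {
        Some(d) => println!("{} ms remaining in slot", d.as_millis()),
        None => println!("No time remaining in slot, skipping AURA execution"),
    }
}
```

Passing now explicitly is only a testability choice; the helper in the PR reads the clock itself.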
Thank you very much for this and ouch for those re-introduced sleeps. 😆
LGTM, thank you!
This issue is to investigate the intermittent failures we see in our integration tests.
Related GH issue: #1023
Observations:
Conclusions:
The sleeps in our integration test scripts were removed at some point. Those sleeps were placed between executing calls and getters.
Solution:
Re-introduce the sleeps in our integration test scripts, so that we wait for the calls to be executed and included in a sidechain block before calling a getter to verify their effect.
Closes #1023
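The PR solves this with fixed sleeps. A common alternative, sketched here and not what the PR does, is to poll the getter until it reflects the call or a timeout expires, which waits only as long as needed and fails explicitly otherwise. wait_until is a hypothetical helper, and the closure in main merely stands in for querying a getter.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: poll `condition` every `interval` until it returns
/// true or `timeout` has elapsed. Returns whether the condition was met.
fn wait_until<F: FnMut() -> bool>(timeout: Duration, interval: Duration, mut condition: F) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Stand-in for "query the getter until it reflects the call":
    // here the effect becomes visible on the third poll.
    let mut polls = 0;
    let seen = wait_until(Duration::from_secs(2), Duration::from_millis(10), || {
        polls += 1;
        polls >= 3
    });
    println!("effect observed: {seen} after {polls} polls");
}
```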