-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable step duration transitions #124
Conversation
@@ -1224,6 +1250,14 @@ impl Engine<EthereumMachine> for AuthorityRound { | |||
let header = block.header().clone(); | |||
let first = header.number() == 0; | |||
|
|||
if let Some((_, dur)) = self.step_duration.range(0..=header.number()).last() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't on_prepare_block
the wrong place for this? The method is meant to only be called when this node starts to prepare a block. The current implementation calls it more often, but only in nodes with a signing key. With #103, we'd actually only call it when it is our turn.
But even non-validators need to keep track of the step duration, I think? Maybe we should call this in on_close_block
, which is called for our own as well as for imported blocks (if I'm not mistaken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to on_close_block
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapping doesn't seem to be supported when parsing spec from json file.
For spec
{
"name": "testnetpoa",
"engine": {
"authorityRound": {
"params": {
"stepDuration": {
"0": 5
},
"blockReward": "0xDE0B6B3A7640000",
"maximumUncleCountTransition": 0,
"maximumUncleCount": 0,
"validators": {
getting error
Loading config file from node1.toml
Spec json is invalid: invalid type: map, expected a hex encoded or decimal uint at line 7 column 3
Does this work? "stepDuration": {
"transitions": {
"0": 5
}
}, |
Same error |
It seems that some syntax checker needs to be corrected for the |
It needs |
Just try without quotes:
It works for me. |
e163ed1
to
c681342
Compare
c681342
to
c0482f7
Compare
Now it works for me in
format However I did a test with two validator nodes and noticed some strange things. Here is the piece of log:
and it stalled:
|
I'm not sure about (1) because in my case blocks are produced at 5 second intervals:
|
I also find it strange. Do you get correct timestamps after severl restarts? |
There is a problem with |
I tested other intervals successfully before the first change of duration. Then it always stalls. |
I also tried to launch it with |
This one also makes it stall:
|
Anything with more than one step duration will make it stall. I tried to move the step duration update from |
@phahulin I tried to launch it with
I think the reason is that our modified Parity can't call the |
Also there are no logs which would tell anything about the error. |
I added logging in the latest commit. |
This case works, though :)
|
It's still the same duration though. |
It might be useful to print the calculated step numbers (if there is such a thing); I still don't have a good overview how the steps are computed. If there's any other place where we just say something like |
@@ -144,7 +162,7 @@ impl Step { | |||
let now = unix_now(); | |||
let expected_seconds = self.load() | |||
.checked_add(1) | |||
.and_then(|ctr| ctr.checked_mul(self.duration as u64)) | |||
.and_then(|ctr| ctr.checked_mul(self.duration.load(AtomicOrdering::SeqCst) as u64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is the main problem:
With a constant step duration, step number n
just ended n * duration
seconds after the beginning of 1970. Now, with a variable step duration, we'd have to count the number of steps up to the latest duration change, and then continue counting from there, with the new duration!
It almost looks like it would be easier to change the duration at a specified time (or step number) instead of a block number. Then at least Step
wouldn't have to know about a particular block's timestamp, and could make the computation based on the step duration map only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could compute time by adding up all preceding (num_blocks * duration)
. But that wouldn't work if (or rather when) blocks appear that are longer than duration
because of excess load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Step duration and block time don't necessarily coincide, and it looks like the step number computation is independent of actual blocks (in particular if the network wasn't started in 1970…); we should probably keep it that way and just compute it based on the table of step duration changes.
@@ -1220,7 +1226,7 @@ impl Engine<EthereumMachine> for AuthorityRound { | |||
|
|||
/// Change the next step duration if necessary and apply the block reward on finalisation of the block. | |||
fn on_close_block(&self, block: &mut ExecutedBlock) -> Result<(), Error> { | |||
self.update_step_duration(block.header().number() + 1); | |||
self.update_step_duration(block.header().number()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this commit, but won't that still compute the step the wrong way in Step::duration_remaining
? E.g. let's say we have step duration 1 second and are 100 seconds into 1970, i.e. we're at step 100. Now we change the step duration to 3 seconds and then increment the step. So it will change its step number to 101, and Step::duration_remaining
will think that step 101 lasts until the 303rd second of 1970, making the step last more than 200 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted the step duration update to happen between steps, so that the duration of the current step is not changed as opposed to when duration
is changed directly.
let next_dur = self.next_duration.load(AtomicOrdering::SeqCst); | ||
let cur_dur = self.duration.swap(next_dur, AtomicOrdering::SeqCst); | ||
if cur_dur != next_dur { | ||
let starting_sec = unix_now().as_secs() as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be safer to use the block's timestamp instead of our local time here, so the validators' step counting doesn't go out of sync.
(And I still feel it would be even safer to just configure the step duration changes by time instead of by block.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make it preserve the synchrony across step duration changes purely by time and the number of steps since the last step duration change. I don't think getting the block timestamp would be necessary; in any case it would not help to restore synchrony in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggested solution is in c025e11.
let next_step = ( | ||
(unix_now().as_secs() as usize - starting_sec) / | ||
(self.duration.load(AtomicOrdering::SeqCst) as usize) | ||
) + starting_step; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned that this value is going to be different in two validators if one of them handles the block earlier than the other. Won't that cause them to go out of sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is possible too, although the likelihood is extremely small. As a remedy I've been considering linking the duration update with a step number or scheduling an update one block ahead. If the update is already scheduled in a previous on_close_block
, there will be no race conditions at the time of step duration change because it will be done in Step::increment
but in that case we must forbid changing the step duration without any blocks in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linking the duration update with a step number
Yes, I think that's the right approach. But making it a timestamp might be even more convenient: Then you could just specify the time of the change in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you prefer having timestamps in the spec rather than block numbers? Than we'd have to map timestamps to block numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'd ever have to map them at all: The step duration would just change at the steps corresponding to those timestamps, independent of any block numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but we can just define that they only affect the following step.
I think that's still more convenient than having to calculate and specify a step number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being I implemented a simpler modification with the step numbers. If needed it can be further extended to timestamps. However this only has the difference that the step numbers no longer need to be calculated for writing the spec. There is another issue with calibrate
which needs to be fixed in either case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way using steps (or block numbers) or timestamps in the spec seems complementary because internally we have to compute both in order to synchronise blocks. Which is why if we use timestamps in the spec, there would be internal synchronisation of block numbers which is similar to the step number routine. If using timestamps is only improves user convenience, I'd prefer to stay with block numbers for now and maybe convert to timestamps later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess so. But if we used block numbers, I think we'd have to use the block's timestamp to change the step number, to guarantee that the nodes don't go out of sync. And that might complicate things more because our internal step counter could already have progressed further in the meantime. And what if there's a reorg that undoes a block that changed the step duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following d170486 step numbers coincide with block numbers. The initial_step
atavism has been removed.
StepDuration::Single(u) => { | ||
let mut durs: BTreeMap<u64, u16> = BTreeMap::new(); | ||
durs.insert(0, map_step_duration(u)); | ||
durs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe simpler:
iter::once((0, map_step_duration(u))).collect()
duration: u16, | ||
inner: AtomicU64, | ||
/// Duration of the current step. | ||
current_duration: AtomicU16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is off. (Spaces/tabs…)
if let Some(&next_dur) = self.durations.get(&next_step) { | ||
let prev_dur = *self.durations.range(0 .. next_step).last().expect("step duration map is empty").1; | ||
let prev_starting_sec = self.starting_sec.load(AtomicOrdering::SeqCst); | ||
let prev_starting_step = self.starting_step.load(AtomicOrdering::SeqCst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we need to put the whole Step
in a Mutex
instead of making its fields atomic individually: Things probably go wrong if some of those fields are mutated while increment
is running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where these fields are modified apart from the constructor. I agree though that if we start mutating them elsewhere we need an atomic lock on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm not sure it's even guaranteed that no two instances of increment
run at the same time. 😬
And even if it is, it would be safer in a single Mutex
(or RwLock
?), in case future modifications change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Atomic
stuff was only supposed to help mutating fields of a Step
that's passed by an immutable reference. Synchronisation is simulated by calibrate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but if the Step
itself were passed as an Arc<Mutex<Step>>
, that interior mutability wouldn't be necessary.
(I feel like this pattern—interior mutability of fields—is overused in Parity: It creates the illusion of thread-safety.)
calibrate: our_params.start_step.is_none(), | ||
duration: our_params.step_duration, | ||
current_duration: AtomicU16::new(duration), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also indentation.)
let engine = Arc::new( | ||
AuthorityRound { | ||
transition_service: IoService::<()>::start()?, | ||
step: Arc::new(PermissionedStep { | ||
inner: Step { | ||
inner: AtomicU64::new(initial_step), | ||
inner: AtomicU64::new(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we compute the actual current step number on startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually another point is true: there is a configurable start_step
which got thrown away with the water. I have to take it into account in order not to break the spec.
Done in #152. |
Fixes #122.
The PR adds a configuration map and at the same time keeps the previous configuration option for backwards compatibility.
I'll rebase it on top of the rebased
aura-pos
as soon as we update the working branch.