[r2r] Fix a race condition in AbortableQueue
#1528
Conversation
🔥
Looks nice! One suggestion.
```diff
@@ -153,32 +160,28 @@ impl QueueInner {
     /// Inserts the given future `handle`.
     fn insert_handle(&mut self, handle: oneshot::Sender<()>) -> FutureId {
         match self.finished_futures.pop() {
-            Some(finished_id) => {
             // We can reuse the given `finished_id`.
+            Ok(finished_id) => {
                 self.abort_handlers[finished_id] = handle;
```
What about doing a check here before indexing?
Good note! If there is a wrong `FutureId` in `finished_futures`, we can run into two problems:
- If `finished_id < abort_handlers.len()`, we'll overwrite a valid future handle at `abort_handlers[finished_id]`, and that future will be aborted, which can lead to a very complicated debugging process;
- If `finished_id >= abort_handlers.len()`, we'll get a panic.

If option 2) happens, it means we are already in a wrong state, and later some other wrong `finished_id` can appear and lead to option 1). I think it's worth panicking in order to let the developers fix it quickly.
Alright then. Thanks
It's still an open topic, so if there is a way to improve it, I'll be glad to read 🙂
Right, I think we can handle the case for 2): we should do nothing if option 2) is true and log something like `current index doesn't belong to any future` instead of panicking. WDYT?
That's interesting! Will consider implementing it tomorrow :) Thank you!
I've added the check, thank you for the note!
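For illustration, here is a minimal sketch of what such a bounds check could look like. The struct is deliberately simplified (a plain `Vec` whose `pop` returns `Option`, unlike the lock-free queue in the PR), and the logging and fallback behavior are assumptions, not the actual implementation:

```rust
use futures::channel::oneshot;

type FutureId = usize;

/// Simplified stand-in for the queue's inner state.
struct QueueInner {
    abort_handlers: Vec<oneshot::Sender<()>>,
    finished_futures: Vec<FutureId>,
}

impl QueueInner {
    /// Inserts the given future `handle`, reusing a finished `FutureId`
    /// only after checking that it is within bounds.
    fn insert_handle(&mut self, handle: oneshot::Sender<()>) -> FutureId {
        match self.finished_futures.pop() {
            Some(finished_id) if finished_id < self.abort_handlers.len() => {
                // We can safely reuse the freed slot.
                self.abort_handlers[finished_id] = handle;
                finished_id
            },
            Some(wrong_id) => {
                // The index doesn't belong to any future: log it instead of
                // panicking, and fall back to allocating a new slot.
                eprintln!("index {wrong_id} doesn't belong to any future");
                self.abort_handlers.push(handle);
                self.abort_handlers.len() - 1
            },
            None => {
                // There are no finished future IDs to reuse.
                self.abort_handlers.push(handle);
                self.abort_handlers.len() - 1
            },
        }
    }
}
```

The guard on the first arm is the check discussed above; whether to log, panic, or silently fall through on a wrong index is exactly the open design question from this thread.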
```diff
                 self.abort_handlers[finished_id] = handle;
                 // The freed future ID.
                 finished_id
             },
-            None => {
             // There are no finished future IDs.
+            Err(_) => {
                 self.abort_handlers.push(handle);
```
Same here, line 165.
Thanks for the fix! Just a question.
```diff
@@ -1,6 +1,7 @@
 use crate::executor::abortable_system::{AbortableSystem, InnerShared, InnerWeak, SystemInner};
 use crate::executor::spawner::{SpawnAbortable, SpawnFuture};
 use crate::executor::{spawn, AbortSettings, Timer};
+use crossbeam::queue::SegQueue;
```
`SegQueue` is a linked-list implementation with dynamic allocation, which can cause significant performance impacts. Is it a must-have for this implementation?
`SegQueue` is definitely not the best solution, but in comparison with `Arc<PaMutex<Vec<FutureId>>>`, `SegQueue` is not that bad.

I performed a few tests with a fixed-size `ArrayQueue` and `PaMutex<Vec<FutureId>>`, and here are the results:

- ITEMS_NUMBER = 5_000_000:

```
push_values_lock_free_dynamic took 337.420958ms: items=5000000
pop_values_lock_free_dynamic took 371.537791ms: items=5000000, missing=732
====
pop_values_lock_free_array took 328.117666ms: items=5000000, missing=15717
push_values_lock_free_array took 328.12ms: items=5000000
====
push_values_mutex took 3.156110208s: items=5000000
pop_values_mutex took 3.154702583s: items=5000000, missing=24710287
====
```

- ITEMS_NUMBER = 500_000:

```
push_values_lock_free_dynamic took 37.359958ms: items=500000
pop_values_lock_free_dynamic took 41.183875ms: items=500000, missing=0
====
push_values_lock_free_array took 37.030666ms: items=500000
pop_values_lock_free_array took 37.027375ms: items=500000, missing=4524
====
push_values_mutex took 368.443375ms: items=500000
pop_values_mutex took 369.029791ms: items=500000, missing=3175914
```
The code is available here.
Unfortunately, we can't approximate the total number of futures, but the difference between `ArrayQueue` and `SegQueue` is not high enough to justify using an approximation.

Also, I agree with you that it's not an optimal solution, because when we spawn a future, we need to lock the `PaMutex<Inner>` mutex and then try to pop a `FutureId` from the `Inner::finished_futures` lock-free collection. But I couldn't find a better way to avoid using a second thread-synchronization primitive.
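The linked benchmark code isn't reproduced above, but here is a rough sketch of how such a push comparison could be structured. The thread and item counts are arbitrary, and the `push_values_*` names merely mirror the output labels; this is not the actual benchmark from the PR:

```rust
use crossbeam::queue::SegQueue;
use std::sync::Mutex;
use std::thread;
use std::time::Instant;

const ITEMS_NUMBER: usize = 500_000;
const THREADS: usize = 4;

fn main() {
    // Lock-free, dynamically allocated queue.
    let queue: SegQueue<usize> = SegQueue::new();
    let started = Instant::now();
    thread::scope(|s| {
        for _ in 0..THREADS {
            // Scoped threads may borrow `queue`; `push` takes `&self`.
            s.spawn(|| {
                for i in 0..ITEMS_NUMBER / THREADS {
                    queue.push(i);
                }
            });
        }
    });
    println!(
        "push_values_lock_free_dynamic took {:?}: items={}",
        started.elapsed(),
        queue.len()
    );

    // Mutex-guarded vector for comparison: every push takes the lock.
    let values: Mutex<Vec<usize>> = Mutex::new(Vec::new());
    let started = Instant::now();
    thread::scope(|s| {
        for _ in 0..THREADS {
            s.spawn(|| {
                for i in 0..ITEMS_NUMBER / THREADS {
                    values.lock().unwrap().push(i);
                }
            });
        }
    });
    println!(
        "push_values_mutex took {:?}: items={}",
        started.elapsed(),
        values.lock().unwrap().len()
    );
}
```

Under a setup like this, the per-push lock acquisition is what makes the mutex variant lag behind both lock-free variants on contended workloads, which matches the shape of the numbers above.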
Fixed by #1528 (comment)
Fixed by #1528 (comment)
Awesome :)
* This allows us to avoid using `SegQueue`

While I tried to find another solution to avoid using `SegQueue`, I decided to refactor the abortable system in such a way that it can no longer be used once it has been aborted. @artemii235 @ozkanonur please review the changes.
2nd review iteration
```rust
impl From<EnableSlpError> for InitTokensAsMmCoinsError {
    fn from(e: EnableSlpError) -> Self {
        match e {
            EnableSlpError::GetBalanceError(balance_err) => {
                InitTokensAsMmCoinsError::CouldNotFetchBalance(balance_err.to_string())
            },
            EnableSlpError::UnexpectedDerivationMethod(internal) | EnableSlpError::Internal(internal) => {
                InitTokensAsMmCoinsError::Internal(internal)
            },
        }
    }
}
```
Isn't this code block too specific for the generic `platform_coin_with_tokens`? I would put it under the SLP modules.
Good note! Moved to bch_with_tokens_activation.rs
```diff
-        match self.futures.entry(future_id) {
+        let futures = match self {
+            SimpleMapInnerState::Ready { futures, .. } => futures,
+            SimpleMapInnerState::Aborted => return false,
```
What do you think about refactoring the result type to some meaningful enum? There are multiple spots where `false` can be returned; with enums we can clarify the reason more specifically. Just a suggestion, not a blocker.
You're totally right! Done. Returning `false` on `SimpleMapInnerState::Aborted` actually led to an unnecessary spawn in lp_ordermatch.rs; now it should be fine.
* Move `impl From<EnableSlpError> for InitTokensAsMmCoinsError` to bch_with_tokens_activation.rs
* Return `AbortedError` on `AbortableSimpleMap` operations
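A minimal sketch of the `AbortedError` idea, using hypothetical, heavily simplified types (the actual `AbortableSimpleMap` in the PR is more involved):

```rust
use std::collections::HashMap;

type FutureId = usize;

/// Returned when the abortable system has already been aborted
/// and therefore can no longer be used.
#[derive(Debug, PartialEq)]
pub struct AbortedError;

/// Simplified inner state: either ready with a map of future handles,
/// or permanently aborted.
enum SimpleMapInnerState<Handle> {
    Ready { futures: HashMap<FutureId, Handle> },
    Aborted,
}

impl<Handle> SimpleMapInnerState<Handle> {
    /// Inserts a handle, reporting why the operation couldn't proceed
    /// instead of silently returning `false`.
    fn insert_handle(&mut self, future_id: FutureId, handle: Handle) -> Result<(), AbortedError> {
        match self {
            SimpleMapInnerState::Ready { futures } => {
                futures.insert(future_id, handle);
                Ok(())
            },
            SimpleMapInnerState::Aborted => Err(AbortedError),
        }
    }
}
```

With a `Result` like this, a caller such as lp_ordermatch.rs can distinguish an already-aborted system from other outcomes instead of interpreting a bare `false`.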
amazing 🔥
# Conflicts:
#	mm2src/coins/lightning.rs
#	mm2src/coins/lightning/ln_errors.rs
🔥
Thanks to @borngraced for finding and reporting this bug.

There was a race condition in `AbortableQueue`, so the following code led to a panic:
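The original reproduction snippet is cut off here. As a stand-in illustration only (single-threaded for simplicity, not the actual code from the PR), this is the kind of state inconsistency the race could produce when the finished-IDs collection and the handlers vector are synchronized separately:

```rust
fn main() {
    // Two pieces of state that must stay consistent with each other.
    let mut abort_handlers: Vec<&str> = vec!["handle_0"];
    let mut finished_futures: Vec<usize> = Vec::new();

    // One code path records future 0 as finished...
    finished_futures.push(0);
    // ...while another path clears the handlers before the ID is reused.
    abort_handlers.clear();

    // Reusing the recorded ID now panics: index 0 is out of bounds.
    if let Some(finished_id) = finished_futures.pop() {
        let _stale = &abort_handlers[finished_id];
    }
}
```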