fix: avoid FuturesUnordered #1647
Workaround for #1646 and likely makes more sense in the context of Tokio usage anyway. In general `JoinSet` is simply used instead.
CI is sad because MSRV bump to 1.71 is needed due to the new dep.
/netsim
Ran tests with valgrind last night. Made 52 loops before it died. No errors reported. LGTM
So what exactly is the difference in terms of how the futures are being polled? Or is this just trying something until it works?
this is not about how futures are polled, it is about how the
iroh-gossip/src/net/util.rs
Outdated
.pending
.join_next()
.await
.expect("not canceled")
Would it not be safer to log the `JoinError` and skip to the next item in the set if it was cancelled? I don't even know right now whether this guarantee is upheld. It was still in `pending_peers`, I guess, but is that a guarantee? And it seems like something that could easily change in a refactor without anyone realising.
fixed, to try to find the first working one
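For illustration, the "log the error and skip to the next one" behaviour discussed in this thread can be sketched with only the standard library. This is not the code from this PR: `std::thread` stands in for Tokio tasks, `JoinHandle::join` returning `Err` for a panicked thread stands in for a `JoinError`, and `first_successful` is a hypothetical name.

```rust
use std::thread::JoinHandle;

/// Returns the value of the first task that finished successfully.
/// Tasks that panicked are logged and skipped instead of taking the caller down.
fn first_successful<T>(handles: Vec<JoinHandle<T>>) -> Option<T> {
    for handle in handles {
        match handle.join() {
            // The task produced a value: use it and stop looking.
            Ok(value) => return Some(value),
            // The task panicked: log it and try the next one.
            Err(panic) => eprintln!("task failed, trying the next one: {:?}", panic),
        }
    }
    // Every task failed.
    None
}
```

Unlike `JoinSet::join_next`, this sketch joins the handles in spawn order rather than completion order; the point is only the error handling: a failed task is logged and skipped rather than unwrapped with `expect`.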
iroh-gossip/src/net/util.rs
Outdated
std::task::Poll::Ready(Some(Err(e))) => {
    // Should not happen unless the task panicked or got canceled
    // TODO: is this what we want to do here?
    panic!("{:?}", e);
log the error? panicking because we couldn't dial something seems a bit harsh
maybe, usually panics in one part of the code result in the other parts not quite working as intended anymore
now only logs
iroh-net/src/netcheck/reportgen.rs
Outdated
Some(Ok(Err(_))) => (),
Some(Err(e)) => {
    warn!("fatal probes error: {:?}", e);
    probes.abort_all();
If one probeset panicked, we don't need to abort all the other probesets, I think. It should be fine to let the other probesets continue. (Same for cancelled, but that is less likely to happen due to how we cancel things, I think.)
continues now
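As a sketch of the shape being matched in this thread: `join_next().await` on a `JoinSet<Result<T>>` yields an `Option<Result<Result<T>, JoinError>>`, so there are four cases to decide on. The version below is std-only and uses hypothetical stand-ins (`JoinFailure` for `tokio::task::JoinError`, `String` for both the probe report and the probe error, `Next` and `on_join_next` as illustrative names). It is not the code in `reportgen.rs`.

```rust
/// Hypothetical stand-in for `tokio::task::JoinError`: the task panicked or was cancelled.
#[derive(Debug)]
enum JoinFailure {
    Panicked,
    Cancelled,
}

/// What the surrounding loop should do after one `join_next` result.
#[derive(Debug, PartialEq)]
enum Next {
    /// A probe set finished and produced a report.
    Record(String),
    /// Nothing to record; keep waiting for the remaining probe sets.
    Continue,
    /// The set is empty; all probe sets are done.
    Done,
}

fn on_join_next(next: Option<Result<Result<String, String>, JoinFailure>>) -> Next {
    match next {
        Some(Ok(Ok(report))) => Next::Record(report),
        // The probe set ran to completion but returned an error of its own.
        Some(Ok(Err(_))) => Next::Continue,
        // The task panicked or was cancelled: log it, leave the other tasks running.
        Some(Err(e)) => {
            eprintln!("probe task failed: {e:?}");
            Next::Continue
        }
        None => Next::Done,
    }
}
```

With this shape, a single panicked or cancelled task is treated like any other failed probe set, and only `None` ends the loop.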
iroh-net/src/netcheck/reportgen.rs
Outdated
    self.handle_abort_probes();
}
None => {
    probes.abort_all();
there is nothing to abort anymore, they're all done. it probably doesn't do any harm though, but line 303 already ensures it cleans up
removed
@@ -253,6 +252,7 @@ impl Actor {

_ = &mut probe_timer => {
    warn!("probes timed out");
    probes.abort_all();
The probes are aborted on line 303 and they get a log message there. If you abort here, I'm not sure you still get that log message. (Admittedly the log message there is a bit misleading in case of a timeout; maybe it could be tweaked.)
Then `self.handle_abort_probes()` makes sure that we break out of the loop. It's designed this way because it should not matter whether we reach this condition through the timeout or through the actor message, and this way everything is done by the same code.
removed
iroh-net/src/netcheck/reportgen.rs
Outdated
- async fn prepare_probes_task(
-     &mut self,
- ) -> Result<FuturesUnordered<Pin<Box<impl Future<Output = Result<ProbeReport>>>>>> {
+ async fn prepare_probes_task(&mut self) -> Result<JoinSet<Result<ProbeReport>>> {
This no longer prepares a future but will spawn the futures on tasks and they'll start running right away. Maybe rename the function to `start_probe_tasks` or `spawn_probe_tasks` and update the first line of the docstring as well?
renamed
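The distinction behind the rename can be shown with a small std-only sketch. A Rust future is lazy: building it runs nothing until something polls it, which is why "prepare" fit the old version. A spawned task is eager: it runs without the caller polling anything. All names here (`prepare`, `poll_once`, `spawn_eager`, `NoopWaker`) are hypothetical, and `std::thread::spawn` is used only as an analogy for spawning onto a `JoinSet`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// A waker that does nothing; enough to poll a future that never returns `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// "Prepares" a future that sets `flag` when it runs. Calling this runs nothing yet.
fn prepare(flag: Arc<AtomicBool>) -> Pin<Box<dyn Future<Output = ()>>> {
    Box::pin(async move {
        flag.store(true, Ordering::SeqCst);
    })
}

/// Polls the future once, which is what actually drives it.
fn poll_once(fut: &mut Pin<Box<dyn Future<Output = ()>>>) -> Poll<()> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}

/// Eager analogue: the work is handed off and runs without the caller polling it.
fn spawn_eager(flag: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    thread::spawn(move || flag.store(true, Ordering::SeqCst))
}
```

After `prepare`, the flag stays `false` until `poll_once` is called; after `spawn_eager`, the flag is set once the handle has been joined, with no polling by the caller in between.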
iroh-net/src/netcheck/reportgen.rs
Outdated
Err(err) => {
    warn!("fatal probe set error, aborting: {:#}", err);
    return Err(anyhow::anyhow!(err));
}
If a single probe panics (or is cancelled) I think it's probably fine to continue the other probes. Not really sure if all of them would panic as they run the same code... but really none should panic so I guess I'd still choose to carry on.
Not sure if we can log the panic in a reasonable way though.
kept the logging, can `continue` now
iroh-net/src/netcheck/reportgen.rs
Outdated
@@ -266,11 +266,19 @@ impl Actor {
}

// Drive the probes.
maybe update this comment to say something like "wait for probe tasks to finish" as this no longer drives the future.
updated
Oh boy... Did you open an issue?
of course: #1646 and rust-lang/futures-rs#2781