channel sender / receiver disagree on content #834
Comments
I believe #827 is very likely to solve this issue. cc @dignifiedquire Until then, you can use the |
Heya, I threw in async-channel and I do see the same issue :( I'll do some more digging; perhaps I'll find something interesting (fingers crossed) |
@Licenser I'm a bit confused - how can the sender and receiver side of the same channel report different
Are you sure |
Ja, I went through that and I'm super confused too. I'm about 95% sure, then again there is 5% of me just being silly and missing something super obvious. That's totally possible, but after 3 days of hunting this I figured I'd look at the possibility of another cause, especially as I saw the same with a different combination of channels too. I'll go dig more into this tomorrow. I'm completely puzzled by this. The next thing I came up with is mem::transmuting the sender/receiver to something I can inspect and comparing the pointers in the Arc, to check that they really are the same and that things aren't getting mixed up somewhere. |
I've run this in debug and it looks like they're the same both times the debugger says for the Arc pointer:
I couldn't easily get this in release mode yet, as the compiler is doing so much magic that I lose access to the inspection capabilities |
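The pointer comparison described above can be sketched without a transmute when the shared state is reachable. This is only an illustration using the standard library: `Shared`, `Sender`, `Receiver`, `channel`, `same_channel` and `addr` are hypothetical stand-ins, not the async-std or async-channel types, whose internals are private (which is presumably why a transmute was considered in the first place).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the state shared by both halves of a channel.
struct Shared {
    capacity: usize,
}

/// Hypothetical sender half holding a reference to the shared state.
struct Sender {
    shared: Arc<Shared>,
}

/// Hypothetical receiver half holding a reference to the shared state.
struct Receiver {
    shared: Arc<Shared>,
}

/// Creates both halves around a single shared allocation.
fn channel(capacity: usize) -> (Sender, Receiver) {
    let shared = Arc::new(Shared { capacity });
    (
        Sender { shared: Arc::clone(&shared) },
        Receiver { shared },
    )
}

impl Sender {
    /// Address of the shared state, suitable for printing in debug output.
    fn addr(&self) -> usize {
        Arc::as_ptr(&self.shared) as usize
    }
}

impl Receiver {
    /// Address of the shared state, suitable for printing in debug output.
    fn addr(&self) -> usize {
        Arc::as_ptr(&self.shared) as usize
    }
}

/// True when both halves point at the same allocation.
fn same_channel(tx: &Sender, rx: &Receiver) -> bool {
    Arc::ptr_eq(&tx.shared, &rx.shared)
}
```

If `same_channel` holds (or the two printed addresses match), the two halves really do share state, so a disagreement about full versus empty cannot be explained by the halves belonging to different channels.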
I added printing of the arc pointer to debug of sender and receiver and the log starts hanging with the last few entries attached below. I'm about 99% sure now there is a race condition somewhere in the channels since replacing the sending code:
with:
resolves the issue. That indicates to me (and I might be wrong here, so it's not 100% :P) that send misses the "now you can progress" update in some cases, while the
|
update: Changing https://github.com/stjepang/async-channel/blob/master/src/lib.rs#L242 to
also resolves the issue. So I think everything points to some issue with the notifications |
Okay, next update: I reduced the reproducibility to a single line change: tremor-rs/tremor-runtime@d3ce267 This seems to happen when there is a try_recv loop without an explicit yield, which seems to prevent the sender from being notified. |
And good news with all those things found out I managed to create a minimal reproduction case: |
@Licenser I took a look at loop-test and found:

```rust
task::spawn(async move {
    loop {
        if let Ok(m) = off_rx.try_recv() {
            r_tx.send(m).await;
        }
    }
});
```

I think this is a core issue - the loop is spinning with

```rust
task::spawn(async move {
    while let Ok(m) = off_rx.recv().await {
        r_tx.send(m).await;
    }
});
```

The current executor in async-std works under the assumption that futures always yield in a reasonable amount of time. The new scheduler in #836 assumes that some futures might be 'bad' and loop forever, so it wakes up worker threads more aggressively. Can you perhaps try your original code with the #836 branch and see if that resolves the issue? |
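To illustrate why a polling loop needs an explicit yield, here is a minimal sketch that uses only the standard library instead of async-std. Everything in it is an assumption made for illustration: `YieldNow`, `NoopWake`, `run_all` and `forward_all` are hypothetical names, the toy round-robin executor is not how async-std schedules tasks, and `std::sync::mpsc::sync_channel` stands in for the async channel.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::mpsc::{sync_channel, TryRecvError, TrySendError};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` exactly once so other tasks can run.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// Waker that does nothing; the toy executor below re-polls every task anyway.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Toy single-threaded round-robin executor (hypothetical, for illustration).
fn run_all(mut tasks: VecDeque<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while let Some(mut task) = tasks.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            tasks.push_back(task); // not finished: give the other tasks a turn
        }
    }
}

/// Sends `0..n` through a capacity-1 channel; both sides yield when they cannot progress.
fn forward_all(n: u32) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(1);
    let out = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&out);

    let producer: Task = Box::pin(async move {
        for i in 0..n {
            let mut msg = i;
            loop {
                match tx.try_send(msg) {
                    Ok(()) => break,
                    Err(TrySendError::Full(m)) => {
                        msg = m;
                        YieldNow(false).await; // channel full: let the consumer run
                    }
                    Err(TrySendError::Disconnected(_)) => return,
                }
            }
        }
        drop(tx); // closing the channel lets the consumer terminate
    });

    let consumer: Task = Box::pin(async move {
        loop {
            match rx.try_recv() {
                Ok(m) => sink.borrow_mut().push(m),
                // Without this yield the loop would spin forever on this executor.
                Err(TryRecvError::Empty) => YieldNow(false).await,
                Err(TryRecvError::Disconnected) => break,
            }
        }
    });

    run_all(VecDeque::from(vec![producer, consumer]));
    let received = out.borrow().clone();
    received
}
```

In this sketch, removing the `YieldNow(false).await` in the `Empty` arm means the consumer never returns control once the channel is empty, so the producer never gets polled again. That mirrors the situation where there are more tasks than worker threads.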
#836 does solve the issue for cores > tasks (while sad it makes sense that it won't solve it for tasks > cores) |
But that's kind of an impossible problem to solve, right? :) |
Ja that's why it makes sense 😂 since there is a PR I should probably close this! |
Let me apologize ahead of time for this less-than-ideal bug report; I couldn't manage to reproduce it outside of a rather complex application.
The issue: "randomly" (or under conditions I couldn't track down yet) the sender and receiver of a channel disagree on the number of messages in it.
My gut feeling says there is some kind of race condition I've not yet found, but I'm going to file this issue in the hope that there is an "obvious" answer I'm missing :)
Basically what happens, and I've seen the same issue on two different code paths, is that the sender thinks the channel is full while the receiver thinks it's empty, leading to the sender never sending another message and the receiver never being able to read a message.
This seems to trigger more often when using merge() on two channels.
The code that triggers this has a layout of:
where the processor prioritizes c3 over c2 and the source loop uses try_send to ensure that even if c1 is full it can read from c4.
The issue manifests in that the c4 sender thinks c4 is full and the receiver thinks c4 is empty.
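The non-blocking source loop described above can be sketched roughly as follows. This is an assumption-laden illustration, not the tremor-runtime code: `source_step` is a hypothetical helper, the message types are made up, and `std::sync::mpsc::sync_channel` stands in for the bounded async channels `c1` and `c4`.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TryRecvError, TrySendError};

/// One iteration of a hypothetical source loop: offer `event` to `c1` without
/// blocking, then drain whatever is currently waiting on `c4`.
/// Returns the event if `c1` did not accept it, plus the messages read from `c4`.
fn source_step(
    c1: &SyncSender<u32>,
    c4: &Receiver<&'static str>,
    event: u32,
) -> (Option<u32>, Vec<&'static str>) {
    // try_send never blocks, so a full c1 cannot stop us from reading c4.
    let rejected = match c1.try_send(event) {
        Ok(()) => None,
        Err(TrySendError::Full(e)) => Some(e),
        Err(TrySendError::Disconnected(e)) => Some(e),
    };

    // Read everything that is available on c4 right now.
    let mut replies = Vec::new();
    loop {
        match c4.try_recv() {
            Ok(m) => replies.push(m),
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    (rejected, replies)
}
```

A caller would keep a rejected event and retry it on a later iteration. The design relies on both ends agreeing about whether a channel is full or empty, which is exactly the property that appears to break in this report.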
It is also worth noting that this only triggers in release mode; as long as each step is "slow enough" it seems not to manifest.
This is reproducible from https://github.com/wayfair-tremor/tremor-runtime/pull/new/async-channel-issue by running:
I'll add a few lines of debug output below for the sake of having context.