Skip to content

Conversation

@hannahhoward
Copy link
Collaborator

@rvagg I believe your WaitForNoActiveTasks was just one small concurrency fix away from being the solution. The issue here is that FinishTask is currently an asynchronous call -- it simply sends the finish message to the response manager and moves on. That means the worker queue can finish draining before the response manager thread actually processes the finish message.

The easy solution, or at least the proof of concept solution, is simply to make the FinishTask method synchronous - it doesn't return till the message is processed (this is done by using a return channel).

@hannahhoward
Copy link
Collaborator Author

also my M1 machine fails to get any failures reliably, even with counts > 1000. So I'm not sure how to test other than push to CI.

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started my repro loop on this branch and forgot about it, came back and it was still running with no failures now that it's done on assertCompleteRequestWith().

Also, I've just found a need to implement almost the exact same code RequestManager, getting the executor wait until the request is fully released so I can properly collect traces.

This is a good pattern because it means we have more strict bounds on the parallelism with no lose edges.

@rvagg rvagg merged commit 493ff1c into main Nov 30, 2021
@rvagg rvagg deleted the fix/test-flaky branch November 30, 2021 04:18
marten-seemann pushed a commit that referenced this pull request Mar 2, 2023
* feat(graphsync): check current peer channels

new method gets diagnostics on current dt-channel id -> request id mapping, also fixes up some
logging

* docs(graphsync): add some code comments
marten-seemann pushed a commit that referenced this pull request Mar 2, 2023
Improve GraphSync Transport Diagnostics (#287)

* feat(graphsync): check current peer channels

new method gets diagnostics on current dt-channel id -> request id mapping, also fixes up some
logging

* docs(graphsync): add some code comments

fix(deps): update go-graphsync v0.10.7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants