Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unpin the yamux dependency #7532

Merged
merged 1 commit into from
Oct 20, 2021
Merged

unpin the yamux dependency #7532

merged 1 commit into from
Oct 20, 2021

Conversation

Stebalien
Copy link
Member

@Stebalien Stebalien commented Oct 18, 2021

This has been pinned for a while under suspicion that it was causing graphsync stalls. However, the libp2p team has been unable to reliably reproduce the issue and believes that it's likely a graphsync issue.

Leaving it "pinned" indefinitely isn't going to fix the issue, so I'm filing this PR hoping we can find a solution.

My latest theory is that graphsync is deadlocking somewhere when sending a message blocks on the other side reading the message. However, to confirm this, we'd need goroutine dumps from both sides of a stall showing that:

  1. One side is writing on the yamux stream.
  2. The other side is reading.
  3. It's actually stalled.

This could also be done with sufficient logging.

Libp2p issue: libp2p/go-yamux#61

💣 👈 👇 👉 ❓

This has been pinned for a while under suspicion that it was causing
graphsync stalls. However, the libp2p team has been unable to reliably
reproduce the issue and believes that it's likely a graphsync issue.

Libp2p issue: libp2p/go-yamux#61
@Stebalien
Copy link
Member Author

cc @marten-seemann & @BigLep

@codecov
Copy link

codecov bot commented Oct 18, 2021

Codecov Report

Merging #7532 (3b2e8c4) into master (98ff1c4) will decrease coverage by 0.11%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7532      +/-   ##
==========================================
- Coverage   39.96%   39.85%   -0.12%     
==========================================
  Files         632      632              
  Lines       66926    66926              
==========================================
- Hits        26750    26675      -75     
- Misses      35560    35625      +65     
- Partials     4616     4626      +10     
Impacted Files Coverage Δ
markets/loggers/loggers.go 89.28% <0.00%> (-10.72%) ⬇️
chain/stmgr/call.go 67.87% <0.00%> (-7.28%) ⬇️
chain/events/observer.go 71.64% <0.00%> (-6.72%) ⬇️
extern/sector-storage/manager_calltracker.go 57.70% <0.00%> (-4.85%) ⬇️
chain/actors/builtin/paych/paych.go 19.31% <0.00%> (-4.55%) ⬇️
chain/sub/incoming.go 54.43% <0.00%> (-3.38%) ⬇️
miner/miner.go 55.29% <0.00%> (-2.65%) ⬇️
chain/exchange/client.go 52.96% <0.00%> (-1.70%) ⬇️
itests/kit/blockminer.go 93.65% <0.00%> (-1.59%) ⬇️
chain/store/store.go 63.00% <0.00%> (-1.34%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98ff1c4...3b2e8c4. Read the comment docs.

@Stebalien
Copy link
Member Author

It also came up that there was an underflow issue in the graphsync allocation budgeting system that could cause stalls. Given that the repro for this always involved graphsync, I'm finding it more and more likely that changing how yamux receive buffers work triggered an underlying issue in graphsync.

@raulk
Copy link
Member

raulk commented Oct 18, 2021

That was probably the case, @Stebalien. Just to play it safe, can we ask Estuary (or some other very active deal maker) to upgrade to the latest yamux before we merge this in?

@Stebalien
Copy link
Member Author

Once we figure out the stalls they're experiencing... yes.

@magik6k magik6k merged commit 8182ffc into master Oct 20, 2021
@magik6k magik6k deleted the chore/unpin-yamux branch October 20, 2021 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants