
identify: stuck at reading multistream header #2379

Closed
dennis-tra opened this issue Jun 19, 2023 · 4 comments · Fixed by #2382
dennis-tra (Contributor) commented Jun 19, 2023

I recently updated Nebula to use go-libp2p v0.28.0. In addition, when crawling another peer, I now explicitly wait for the identify exchange to complete before extracting data from the peerstore.

Over the weekend, I noticed that two crawls, which usually take ~5 minutes, still hadn't terminated after more than 12 hours. I extracted a goroutine dump, which is attached below.

From that dump, I found that one of the 1,000 crawl workers is waiting to receive an event from a channel:

[screenshot of the goroutine dump showing the blocked crawl worker]

I dug deeper into what it's waiting for and found that it should be this select statement. Here's the excerpt:

select {
case <-ctx.Done():
   // ...
case <-c.host.IDService().IdentifyWait(conn):
   // ...
}

So, it's waiting for the identify exchange to complete. If I interpret the goroutine dump correctly, it's hanging while reading the multistream header from the identify stream. My hypotheses: 1) the remote peer is just super slow to respond, or 2) something hung internally, or 3) something else?

I had expected some of the various timeouts across the stack to kick in (transport dial timeouts, security handshake timeouts, connection timeouts, etc.) and cancel the exchange.

My fix, for now, is to time out after 15s:

timeoutCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()

select {
case <-timeoutCtx.Done():
   // ...
case <-c.host.IDService().IdentifyWait(conn):
   // ...
}

However, I think this leaks resources if the above happens again.
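
To be concrete, the cleanup I have in mind would look roughly like the following. This is just a sketch, and it assumes that dropping the whole connection on timeout is acceptable for the crawler (conn is the network.Conn being identified):

timeoutCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()

select {
case <-timeoutCtx.Done():
   // Tear down the connection so the pending identify stream is
   // released instead of lingering (assumes dropping the connection
   // is acceptable here).
   _ = conn.Close()
case <-c.host.IDService().IdentifyWait(conn):
   // identify completed; read from the peerstore as usual
}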

Highly speculative: if this proves not to be an issue on my end, could this be abused in a slowloris-like fashion?


goroutine.zip

@dennis-tra dennis-tra changed the title Identify hangs indefinitely identify: stuck at reading multistream headers Jun 19, 2023
@dennis-tra dennis-tra changed the title identify: stuck at reading multistream headers identify: stuck at reading multistream header Jun 19, 2023
Wondertan (Contributor) commented Jun 19, 2023

We had similar issues a while ago, and one possible reason is explained in #2361. This behavior happens when the remote peer deadlocks while handling the connection in the Swarm. In the linked case, the Swarm deadlocks on emitting an event. That recent issue may also explain what you're seeing, so trying the upcoming v0.28.1 is worth a shot.

dennis-tra (Contributor, Author) commented

@Wondertan thanks for the pointer 👍

This behavior happens when remote peer deadlocks during the connection handling in Swarm

Even if the remote peer hangs, I still think the local peer should terminate the stream/connection after a timeout.

marten-seemann (Contributor) commented

Even if the remote peer hangs, I still think the local peer should terminate the stream/connection after a timeout.

You're right. We should set a stream deadline. Unfortunately, the msmux doesn't take a context, which would be the better way to solve this.
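
Something along these lines, as a rough sketch; the placement and the timeout value are illustrative, not the actual identify code:

// Rough sketch: bound the multistream negotiation with a deadline on the
// stream itself, since the muxer doesn't accept a context. The 30s value
// is purely illustrative.
if err := s.SetDeadline(time.Now().Add(30 * time.Second)); err != nil {
   s.Reset()
   return err
}
// Clear the deadline again when this handler returns.
defer s.SetDeadline(time.Time{})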

Maybe we should add that API, even if it means spawning an additional goroutine. This has bitten us before.
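
For reference, the kind of wrapper I mean would look roughly like this; negotiateWithContext is a hypothetical helper, not an existing msmux API:

// Hypothetical helper (not part of msmux today): run a blocking negotiation
// in its own goroutine and abandon it when the context is cancelled,
// resetting the stream so the blocked read inside negotiate errors out
// instead of leaking forever.
// Assumes imports: "context" and "github.com/libp2p/go-libp2p/core/network".
func negotiateWithContext(ctx context.Context, s network.Stream, negotiate func() error) error {
   done := make(chan error, 1)
   go func() { done <- negotiate() }()

   select {
   case err := <-done:
      return err
   case <-ctx.Done():
      s.Reset() // unblocks the pending read/write inside negotiate
      return ctx.Err()
   }
}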

MarcoPolo (Collaborator) commented

I'm still curious what caused the original hang.
