fix: memory leak in latency tracker on timeout after cancel #164

dirkmc · 2019-07-31T15:46:25Z

May be the cause of #162

Stebalien · 2019-07-31T19:50:35Z

sessionpeermanager/sessionpeermanager.go

-	if !ok || !data.lt.WasCancelled(ptm.k) {
+	// If the request was cancelled, make sure we clean up the request tracker
+	if ok && data.lt.WasCancelled(ptm.k) {
+		data.lt.RemoveRequest(ptm.k)


@hannahhoward can you remember why we don't just remove the request on cancel?

Ah, never mind. @dirkmc pointed me to go-bitswap/sessionpeermanager/sessionpeermanager_test.go, line 335 in 5204f40:

// if we make a targeted request that is then cancelled, but we still

So, I think the correct fix is to detect that all peers have sent back their responses. This fix will only record one latency.

However, this shouldn't be an issue as we should be hitting the timeout and removing all these trackers anyways. Something else is fishy.

I think the latency tracker records latency per-peer.

Each peerData has an associated latency tracker:

type peerData struct {
	hasLatency bool
	latency    time.Duration
	lt         *latencyTracker
}

The latency tracker keeps track of the time that a request for a CID was sent to a peer, whether a cancel was then issued, and how long it took for a response to arrive:

type requestData struct {
	startedAt    time.Time
	wasCancelled bool
	timeoutFunc  *time.Timer
}

type latencyTracker struct {
	requests map[cid.Cid]*requestData
}
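To make the lifecycle of a tracked request concrete, here is a minimal, self-contained sketch of such a tracker. It is not the go-bitswap implementation: it keys requests by a plain string instead of cid.Cid, leaves out the timer, and the names newLatencyTracker, SetupRequest and RecordCancel are assumptions made for illustration. Only WasCancelled and RemoveRequest are names that appear in the diff above.

```go
package main

import (
	"fmt"
	"time"
)

// requestData mirrors the struct quoted above, minus the timer.
type requestData struct {
	startedAt    time.Time
	wasCancelled bool
}

// latencyTracker is keyed by string here (an assumption for this sketch)
// so the example needs no external packages.
type latencyTracker struct {
	requests map[string]*requestData
}

// newLatencyTracker is a hypothetical constructor.
func newLatencyTracker() *latencyTracker {
	return &latencyTracker{requests: make(map[string]*requestData)}
}

// SetupRequest (hypothetical name) records when a request for key was sent.
func (lt *latencyTracker) SetupRequest(key string) {
	lt.requests[key] = &requestData{startedAt: time.Now()}
}

// RecordCancel (hypothetical name) marks the request as cancelled
// but keeps the entry in the map.
func (lt *latencyTracker) RecordCancel(key string) {
	if rd, ok := lt.requests[key]; ok {
		rd.wasCancelled = true
	}
}

// WasCancelled reports whether a tracked request was cancelled.
func (lt *latencyTracker) WasCancelled(key string) bool {
	rd, ok := lt.requests[key]
	return ok && rd.wasCancelled
}

// RemoveRequest drops the entry. Until it is called, the entry stays in memory.
func (lt *latencyTracker) RemoveRequest(key string) {
	delete(lt.requests, key)
}

func main() {
	lt := newLatencyTracker()
	lt.SetupRequest("cid-1")
	lt.RecordCancel("cid-1")
	fmt.Println(lt.WasCancelled("cid-1"), len(lt.requests)) // true 1
	lt.RemoveRequest("cid-1")
	fmt.Println(lt.WasCancelled("cid-1"), len(lt.requests)) // false 0
}
```

The point of the sketch is that a cancel only flips a flag. Something else has to call RemoveRequest, otherwise the map keeps growing.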

There are broadcast requests, and requests to a specific set of peers:

If the request was a broadcast and it times out, we clean up the latency tracker data for that CID.

If the request was to a specific peer and it times out, we call recordResponse(). recordResponse() calls data.AdjustLatency() on the peerData, which in turn cleans up the latency tracker data for that CID.

The problem is that if the request was cancelled, we don't call recordResponse(), so the latency tracker data for that request never gets cleaned up.
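The leak and the fix can be modelled in a few lines. This is a simplified sketch, not the code from sessionpeermanager.go: the tracker is reduced to a map from key to a cancelled flag, recordResponse is a stand-in that only performs the cleanup, and the else branch in onTimeoutAfter is my assumption about the part of the patch not shown in the diff. In the sketch, ok means "the key is tracked", which may differ from what ok refers to in the real code.

```go
package main

import "fmt"

// tracker stands in for the per-peer latency tracker: key -> cancelled flag.
type tracker struct {
	requests map[string]bool
}

// newTracker is a hypothetical helper that starts with one tracked request.
func newTracker(key string, cancelled bool) *tracker {
	return &tracker{requests: map[string]bool{key: cancelled}}
}

// recordResponse stands in for the call chain described above
// (recordResponse -> AdjustLatency), which ends by removing the entry.
func recordResponse(t *tracker, key string) {
	delete(t.requests, key)
}

// onTimeoutBefore models the pre-fix condition: a cancelled request
// is skipped entirely, so nothing ever removes its entry.
func onTimeoutBefore(t *tracker, key string) {
	cancelled, ok := t.requests[key]
	if !ok || !cancelled {
		recordResponse(t, key)
	}
}

// onTimeoutAfter models the patched condition: a cancelled request
// is removed explicitly. The else branch is assumed, not quoted.
func onTimeoutAfter(t *tracker, key string) {
	cancelled, ok := t.requests[key]
	if ok && cancelled {
		delete(t.requests, key)
	} else {
		recordResponse(t, key)
	}
}

func main() {
	before := newTracker("cid-1", true)
	onTimeoutBefore(before, "cid-1")
	fmt.Println("entries left before the fix:", len(before.requests)) // 1

	after := newTracker("cid-1", true)
	onTimeoutAfter(after, "cid-1")
	fmt.Println("entries left after the fix:", len(after.requests)) // 0
}
```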

I think your initial suggestion makes the most sense - I don't know if it's worth the complexity and extra memory required to wait for a timeout after a cancel. I think we should just clean up the latency tracker information as soon as there is a cancel.

Ah, I missed the per-peer part. This code looks correct.

> I think we should just clean up the latency tracker information as soon as there is a cancel.

Without tracking latencies for canceled wants, we may not correctly update latencies. For example:

1. Assume peers 1 and 2 have recorded latencies of 1s.
2. Ask peers 1 and 2.
3. 2 seconds pass.
4. Get a response from peer 1.
5. Record a latency of 2s for peer 1.
6. Cancel the request to peer 2.
7. Remove the latency tracker.

We've just penalized peer 1 for responding before peer 2.

On the other hand, the current code will penalize peer 1 and not peer 2 if peer 2 processes the cancel before sending the block.
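The first scenario can be replayed with a small sketch. It assumes latencies are simply overwritten with the newest measurement, as the steps above describe; the real AdjustLatency may combine old and new values differently. The helper names fastest and cancelAndForget are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// fastest returns the peer with the lowest recorded latency,
// breaking ties by name so the result is deterministic.
func fastest(latency map[string]time.Duration) string {
	best := ""
	for p, l := range latency {
		if best == "" || l < latency[best] || (l == latency[best] && p < best) {
			best = p
		}
	}
	return best
}

// cancelAndForget replays the scenario where the tracker for a
// cancelled request is removed without recording anything.
func cancelAndForget() map[string]time.Duration {
	// Both peers start with a recorded latency of 1s.
	latency := map[string]time.Duration{
		"peer1": time.Second,
		"peer2": time.Second,
	}
	// Ask both peers.
	pending := map[string]bool{"peer1": true, "peer2": true}

	// Two seconds pass, then peer 1 responds: record 2s for peer 1.
	elapsed := 2 * time.Second
	latency["peer1"] = elapsed
	delete(pending, "peer1")

	// Cancel peer 2 and drop its tracker. Its old 1s latency is untouched.
	delete(pending, "peer2")

	return latency
}

func main() {
	latency := cancelAndForget()
	// Peer 2 never answered, yet it now ranks ahead of peer 1.
	fmt.Println(latency["peer1"], latency["peer2"], fastest(latency)) // 2s 1s peer2
}
```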

The correct solution may be randomization. That is:

Record the first latency and remove all other trackers.

Make sure to randomly sample peers that aren't the fastest.
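One possible reading of the second point is an epsilon-greedy choice: usually ask the peer with the lowest recorded latency, but with a small probability ask a random other peer so its latency gets refreshed. This is a technique I am substituting for illustration, not something proposed in this thread or implemented in go-bitswap. The names pickPeer, newRNG and the epsilon parameter are hypothetical, and the sketch does not cover the first point (recording the first latency and removing the other trackers).

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a seeded random source so runs are reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// pickPeer expects peers sorted from lowest to highest recorded latency.
// With probability epsilon it returns a uniformly random peer other than
// the fastest; otherwise it returns the fastest. It returns "" for an
// empty list.
func pickPeer(byLatency []string, epsilon float64, rng *rand.Rand) string {
	if len(byLatency) == 0 {
		return ""
	}
	if len(byLatency) > 1 && rng.Float64() < epsilon {
		// Skip index 0 (the fastest) and sample from the rest.
		return byLatency[1+rng.Intn(len(byLatency)-1)]
	}
	return byLatency[0]
}

func main() {
	rng := newRNG(1)
	peers := []string{"fast", "medium", "slow"}
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pickPeer(peers, 0.2, rng)]++
	}
	// Most picks go to "fast", but the slower peers are still sampled.
	fmt.Println(counts)
}
```

With a scheme like this, a peer that was penalized unfairly would still be asked now and then, which gives its recorded latency a chance to recover.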

Ah yes you're right, that's a tricky problem to solve 🤔

Stebalien · 2019-08-01T19:01:20Z

We may also be able to reduce the per-peer logic/state by centralizing this logic a bit.

…r-memory-leak fix: memory leak in latency tracker on timeout after cancel This commit was moved from ipfs/go-bitswap@f62bf54

fix: memory leak in latency tracker on timeout after cancel

213edd7

Stebalien reviewed Jul 31, 2019

Stebalien approved these changes Aug 1, 2019

Stebalien merged commit f62bf54 into ipfs:master Aug 1, 2019
