This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

fix: memory leak in latency tracker on timeout after cancel #164

Merged: 1 commit into ipfs:master on Aug 1, 2019

Conversation

@dirkmc (Contributor) commented Jul 31, 2019

May be the cause of #162

```diff
-	if !ok || !data.lt.WasCancelled(ptm.k) {
+	// If the request was cancelled, make sure we clean up the request tracker
+	if ok && data.lt.WasCancelled(ptm.k) {
+		data.lt.RemoveRequest(ptm.k)
```
Member:

@hannahhoward can you remember why we don't just remove the request on cancel?

Member:

Ah, nevermind. @dirkmc pointed me to:

```go
// if we make a targeted request that is then cancelled, but we still …
```

So, I think the correct fix is to detect that all peers have sent back their responses; as written, this fix will only record one latency.


However, this shouldn't be an issue, as we should be hitting the timeout and removing all these trackers anyway. Something else is fishy.

@dirkmc (Contributor, Author):

I think the latency tracker records latency per-peer.

Each peerData has an associated latency tracker:

```go
type peerData struct {
	hasLatency bool
	latency    time.Duration
	lt         *latencyTracker
}
```

The latency tracker keeps track of the time that a request for a CID was sent to a peer, whether a cancel was then issued, and how long it took for a response to arrive:

```go
type requestData struct {
	startedAt    time.Time
	wasCancelled bool
	timeoutFunc  *time.Timer
}

type latencyTracker struct {
	requests map[cid.Cid]*requestData
}
```

There are broadcast requests, and requests to a specific set of peers.

I think your initial suggestion makes the most sense - I don't know if it's worth the complexity and extra memory required to wait for a timeout after a cancel. I think we should just clean up the latency tracker information as soon as there is a cancel.
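To make the structures above concrete, here is a small self-contained sketch of a tracker along these lines. The key type, the mutex, and every method name other than `WasCancelled` and `RemoveRequest` are illustrative assumptions, not the actual go-bitswap API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// key is a stand-in for cid.Cid, used only so the sketch is self-contained.
type key string

type requestData struct {
	startedAt    time.Time
	wasCancelled bool
	timeoutFunc  *time.Timer
}

type latencyTracker struct {
	mu       sync.Mutex
	requests map[key]*requestData
}

func newLatencyTracker() *latencyTracker {
	return &latencyTracker{requests: make(map[key]*requestData)}
}

// RequestSent records when a request for k went out and arms a timer that
// fires onTimeout if no response arrives in time.
func (lt *latencyTracker) RequestSent(k key, timeout time.Duration, onTimeout func(key)) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.requests[k] = &requestData{
		startedAt:   time.Now(),
		timeoutFunc: time.AfterFunc(timeout, func() { onTimeout(k) }),
	}
}

// RecordCancel marks the request as cancelled but keeps the entry around,
// which is why the timeout path must eventually remove it.
func (lt *latencyTracker) RecordCancel(k key) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	if r, ok := lt.requests[k]; ok {
		r.wasCancelled = true
	}
}

func (lt *latencyTracker) WasCancelled(k key) bool {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	r, ok := lt.requests[k]
	return ok && r.wasCancelled
}

// RemoveRequest stops the timer and drops the entry, freeing its memory.
func (lt *latencyTracker) RemoveRequest(k key) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	if r, ok := lt.requests[k]; ok {
		r.timeoutFunc.Stop()
		delete(lt.requests, k)
	}
}

func main() {
	lt := newLatencyTracker()
	lt.RequestSent("cid-1", 100*time.Millisecond, func(k key) {
		// The behaviour this PR fixes: a cancelled request is removed when
		// its timeout fires, instead of lingering in the map forever.
		if lt.WasCancelled(k) {
			lt.RemoveRequest(k)
		}
	})
	lt.RecordCancel("cid-1")
	time.Sleep(200 * time.Millisecond)

	lt.mu.Lock()
	fmt.Println("tracked requests:", len(lt.requests)) // 0 after the timeout cleanup
	lt.mu.Unlock()
}
```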

Member:

Ah, I missed the per-peer part. This code looks correct.

> I think we should just clean up the latency tracker information as soon as there is a cancel.

Without tracking latencies for canceled wants, we may not correctly update latencies. For example:

Assume peers 1 and 2 have recorded latencies of 1s.

  1. Ask peers 1 and 2.
  2. 2 seconds pass.
  3. Get a response for peer 1.
  4. Record a latency of 2s for peer 1.
  5. Cancel the request to peer 2.
  6. Remove the latency tracker.

We've just penalized peer 1 for responding before peer 2.

On the other hand, the current code will penalize peer 1 and not peer 2 if peer 2 processes the cancel before sending the block.

The correct solution may be randomization (see the sketch below). That is:

  1. Record the first latency and remove all other trackers.
  2. Make sure to randomly sample peers that aren't the fastest.
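
A rough, self-contained sketch of that idea; `sampleRate`, `onFirstResponse`, and the peer names are illustrative assumptions, not a concrete proposal for go-bitswap.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sampleRate is the (assumed) fraction of requests for which we keep
// measuring a non-fastest peer instead of dropping its tracker.
const sampleRate = 0.1

// onFirstResponse records the winner's latency, then for every other peer
// either drops its tracker (no latency recorded) or, with probability
// sampleRate, keeps waiting so slower peers still get fresh samples.
func onFirstResponse(winner string, started time.Time, others []string) []string {
	fmt.Printf("recording latency %v for %s\n", time.Since(started), winner)

	var stillTracked []string
	for _, p := range others {
		if rand.Float64() < sampleRate {
			stillTracked = append(stillTracked, p) // keep sampling this peer
			continue
		}
		fmt.Printf("dropping tracker for %s\n", p)
	}
	return stillTracked
}

func main() {
	started := time.Now().Add(-2 * time.Second) // pretend the request went out 2s ago
	kept := onFirstResponse("peer1", started, []string{"peer2", "peer3"})
	fmt.Println("peers still being measured:", kept)
}
```

This avoids systematically penalizing the fastest peer while still refreshing latency estimates for the slower ones occasionally.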

@dirkmc (Contributor, Author):

Ah yes you're right, that's a tricky problem to solve 🤔

@Stebalien merged commit f62bf54 into ipfs:master on Aug 1, 2019
@Stebalien (Member):

We may also be able to reduce the per-peer logic/state by centralizing this logic a bit.
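
One possible reading of "centralizing", purely as an illustration (none of these names exist in go-bitswap): a single session-level map keyed by (peer, CID) instead of a latencyTracker hanging off each peerData.

```go
package main

import (
	"fmt"
	"time"
)

// reqKey identifies one outstanding (peer, CID) request in the session.
type reqKey struct {
	peer string
	cid  string
}

type reqState struct {
	startedAt    time.Time
	wasCancelled bool
}

// sessionTracker is an illustrative, centralized alternative: one map for
// the whole session rather than per-peer tracker state.
type sessionTracker struct {
	requests map[reqKey]*reqState
}

func newSessionTracker() *sessionTracker {
	return &sessionTracker{requests: make(map[reqKey]*reqState)}
}

func (st *sessionTracker) RequestSent(peer, cid string) {
	st.requests[reqKey{peer, cid}] = &reqState{startedAt: time.Now()}
}

// Cancel applies to every peer still tracking this CID.
func (st *sessionTracker) Cancel(cid string) {
	for k, r := range st.requests {
		if k.cid == cid {
			r.wasCancelled = true
		}
	}
}

// Timeout removes the entry for one (peer, CID) pair and reports whether a
// latency penalty should still be recorded (i.e. it was never cancelled).
func (st *sessionTracker) Timeout(peer, cid string) bool {
	k := reqKey{peer, cid}
	r, ok := st.requests[k]
	if !ok {
		return false
	}
	delete(st.requests, k)
	return !r.wasCancelled
}

func main() {
	st := newSessionTracker()
	st.RequestSent("peer1", "cid-1")
	st.RequestSent("peer2", "cid-1")
	st.Cancel("cid-1")
	fmt.Println("penalize peer1 on timeout?", st.Timeout("peer1", "cid-1")) // false
	fmt.Println("entries left:", len(st.requests))                          // 1
}
```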

Jorropo pushed a commit to Jorropo/go-libipfs that referenced this pull request Jan 26, 2023
…r-memory-leak

fix: memory leak in latency tracker on timeout after cancel

This commit was moved from ipfs/go-bitswap@f62bf54