@@ -384,7 +384,7 @@ func (r *Retrieval) handleChunkDelivery(ctx context.Context, p *Peer, msg *Chunk
 }

 // RequestFromPeers sends a chunk retrieve request to the next found peer
-func (r *Retrieval) RequestFromPeers(ctx context.Context, req *storage.Request, localID enode.ID) (*enode.ID, error) {
+func (r *Retrieval) RequestFromPeers(ctx context.Context, req *storage.Request, localID enode.ID) (*enode.ID, func(), error) {
It would be good to have a short explanation of the purpose of the returned function; as it stands, one has to read through the code to find out. Alternatively, name it in the function signature.
network/retrieval/retrieve_test.go
Outdated
		t.Fatal("not enough nodes up")
	}
	// allow the two nodes time to set up the protocols otherwise kademlias will be empty when retrieve requests happen
	time.Sleep(50 * time.Millisecond)
Is it possible that, if this sleep is too short, the test passes but for a different reason than the one it is designed for? It would not have any peers in kademlia. Should there be some sort of check to see if peers are connected?
It would fail in both cases. I added a check nevertheless.
Thanks. Unfortunately, now it's timing out https://travis-ci.org/ethersphere/swarm/jobs/652344397#L1236.
pushed a fix
👍
We have a memory leak on production and it seems that this is the problem:

- addRetrieval
- RemoteFetch times out on context or defined search timeout
- Peer never gets deleted
- Peer struct leaks

It is not possible to create a regression test on netstore due to circular dependencies.

EDIT: I have added a test that checks that NoSuitablePeer error is emitted correctly from netstore. To verify that the leak happens, modify expireRetrieval to the following, then run TestNoSuitablePeer test. It will panic because the record is found in memory, exacerbating the problem.