This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Too much duplicate data #366

Closed
Stebalien opened this issue Apr 22, 2020 · 6 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@Stebalien
Member

When testing on the live network, I'm getting 76% duplicate data when pinning dist.ipfs.io.

This should have been fixed, so something is going very wrong here.

@Stebalien added the kind/bug label (A bug in existing code, including security flaws) on Apr 22, 2020
@Stebalien
Member Author

Unscientifically, turning off eagerly sending small blocks may help? I'm not sure.

However, it only reduced duplicate data from 76% -> 45%. And it seems like duplicate data becomes more of a problem as the query runs.

@Stebalien
Member Author

Scientifically, that option shouldn't make a difference because I only made the change on my own node. But the nodes I'm downloading from should mostly be old.

@Stebalien
Member Author

Stebalien commented Apr 22, 2020

I've tried removing the "don't have timeout" and I'm not seeing any improvement. However, it really looks like this problem gets worse and worse as the download progresses.

NOTE: I may have messed up when disabling the "don't have timeout"s. You should double check these results.

@dirkmc
Contributor

dirkmc commented Apr 22, 2020

This was caused by two issues:

  • ipfs pin is using a separate session for each block, which essentially causes every want to be broadcast, so we end up with a lot of duplicates.
  • When asking for a block from peers running an older version of bitswap, we wait for a short time before assuming they are not going to respond, and then move on and ask another peer. However, the timing was not conservative enough. Fix: "Change timing for DONT_HAVE timeouts to be more conservative" (#371).

@Stebalien
Member Author

Ok, I found the bug in pin. We were wrapping the dagService and FetchGraph was doing a type assertion.
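For readers unfamiliar with this class of bug, here is a minimal Go sketch of how wrapping a value can make a type assertion silently fail. The names `NodeGetter`, `SessionMaker`, `sessionService`, and `wrapped` are hypothetical stand-ins, not the real go-ipfs types; the point is only that a wrapper embedding one interface does not expose the optional methods of the value inside it.

```go
package main

import "fmt"

// NodeGetter is a hypothetical stand-in for a DAG service interface.
type NodeGetter interface {
	Get(key string) string
}

// SessionMaker is a hypothetical optional interface that a caller
// probes for with a type assertion.
type SessionMaker interface {
	Session() NodeGetter
}

// sessionService implements both NodeGetter and SessionMaker.
type sessionService struct{}

func (sessionService) Get(key string) string { return "node:" + key }
func (s sessionService) Session() NodeGetter { return s }

// wrapped embeds only the NodeGetter interface, so only Get is promoted.
// The Session method of the inner value is hidden from type assertions.
type wrapped struct {
	NodeGetter
}

// SupportsSession reports whether ng can be asserted to SessionMaker.
func SupportsSession(ng NodeGetter) bool {
	_, ok := ng.(SessionMaker)
	return ok
}

func main() {
	fmt.Println(SupportsSession(sessionService{}))          // true
	fmt.Println(SupportsSession(wrapped{sessionService{}})) // false
}
```

A caller that falls back to a non-session code path when the assertion fails would keep working, just without the benefit of a shared session, which is why this kind of bug is easy to miss.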

@Stebalien
Member Author

@dirkmc thanks for investigating this. This looks fixed.
