
Implications of the Filecoin launch for IPFS #435

Open
mikeal opened this issue Jun 29, 2020 · 5 comments
Labels
need/triage Needs initial labeling and prioritization

Comments

@mikeal

mikeal commented Jun 29, 2020

I’m logging this as a note because it touches on a lot of areas that affect all implementations of IPFS.

lotus can now create a new storage deal from any CID that is available in the IPFS network. It’ll pull the entire graph out of the network for the user and store that data through the normal deal flow. This is just about the only way to load data into Filecoin other than passing a single file you have locally (there’s also a more complicated offline flow that I won’t get into). Some of the applications being built to store data in Filecoin, which you may have heard of, are also using this feature, effectively leveraging IPFS as the transport for creating a new storage deal.
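
For concreteness, the flow looks roughly like the sketch below (a sketch only: the placeholders are illustrative, and the exact lotus configuration needed for it to fetch the graph from the public IPFS network may differ by version):

```
# load the data into the local IPFS node; the last line printed is the root CID
ipfs add -r --raw-leaves ./my-dataset

# propose a storage deal against that root CID; lotus pulls the graph
# out of the IPFS network and runs it through the normal deal flow
lotus client deal <rootCid> <minerID> <price> <duration>
```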

This has some implications I don’t think IPFS has had time to fully consider because it’s going to create some new incentives in the IPFS network that weren’t there before.

1. There’s now a substantial penalty for forking the DHT.

In the past, the main penalty of forking the DHT was losing some of the ease of use IPFS provides. But if you were doing anything other than the default IPFS data model you already couldn’t leverage a lot of the simpler DX that IPFS ships with, so we saw some notable users fork the DHT.

That’s unlikely to happen in the future. In fact, we may see some of those forks come back.

2. A lot more people are going to be loading data into IPFS in order to get it into Filecoin.

We probably already expected this, but it’s worth looking at in a little more detail. The easiest way, by far, to get data into Filecoin if it’s anything other than a single file will be to load it into IPFS. That means a lot of new people using the default IPFS configuration will be loading data into IPFS just to get it into Filecoin.

We have a lot of non-default options that we push people towards when they hit a particular scale. I think it’s worth looking at these and making sure that we default to what is best for large-scale data sets, because many more new IPFS users will almost immediately be loading large amounts of data.

For example:

  • We still don’t default to raw leaves (lotus turns this on by default when it imports the file, but that’s not the default in ipfs).
  • We still put the CID for every block we import into the DHT, not just the CIDs we’ve pinned.

For the launch we prepared a bunch of CAR files with Bitcoin data. It’s only about 400GB of data, but it’s enough CIDs to overwhelm the DHT. The easiest way for someone to set up a deal with this data would also insert millions of unnecessary records into the DHT.
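
For reference, both behaviours can already be switched away from the defaults in go-ipfs today; a sketch, assuming current go-ipfs config names:

```
# import with raw leaves (what lotus does on import, but not the go-ipfs default)
ipfs add -r --raw-leaves ./my-dataset

# reprovide only pinned CIDs (or only pin roots) instead of every block
# (initial provides on add may still behave differently)
ipfs config Reprovider.Strategy pinned   # or "roots"
```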

@momack2
Contributor

momack2 commented Jun 30, 2020

Good alert, Mikeal! I think it's a huge win that it's so easy to use IPFS and Filecoin together (IPFS users will rejoice at being able to just reference datasets already loaded onto IPFS when persisting with Filecoin!). However, I agree that now is the time to look at some of the user defaults and make sure they make sense for the influx of new users building on IPFS and Filecoin together. @aschmahmann / @petar - now may actually be the time to switch the default pinning strategy to avoid the large-data failure case Mikeal mentions above.

FWIW, I don't think point 1 is actually a blocker/concern. AFAIK you can configure your lotus node (either client or miner) to instead/additionally listen on your own private IPFS DHT and load in data that way, so I don't think this will actually be decisive for overall network dynamics.

@aschmahmann

aschmahmann commented Jun 30, 2020

I'll get the technical note out of the way before going into the higher level question here.

Technical: The main reason we haven't switched the defaults is that if you only advertise the root of the file, download the root block, and then your connection terminates, there's basically no way to resume the download. If you try ipfs get QmXYZ again, you'll find the root block locally and won't bother searching the DHT for QmXYZ. Since we would no longer be storing the provider records for QmXYZ's children in the DHT, there's no way to resume the lookup.
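
To make the failure mode concrete, here's the sequence as a sketch (QmXYZ stands in for any large file's root CID, with only roots being advertised):

```
ipfs get QmXYZ   # finds a provider via the DHT, fetches the root block,
                 # then the connection drops mid-download
ipfs get QmXYZ   # the root block is already local, so no DHT lookup is issued,
                 # and the children were never advertised -> nothing to resume from
```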

This is, and has been, an important problem to solve. If we expect it to be critical for onboarding a large number of new IPFS users in the near term then we can prioritize it (which means other things will get deprioritized), but that requires understanding what the ask is.

Higher Level: I don't have enough context on what the expected experience is for Filecoin users who are planning to use go-ipfs or any IPFS libraries. Without understanding the user stories that are desired or expected from Filecoin users, it's difficult to do anything other than give technical explanations of what is or isn't possible based on what's asked. That misses the forest for the trees, since there are probably other creative avenues to explore.

For example, are these users that are uploading large amounts of data to Filecoin storage miners also planning on utilizing their bandwidth to freely distribute that data? If not, then we should probably create a simple tool that is the inverse of ipget and does things like the following (a rough configuration sketch follows the list):

  1. Uses the Filestore
  2. Enables GraphSync
  3. Disables adding provider records
  4. Uses the new peering system to connect directly with the party data is being uploaded to
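
A rough sketch of what that could look like with existing go-ipfs knobs (the peer ID and multiaddr are placeholders, and fully disabling provider announcements may take more than what is shown here):

```
# 1. Filestore: reference the original files on disk instead of copying blocks
ipfs config --json Experimental.FilestoreEnabled true
ipfs add --nocopy -r ./my-dataset

# 2. GraphSync (experimental in go-ipfs)
ipfs config --json Experimental.GraphsyncEnabled true

# 3. Stop reproviding blocks to the DHT
ipfs config Reprovider.Interval 0

# 4. Peer directly with the node the data is being uploaded to
ipfs config --json Peering.Peers '[{"ID": "<peerID>", "Addrs": ["<multiaddr>"]}]'
```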

I suspect there are a few issues on the boundary between Filecoin and IPFS that are worth discussing in detail to ensure that these new IPFS users will have a good experience. Similarly, I suspect there are some patterns and issues in Filecoin that could be solved by using solutions shared with IPFS. For example, filecoin-project/lotus#2152 sounds like a request for content routing in Filecoin. While IPFS's current default content routing system (i.e. the public DHT) may not be what Filecoin wants or needs, there have been numerous discussions and proposals within the core team and in the community that may be of use.

@mikeal
Author

mikeal commented Jun 30, 2020

AFAIK you can configure your lotus node (either client or miner) to instead/additionally listen on your special private IPFS DHT and load in data that way

Very cool! I didn’t know this ;)

I think this is probably enough that the existing DHT forks won't come back, but I still think we'll see fewer forks in the future because, as people build services around this feature, it'll be a lot easier to keep your data in the main IPFS network than in a fork.

@mikeal
Author

mikeal commented Jul 10, 2020

Something to consider.

Could IPFS just have a default, but configurable, max limit on the number of CIDs it tries to broadcast? A reasonable default here would greatly reduce the risk profile. I can’t think of a case in which a regular user would need to broadcast more than 10K CIDs, and the user experience when someone does is so poor that it’s hard to imagine anyone would actually want to.

Breaching the limit would cause an import error that points them towards the setting for publishing only the CIDs of your pins, which we expect is what they probably want to do with a graph this size.

This doesn’t solve every concern, but when someone does the wrong thing it would greatly reduce the potential harm, and it would take many, many more users doing the wrong thing all at once to generate the load that, right now, only a few could.
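
To illustrate, the knob could look something like this (purely hypothetical; no such option exists in go-ipfs today, and the name is made up to show the idea):

```
# hypothetical option -- not a real go-ipfs setting, just illustrating the proposal
ipfs config --json Provider.MaxAnnouncedCIDs 10000
# importing a graph with more blocks than that would fail with an error pointing
# at the "only publish the CIDs of your pins" setting instead
```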

@ribasushi

Linking another issue for continuity: filecoin-project/lotus#2875
