Feature request: an API to find miners who have a certain Payload CID #2152

Open
ghost opened this issue Jun 26, 2020 · 6 comments
Labels
area/api area/markets/retrieval effort/weeks Effort: Multiple Weeks kind/discussion Kind: Discussion kind/feature Kind: Feature need/team-input Hint: Needs Team Input P2 P2: Should be resolved

Comments

@ghost

ghost commented Jun 26, 2020

Problem Statement

There are two categories of retrieval use cases:

  • User wants to get back data that she uploaded. In this case, the user knows which miners she made the deal with, so current Lotus APIs are sufficient.
  • User A uploaded a file to the Filecoin network, and User B wants to retrieve it. In this case, B is going to want to search the entire Filecoin network for a Payload CID, i.e., the hash of the data she cares about. (She doesn't know anything about Piece CIDs - that's an internal implementation detail from her perspective.) Right now, there's no way to do this. ClientFindData doesn't search the whole network, only its own connected peers, and we also can't learn this information from the public blockchain because it only stores Piece CIDs of previous deals. This use case is currently impossible.

Proposal
Lotus would extend ClientFindData to return a list of miners from the entire Filecoin network (not just its connected peers) who have the specified Payload CID.

This could be achieved in a lot of ways. One approach is a distributed index: every Payload CID gets indexed when it is stored, with Lotus instances gossiping to one another which Payload CIDs they have recently sealed. There are also centralized solutions where PL just pays to maintain an index server somewhere, but that's not very Web3.
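
As a rough illustration of the index idea above, here is a minimal in-memory sketch in Go. Everything in it is an assumption made for illustration: `PayloadIndex`, `Provider`, `Announce`, and `Find` are hypothetical names, not Lotus APIs, CIDs and miner IDs are plain strings, and the gossip transport that would feed `Announce` is left out entirely. It only shows the shape of the lookup that a network-wide search would need: Payload CID in, list of (miner, Piece CID) pairs out.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Provider is a hypothetical record saying which miner holds a payload
// and which piece it was sealed into. Plain strings stand in for real CIDs.
type Provider struct {
	MinerID  string
	PieceCID string
}

// PayloadIndex maps a Payload CID to the set of providers that announced it.
// In a distributed design each node would fill this from gossip messages;
// here announcements are simply method calls.
type PayloadIndex struct {
	mu      sync.RWMutex
	entries map[string]map[Provider]struct{}
}

// NewPayloadIndex returns an empty index.
func NewPayloadIndex() *PayloadIndex {
	return &PayloadIndex{entries: make(map[string]map[Provider]struct{})}
}

// Announce records that a provider stores the given Payload CID.
// Repeated announcements of the same pair are deduplicated.
func (idx *PayloadIndex) Announce(payloadCID string, p Provider) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	set, ok := idx.entries[payloadCID]
	if !ok {
		set = make(map[Provider]struct{})
		idx.entries[payloadCID] = set
	}
	set[p] = struct{}{}
}

// Find returns every known provider for a Payload CID, sorted by miner ID
// and then Piece CID so that results are deterministic.
func (idx *PayloadIndex) Find(payloadCID string) []Provider {
	idx.mu.RLock()
	defer idx.mu.RUnlock()
	out := make([]Provider, 0, len(idx.entries[payloadCID]))
	for p := range idx.entries[payloadCID] {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].MinerID != out[j].MinerID {
			return out[i].MinerID < out[j].MinerID
		}
		return out[i].PieceCID < out[j].PieceCID
	})
	return out
}

func main() {
	idx := NewPayloadIndex()
	// Illustrative identifiers only; real values would be CIDs and miner addresses.
	idx.Announce("payload-cid-1", Provider{MinerID: "miner-b", PieceCID: "piece-cid-9"})
	idx.Announce("payload-cid-1", Provider{MinerID: "miner-a", PieceCID: "piece-cid-7"})
	for _, p := range idx.Find("payload-cid-1") {
		fmt.Println(p.MinerID, p.PieceCID)
	}
}
```

Returning the Piece CID alongside the miner means a client that resolves a Payload CID through such an index would always have both identifiers available for the retrieval query.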

Notes

  • @mikeal pointed out an interesting DoS vector if we let retrieval clients request data by Payload CID alone without requiring a PieceCID (i.e., the current behavior of lotus retrieve). But for my use case, we really need one CID that describes the user's data, so I'm against making PieceCID a required argument to Client.Retrieve.
  • There is also a question of whether it was an intentional design decision not to store Payload CIDs on the public blockchain. Possibly a @jbenet question there.

@hannahhoward @jesseclay @jnthnvctr @whyrusleeping

[EDIT: ClientFindData will search all of its connected peers. But unless every peer maintains a connection to every other peer, this isn't a search of the entire Filecoin network.]

@mikeal

mikeal commented Jun 26, 2020

There is also a question of whether this was an intentional design decision not to store Payload CIDs on the public blockchain.

Since storing things on-chain is expensive it makes sense not to store anything that isn’t absolutely necessary, but there’s a broader question here than just what is on vs off chain.

Right now, there’s no way to search through the chain and actually read all the Payload CIDs in the network, because you can’t query the nodes for the Payload CID of each Piece CID they have a storage deal for. Since you must have a Payload CID in order to do a retrieval, there’s no way to index the contents of the network beyond the deals you know more about than what is on-chain.

What I don’t know is whether this is intentional (that is, whether we explicitly want to keep people from knowing all the data for all the deals in the network and keep that information somewhat private) or whether it is just a tiny detail we didn’t realize would keep us from being able to query into all the data in the network.

@mikeal

mikeal commented Jun 26, 2020

But for my use case, we really need one CID that describes the user's data, so I'm against making PieceCID a required argument to Client.Retrieve.

To my knowledge, requiring the Piece CID in the retrieval offer query shouldn’t prevent anything you want to do, because there’s nowhere I know of where you will find information that associates a miner with a Payload CID without also including the Piece CID. Right now, every method we have to associate a miner with a Payload CID comes from actually having created the deal with that miner, which means you have both. If there is some other way of associating Payload CIDs with miners, then you can throw out this argument.

@ribasushi ribasushi assigned ribasushi and unassigned ribasushi Jun 26, 2020
@ghost
Author

ghost commented Jun 26, 2020

every method we have to associate a miner with a Payload CID comes from actually having created the deal with that miner, which means you have both

I understand your point. But from a retrieval user/developer experience standpoint, if I want some data from the Filecoin network then I expect there to be a single identifier corresponding to my data. When you type cat A, you don't have to also specify A's inode so that the file A can be found. That would be weird.

@mikeal

mikeal commented Jun 26, 2020

When you type cat A, you don't have to also specify A's inode so that the file A can be found. That would be weird.

Sure, I’m not saying we should change the DX, I’m saying we should change the protocol ;)

These commands currently don’t require a miner id, that gets resolved elsewhere and then used in the protocol, and what I’d like to see is that the Piece CID is also used.

Now, there may be a use case I’m not seeing in which you can resolve the miner ID but not the PieceCID, which would prevent us from requiring it in the protocol, but so far I haven’t seen one. At the very least, our default workflow should grab the PieceCID when it grabs the miner ID and use it even if it’s not required, because just having it in the default workflow will make attacks far less useful.
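
To make the trade-off concrete, here is a small Go sketch of a retrieval query in which the Piece CID is optional. This is an assumption-laden illustration, not the Lotus retrieval protocol: `Query`, `Miner`, and `Lookup` are hypothetical names, CIDs are plain strings, and the cost model (a miner must scan every piece when no Piece CID is given) is only one plausible reading of the DoS concern, which the thread does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// Query is a hypothetical retrieval query. PieceCID is optional:
// the empty string means the client did not supply one.
type Query struct {
	PayloadCID string
	PieceCID   string
}

// Miner is a hypothetical provider that stores pieces, each holding
// a set of Payload CIDs (piece CID -> payload CID -> present).
type Miner struct {
	pieces map[string]map[string]bool
}

// ErrNotFound is returned when the miner does not hold the payload.
var ErrNotFound = errors.New("payload not found")

// Lookup answers a query and reports how many pieces had to be inspected.
// With a Piece CID the miner checks exactly one piece; without one it has
// to scan its pieces until it finds the payload or runs out.
func (m *Miner) Lookup(q Query) (inspected int, err error) {
	if q.PieceCID != "" {
		if m.pieces[q.PieceCID][q.PayloadCID] {
			return 1, nil
		}
		return 1, ErrNotFound
	}
	for _, payloads := range m.pieces {
		inspected++
		if payloads[q.PayloadCID] {
			return inspected, nil
		}
	}
	return inspected, ErrNotFound
}

func main() {
	m := &Miner{pieces: map[string]map[string]bool{
		"piece-1": {"payload-a": true},
		"piece-2": {"payload-b": true},
		"piece-3": {"payload-c": true},
	}}

	// A query for data the miner does not have, without a Piece CID,
	// forces a scan of every piece.
	n, err := m.Lookup(Query{PayloadCID: "payload-x"})
	fmt.Println("without PieceCID:", n, "pieces inspected,", err)

	// The same miss with a Piece CID touches a single piece.
	n, err = m.Lookup(Query{PayloadCID: "payload-x", PieceCID: "piece-1"})
	fmt.Println("with PieceCID:", n, "piece inspected,", err)
}
```

Under this model the user-facing command can still take a single Payload CID; the client would fill in `PieceCID` at the same step where it resolves the miner.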

@hannahhoward
Contributor

For reference, I have raised this issue previously: filecoin-project/specs#689

My impression, from the responses I've received when raising it, is that this is a post-launch item, possibly outside core-product priorities.

@arajasek arajasek added the effort/weeks Effort: Multiple Weeks label Nov 5, 2020
@jennijuju jennijuju added P2 P2: Should be resolved kind/discussion Kind: Discussion labels Jul 14, 2021
@jennijuju
Member

This is nice to have, and probably worth including in one of the w3dt projects.

7 participants