Providing content with a pin request #73
We should just PUT a CAR file with a single root (standard Filecoin CAR file). It’s pretty clean and we already have code to pin a CAR file in Go and the JS client libraries were just updated to be smaller and faster. We have nice library infrastructure to leverage here and we end up with some really thin client libraries you can build without IPFS in the client at all. |
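For illustration, a minimal sketch of what such a thin client could look like: it builds (but does not send) a `PUT` request carrying a CAR file body. The endpoint URL and token are placeholders, not part of any finalized spec; `application/vnd.ipld.car` is the media type commonly used for CAR payloads.

```python
import urllib.request

# Hypothetical values -- the endpoint path, host, and token are
# placeholders for illustration, not part of any finalized spec.
PIN_ENDPOINT = "https://pinning-service.example.com/pins"
ACCESS_TOKEN = "example-bearer-token"

def build_car_put(car_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request whose body is a CAR file."""
    return urllib.request.Request(
        PIN_ENDPOINT,
        data=car_bytes,
        method="PUT",
        headers={
            # media type commonly used for CAR payloads
            "Content-Type": "application/vnd.ipld.car",
            "Authorization": "Bearer " + ACCESS_TOKEN,
        },
    )
```

The point is that nothing here requires IPFS in the client: the CAR bytes can be produced by a small library, and the upload itself is plain HTTP.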
This is a duplicate of ipfs/in-web-browsers#18 and ipfs/in-web-browsers#22 which are already closed and resolved. Problems 1 and 2 are solved if you assume the remote pinning service is dialable which IMO is both fair and is a prerequisite for this solution to work. Problem 3 I think is a non-issue. Given that you have a blockstore (FlatFS, badger, S3, a CAR file, etc.) spinning up a libp2p node that supports Bitswap and only makes a single connection to the pinning service node is not a huge ask and helps us plan for the future. |
To a Web developer working in the browser, this is a huge ask. |
I mean we could just provide a library that does this. If asking developers to include a library is a huge ask then I don't see how they get any real benefit out of IPFS, since they can't get the IPLD data/CAR file to send to the pinning service without using some library for working with IPLD data. |
Applications are a little more complicated than that. The client/user that needs to upload the content has different needs than the consumers of that data who need it available in a decentralized network. In some NFT use cases an artist needs to put their content into a website and then never touch the site again. There are other users who might bid, buy, trade and do other authentication against the NFT data. IPFS is still very much a critical part of this application, but the thing standing in the way of getting more content into IPFS is that loading content you’re just trying to hand to a remote provider to run IPFS for you requires such a substantial client. |
To be convinced of any value in this argument I'd need to see some evidence that the "substantial client" of a minimal libp2p client is much more burdensome than the IPLD (and likely UnixFS) libraries needed to process the data to the point that it would have a noticeable impact on a user's choices. Additionally, while there are some nice aspects in terms of ease of implementation using HTTP to send the data to be pinned we have issues like ipfs/in-web-browsers#9 and ipfs/in-web-browsers#7 which become impossible when you start adding in HTTP specific features (e.g. PUT for a CAR file) into this spec. You also lose out on any deduplication benefits of a protocol like Bitswap or GraphSync. Overall, IMO this proposal is pushing us in the wrong direction. If there is some major hurdle (e.g. with using js-libp2p in the browser) with the existing API's support for this use case, then I could potentially get behind this but otherwise I don't think it's worthwhile. Curious if @lidel has a different perspective here. |
Bit late to this party, but here are my condensed thoughts:
|
I have spent a little more time thinking about this and came to the following conclusions, which I would like feedback on: there are two very different user groups that are potential pinning service API users:
I think the current API does a poor job of meeting either group's needs. For the group operating an IPFS node, a libp2p-based API would be more effective and efficient, avoiding a lot of HTTP round trips, etc. For the group that just wants to add a file to the IPFS network, having to run an IPFS node just to add a file is a huge burden, and sometimes constraints of the runtime also get in the way (serverless is a good example). I think there is an opportunity to enable that second group and significantly reduce the upfront costs (education, etc.) of getting them into IPFS. I think doing it as part of pinning services is a good idea because:
|
I’d like to pause this discussion for now. It would seem that the entire purpose of a remote pinning API would be to hire a third party to run IPFS for you, and that effectively requiring a client to run IPFS in order to get content into that remote would be a barrier to maximizing the usefulness of the remote pinning service. That said, you’re right to question how useful this would be and compare it to other work streams. I don’t think it would be a productive use of your time to send you all the NFT user research we’ve done and invite you to several more meetings where we’ve been discussing such things. If we’re confident enough of the user need we can just build something to satisfy it ourselves. Once it’s deployed and used by these target users we can iterate on it and take what we’ve learned back to you and recommend changes to the pinning API with much more confidence in their usefulness. Then we can update anything we have already deployed to match what ends up being formally specified. |
Revisiting: providing DAG archive with a pin request
We marinated on this for a few weeks, and it seems that adding the ability to upload a precomputed DAG archive to a pinning service is something that, if implemented in a thoughtful way, does not go against the raison d'être of this spec. @rvagg makes a good point that as long as we talk about DAG archives, there is no change in the existing separation of concerns:
"Uploading" a DAG archive to a pinning service is the ultimate "provider hint". It operates at the same abstraction level as a bitswap session and does not introduce any complexity related to file vs. directory handling, chunking, or hashes, which greatly simplifies things on the service end.
Prior art: DAG import APIs
Right now, we have two preexisting "DAG import" endpoints:
We want to make self-hosting of pinning services easier by adding support for this spec to ipfs-cluster (ipfs-cluster/ipfs-cluster#1213), but even when that happens, we will most likely need a separate namespace/port for "service" endpoints guarded by a bearer access token.
Requirements and Constraints
Main ones:
Initial API idea (looking for feedback 👀 )
Obligatory 🏚️ 🚲 question: how extensible do we want this to be?
(A) sounds like the more pragmatic approach: YAML will produce good docs, and we could extend it if we wanted, but lmk if I missed something important at any stage of this exploration. cc @mikeal @olizilla @obo20 @hsanjuan @rvagg @ipfs/wg-pinning-services |
The pinning service is supposed to support the CID as the root of a graph:
So it should be fine, since a CAR can contain that whole graph. Which in theory should support the pattern nft.storage is leaning in to with their cluster API library: nftstorage/ipfs-cluster@054063e#diff-13876b4beb64b9f156474dc78f9c923952a7ca210d4507b6b3135bbe244f8a60 as long as dag-cbor is supported by the endpoint. So two additional questions raised:
|
I think so too. Is there an easy way to tell that a CAR has more than one root, before ingesting the entire thing?
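For illustration: in CARv1 the header sits at the very start of the file as an unsigned-varint length prefix followed by a dag-cbor map `{"version": 1, "roots": [CID, ...]}`, so a service can count roots after reading only a few dozen bytes. A rough sketch with a deliberately partial CBOR decoder (just enough for the header); `encode_header` is a test-only helper and the "CID" bytes used with it are fake:

```python
def read_varint(buf: bytes, i: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    shift = value = 0
    while True:
        b = buf[i]
        i += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i
        shift += 7

def _decode(data: bytes):
    """Minimal dag-cbor decode: ints, byte/text strings, arrays, maps,
    and tag 42 (links). Only short and one-byte length forms."""
    ib, rest = data[0], data[1:]
    major, info = ib >> 5, ib & 0x1F
    if info < 24:
        n = info
    elif info == 24:
        n, rest = rest[0], rest[1:]
    else:
        raise ValueError("length form not handled in this sketch")
    if major == 0:                      # unsigned integer
        return n, rest
    if major == 2:                      # byte string
        return rest[:n], rest[n:]
    if major == 3:                      # text string
        return rest[:n].decode(), rest[n:]
    if major == 4:                      # array
        out = []
        for _ in range(n):
            item, rest = _decode(rest)
            out.append(item)
        return out, rest
    if major == 5:                      # map
        m = {}
        for _ in range(n):
            k, rest = _decode(rest)
            m[k], rest = _decode(rest)
        return m, rest
    if major == 6 and n == 42:          # tag 42 wraps 0x00 + CID bytes
        link, rest = _decode(rest)
        return link[1:], rest
    raise ValueError("unhandled major type %d" % major)

def count_car_roots(car: bytes) -> int:
    """Return the number of roots declared in a CARv1 header."""
    hdr_len, i = read_varint(car)
    header, _ = _decode(car[i:i + hdr_len])
    if header.get("version") != 1:
        raise ValueError("only CARv1 handled in this sketch")
    return len(header["roots"])

def encode_header(roots) -> bytes:
    """Test helper: build a varint-prefixed CARv1 header for fake roots."""
    body = bytes([0xA2])                           # map(2)
    body += bytes([0x65]) + b"roots"               # text(5) "roots"
    body += bytes([0x80 | len(roots)])             # array(n), n < 24
    for cid in roots:
        payload = b"\x00" + cid                    # multibase identity prefix
        body += bytes([0xD8, 0x2A])                # tag(42)
        body += bytes([0x58, len(payload)]) + payload
    body += bytes([0x67]) + b"version" + bytes([0x01])
    # varint-encode the header length (a single byte is enough here)
    return bytes([len(body)]) + body
```

So a service can reject multi-root CARs before buffering the upload at all.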
I am tempted to say this is up to the pinning service to decide, because it is "content routing details" of sorts. A service could return
iiuc a workaround for pinning encrypted DAGs that can't be traversed by a pinning service would be to create an "envelope DAG" with opaque raw blocks as leaves. I believe paid services like Pinata will reject DAGs that can't be traversed, because they (1) only track root CIDs and pin recursively and (2) calculate total size via |
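The "envelope DAG" workaround above can be sketched roughly like this: split the ciphertext into opaque raw leaf blocks, compute their CIDs (CIDv1, `raw` codec, sha2-256), and reference them from an envelope node. The envelope is shown here as a plain dict for simplicity; a real client would encode it as a dag-cbor node whose links point at the leaves.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # multicodec code for "raw" blocks

def raw_cid(block: bytes) -> str:
    """CIDv1 (multibase base32) for an opaque raw block -- the leaves
    of the envelope DAG workaround for encrypted content."""
    mh = b"\x12\x20" + hashlib.sha256(block).digest()  # sha2-256 multihash
    cid_bytes = b"\x01" + bytes([RAW_CODEC]) + mh      # v1 + raw codec + mh
    return "b" + base64.b32encode(cid_bytes).decode().lower().rstrip("=")

def envelope(ciphertext: bytes, chunk_size: int = 256 * 1024):
    """Split ciphertext into raw leaf blocks and return (leaves, envelope).
    The envelope is a plain dict here; a real client would encode it as
    dag-cbor so the service can traverse it without decrypting anything."""
    leaves = [ciphertext[i:i + chunk_size]
              for i in range(0, len(ciphertext), chunk_size)]
    return leaves, {"chunks": [raw_cid(b) for b in leaves]}
```

Because the envelope is traversable while the leaves stay opaque, a service that only pins recursively from a root can still track and size the whole DAG.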
@lidel Yes, you're correct with your understanding of things. Our biggest requirement for anything that we ingest is that we need to be able to calculate the size of it. The sooner we can do this in our upload pipeline the better, as it helps us make decisions on whether or not the content is allowed to be added to our systems. |
Can we lift this requirement? We are already running into issues in serverless infrastructure that has memory limits per request. The current plan to overcome these limitations is to chunk up content and pin it over multiple requests. If a CAR had to contain a single file, that would be problematic, as we may not be able to fit the whole file in one CAR file. P.S. I realize the thinking was 1 file max, but I think we'd be better off not having such restrictions, especially since we'll have non-file blocks as well. |
What is the rationale here? If you import a single CAR file, why wrap it in all the extra stuff that also needs to be parsed on the host? Seems like extra overhead for an unclear benefit. I would suggest that a POST with the content of the CAR file would be a better option here.
I think this would be a big mistake. The whole point of import is that it is an atomic operation; introducing queuing and fetching from the DHT is going to make it non-atomic. Also, I would argue that missing blocks are either:
I do not think generalizing this endpoint is a good idea. I would much rather add a number of endpoints, or use something like a content-type header to extend the interface, than make a very generic API. My arguments are:
|
I would prefer having an explicit parameter to tell whether an incomplete graph is ok or not, as there are valid cases for both. I would even go as far as making it a non-optional parameter, so the user has to think and make a conscious decision.
As with the second point, I think this should be explicit! If the user opted into incomplete graph import, then the answer is obvious. If the user opted into full-graph-only, then I would expect the host to error, as it has no means to validate that invariant.
I am not sure roots are useful beyond verifying that all blocks made it. I would expect the service to pin all the blocks that were in the CAR file, whether root or not. If I did not want a block pinned, I would not have included it in the CAR file in the first place.
Here is my general feedback, consolidated:
Please note that I understand there are valid use cases where you want to upload a subgraph and make the service fetch the remaining blocks for you, or maybe you have already uploaded that part of the subgraph. But I would suggest not supporting such use cases yet because:
|
I am highly concerned about the ramifications of modifying this API to work with non-recursively pinned data which @Gozala is suggesting.
This expresses a desire for atomic pin operations.
This expresses a desire to break up the atomic pin operation into multiple atomic pin operations. This implies either that pin operations are for incomplete graphs or that we are not pinning data graphs but just sending bundles of raw blocks that are referenced in a graph-like fashion.
Non-recursive Pins
The current API only works with pinning full graphs; it has no concept of a "best-effort" or partial DAG pin. This has been previously discussed and we removed even the semblance of partial pinning since nobody was working with data like that #17 (comment). If we feel there is value in expanding how we pin data to include non-recursive pins (e.g. pinning by selector, direct pins, depth pins, etc.) we can do that, but it seems both out of scope and something that should probably be explored in go-ipfs (or at least ipfs-cluster) before hoisting it onto the community to deal with, since they should have some reference implementation that is compliant.
Storing bundles of blocks
We can do this, and sometimes we might even have to do this. In the case of unknown codecs this might be the only option available. However, it has a few tradeoffs that are unfortunate.
The obvious distinctions are that there is no way to do partial sync in just sending a group of blocks as a CAR file, and that sending the data itself takes away the provider's choice not to read the data. However, one more subtle distinction is that with provider hints the pinning service is still separating out the "pinning" and "fetching" operations, whereas this proposal combines them both. A true "here it is" equivalent of a provider hint would be allowing the user to send the pinning service a CAR file filled with blocks that it can choose to use or discard, occurring after a remote pin add operation had already been started. In this "provider hint" model it literally would not matter what data was in the CAR files as long as they were in a readable format (one/many roots, in/complete DAGs, un/supported codecs are all irrelevant). No, this does not match the "atomic" property that some here are looking for, but the desire for this property is where the complexity is coming from. |
I think there’s a simpler path that we’re missing here. If we conceptualize this feature in simpler terms:
we can defer most of this complexity. Maybe we do need a way to pin partial graphs and a bunch of other features, but let's not complicate this feature with a lot of new complex behavior. Let's just avoid changing the behavior of pinning at all. If you send a partial graph, then the request won't return until we've pulled the rest of that graph out of the network, because that's how the pinner already works; this feature just lets you load some blocks in first. If the codecs aren't available, the pin call will fail, because that's how the pinner already works. Pinning partial graphs would be cool, but let's have that conversation in a new feature request, because that may involve a bigger rethinking of the pinner, or maybe not, but we can make progress here without taking that on. |
This is exactly what I am asking for; however, there are a few nuances that I do think need to be defined.
So it’s not about complex behavior; it’s about what the pinning service does if some blocks under the root are missing. It can either
I think we should define the expected behavior regardless of choice. I also think the 3rd is the worst option, and I can see reasonable arguments in favor of the 1st and 2nd. That is why I suggest letting the user specify the desired behavior between 1 and 2. If that is too much, I’d say the 1st is less limiting, and it puts a bit more burden on the user to assemble the CAR file properly.
I think this is a really bad option. What if it can’t find the blocks? I really think that if the user is omitting blocks, it is on the user, and there might be good reasons to do so: maybe they’re just coming with the next request, or were already pinned by the last request. This is what introduces complexity here. It also makes untraversable graphs unpinnable. On the other hand, just pinning the provided blocks simplifies this and makes the service codec-agnostic: you give it blocks, it pins them, that’s all.
I think this is the wrong framing: it is not about partial graph support but rather about making the service graph-agnostic. You give it blocks, it pins them; it doesn’t need to know or care about graphs, codecs or any of that. |
I do share the general concern here that multiple pins that collectively form a single graph are a real problem, as the relationships between them are not encoded. Originally my thinking was that subgraphs could be temporary until the rest of the graph is imported, at which point they could be dropped, but now I am realizing that cannot be accomplished without the service understanding graphs, which is what I was trying to avoid.
I have thought a bit more about these, and here are a few not-fully-fleshed-out notes:
I wonder if some hybrid of 1 and 3 could provide a reasonable compromise with some transactional guarantees. E.g. what if:
This would
|
It seems the options might boil down to something like this, at a really rough level: Option 1 - "atomic" CAR uploads, "pin this thing please", root(s) say what to pin, graph is contained and complete, incomplete graph is a failure
Option 2 - allow incomplete graphs in a CAR, and solve for completeness
Option 3 - allow incomplete graphs in a CAR and make pinning atomic (i.e. pinning incomplete graphs now supported)
Option 4 - keep this out of the Pinning API, make it easier to temporarily park content elsewhere and use provider hints as they are - could involve making ipfs.io and other public gateways writable and with reasonable TTL or LRU -
|
It can only do the current behavior if we refrain from changing the pinner. There’s lots of good ideas in here but they should be their own feature requests against the pinner. As a first step we should just add the blocks and then call the existing pinner, with all of its current behavior and limitations. |
I think most people would tell us what they want is "here let me post you some files, you give me a CID or fail, take my money". Pinata and Infura offer that already, but through custom APIs... we could pave that cowpath, but we've become focused on CAR files. To verify what feels implicit in this thread: I think we are trying to nudge people to use CAR files here:
I'm just about sold on using CAR files here, but we should be clear that we'd be offering users what we think they really need, not what they would tell us they want. We should be very clear about why we would do that and how we intend to message it. Can we state why POSTing files is out of scope? |
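For comparison, the "POST me some files" path the comments above describe is just a multipart/form-data upload. A minimal sketch of building such a body; the field name `"file"` mirrors what file-upload pinning APIs such as Pinata's pinFileToIPFS accept, but is an assumption here:

```python
import uuid

def build_multipart(filename: str, content: bytes, field: str = "file"):
    """Return (body, content_type) for a multipart/form-data upload.
    The field name defaults to "file", which is an assumption about
    what a file-upload pinning endpoint would accept."""
    boundary = uuid.uuid4().hex
    head = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ) % (boundary, field, filename)
    tail = "\r\n--%s--\r\n" % boundary
    body = head.encode() + content + tail.encode()
    return body, "multipart/form-data; boundary=" + boundary
```

This is the shape of the "cowpath" already paved by custom APIs: trivially simple for files, but it says nothing about non-file DAGs, which is where CAR files come in.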
We're writing the thing to CAR up your files. We'll get that working nicely and then report back on whether it's something we would want to make a minimum requirement for playing the pinning service game. |
From reading through this thread, it seems like the original desire here was to allow users an easier way to directly upload their NFT data to a pinning service in a way that doesn't rely on async pinning through the network. I'm seeing a lot of added functionality/complexity being discussed here, and while I think there's a place for a lot of this, I worry that it's overcomplicating things in the near term. In addition, I'm also a little worried that we won't even be able to support something like this if it gets too complex. We might be able to, but the more complex things get, the harder they are going to be for us to work around in order to get everything to work with our existing infra. I would really like to see a simple "golden path" where "user has a file, uploads it to pinning service, gets CID". My guess is that most users/devs aren't going to know what a .car file is, or care about anything such as partial pinning. And I don't think they should have to know. I don't necessarily care one way or the other whether things like CAR files are used for the file format, but anything that's created is going to need to be massively automated to work "automagically" with easy-to-use libraries in order for this to be successful. |
This is definitely a sentiment I had when opening this issue. And I absolutely agree that if what you have is a file, or a set of them, nothing beats a simple multipart/form-data POST; that is pretty much what Pinata's https://api.pinata.cloud/pinning/pinFileToIPFS provides. I think the reason CAR files got pulled into the discussion is that, building nft.storage, we found ourselves needing to pin not just files but also non-file DAGs, and CAR files seem to provide a reasonable and simple way to upload those. That said, maybe it should be a separate discussion / API extension, because I do not see a good reason to complicate the simple case of uploading files with CARs.
I think the primary reason is that it supports DAGs beyond UnixFS. There is a bit of extra utility as it eliminates inconsistencies that could arise from different chunking or hashing preferences.
I think it would be best:
|
Reducing complexity
I share concerns around making DAG archive handling too complex. In my mind import+pin should be very simple: in the case of a DAG archive, expect a single root, a complete DAG, and instant pinned status. Everything else should return an error. If the DAG is bigger than makes sense for a single upload, that should be solved by either regular bitswap or userland sharding (importing subgraphs and then pinning the true root + unpinning the subroots). Ack that even if we keep DAG import simple, at the end of the day people will still ask why there is no file import, so let's revisit...
Why we had no FILE import in v1 of this spec
iirc original reasons were to:
I believe those things may no longer hold as strong as they did last year, ecosystem looks a bit different (Filecoin shipped, Brave and Opera shipped ipfs://, NFTs etc). Why we might revisit this and support FILE imports
Ack. Personally, I hoped to fill this void by making writable gateways a thing, but there are unknowns around GC, and expecting people to do two operations is less friendly than an atomic "import+pin".
We may produce better API if we design it around more than one import type or use case. I suggested use of This way we could have
The nice thing here is that we solve for both simple and advanced use cases. Would this be acceptable? I could open a PR if this is not too controversial.
Think about import+pin of JSON/CBOR
Even if we have FILE and DAG import+pin operations, working with JSON/CBOR is still painful (it requires creation of a DAG archive).
If people could send JSON to a pinning service, get a CID, and (soon) load it via a gateway, then we would remove a lot of the complexity that is blocking people from using advanced IPLD features in the long term (cc @warpfork). |
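To make the "send JSON, get a CID" idea concrete, here is roughly what a service could compute for a posted JSON document: a CIDv1 using the dag-json codec over canonicalized bytes. Note this is a sketch: real dag-json canonicalization has additional rules (e.g. for links and bytes), so sorted-key JSON is only an approximation of the canonical form.

```python
import base64
import hashlib
import json

DAG_JSON_CODEC = 0x0129  # multicodec code for dag-json

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint encoding."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def cid_for_json(obj) -> str:
    """Roughly what a service could return for posted JSON: a CIDv1
    (multibase base32) over canonicalized bytes. Approximation only:
    sorted-key JSON is not full dag-json canonicalization."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    mh = b"\x12\x20" + hashlib.sha256(data).digest()  # sha2-256 multihash
    cid = b"\x01" + varint(DAG_JSON_CODEC) + mh       # CIDv1 + codec + mh
    return "b" + base64.b32encode(cid).decode().lower().rstrip("=")
```

The key property is determinism: the same JSON value always yields the same CID, so the client never needs chunking or UnixFS machinery at all.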
I like the idea of
A few things that would need to be clarified there:
I would like this to be a separate endpoint altogether, e.g. I also still would like the following to be specified:
In an effort to keep things simple while making DAG import codec-agnostic, I would like to propose the following requirements:
I believe this would enable a DAG import API:
I love it! Yet I still want to make pinning-service swapping zero-cost, and unless we make chunking & hashing predictable, that is not going to be the case. Still, I think for the end user the simple case can remain simple, as those options would be encoded in the client itself.
This is interesting. I imagine multipart/form-data could be used to do more or less what a CAR does, but omitting CIDs and encoding as JSON. However, anything with links would get immediately complicated. Either way, that could be yet another endpoint, evaluated separately. I don't think overloading the same endpoint makes things simpler or easier to evolve; it just hides the true size of the API surface. |
ok, I'll throw my 2 cents on some discussion points:
In practice, when adding a CAR you can pin before adding. This protects whatever you are going to add from GC, I guess, but it's just a trick. As of now cluster would fail this part because IPFS provides no way of warding off automatic GC when you are doing something like adding blocks. Anyways, it is obvious that the pinning service should not be GC-ing stuff that is supposed to be pinned. For me, when integrating with go-ipfs, that is more an operational practice than an implementation requirement, given the state-of-the-art GC that it brings and the limited control around it as a "client".
This should be unspecified. In practice, they get added to the blockstore and they do not get pinned. Pinning is recursive from a root. They get added because the pinning service has no interest in the complexity of doing anything a different way. We do not know in which order the blocks are (unspecified in the CAR spec, I think). In theory we do not even need to parse them, let alone interpret their links; they can go straight into the blockstore.
Unspecified. Pin error likely. Import error perhaps. An error in all cases. I don't think the specification of this should be blocked on things affecting 1% of people, like:
The API supports pinning things 1 by 1. Therefore the semantics of pinning something with an "ultimate hint" (CAR attached) should be limited to adding 1 root-CID. Adding multiple things at once can share semantics with pinning multiple things at once, if the API is ever extended in that direction. And this will likely require new specific endpoints. To summarize. For now:
For later:
|
A relevant project proposal for adding "Chunked CAR Uploads" to nft.storage is at protocol/web3-dev-team#111 |
Any updates on this? |
Afaik nobody is working on this at the moment. To get things started, what we need is an
|
We were evaluating protocol/web3-dev-team#58 in the context of protocol/web3-dev-team#62, and the subject of "where is the pinning service going to get content from" came up. The assumption that the pinning service will fetch content from the IPFS network raises some concerns:
I remember @lidel telling me about the de facto hack of encoding content in an identity-hashed CID, which might overcome some of the above-listed concerns but raises whole new ones:
Either way, uploading content as an identity-hashed CID encoded as a base64 string in JSON feels like a very impractical solution to meet specific requirements. It seems like we need to consider extending this specification to support this use case, or it will not be practical for cases where just putting content on IPFS is desired.
It is also worth pointing out here that, e.g., Pinata has its own API for such a use case: https://pinata.cloud/documentation#PinFileToIPFS
/cc @alanshaw @mikeal @jnthnvctr