Proposal: Versioned IPNS #191

Closed
RangerMauve opened this issue Oct 31, 2018 · 6 comments

Comments

@RangerMauve

IPNS is great, but it's not perfect yet.
Due to the way it uses the DHT, it's slow to update, and because of the way the format works, there's no way to get the history for a given website.

IPRS is a potential alternative, but it's very complicated and is still a WIP.

After looking at the internals of the Dat project, I came up with the following changes that would give IPNS some of Dat's advantages while using its existing infrastructure and data models.

Adding a link to the previous version (IPNS entry)

This would require the addition of an extra field, previous, in the protobuf definition for IPNS entries.
This field would contain the multihash of the previously published IPNS entry.
This new field wouldn't need to be part of the signature because the entry is signed by the same key, and will have sequence - 1 as its sequence if it is valid.

Since each IPNS entry is signed, and should have an increasing sequence number over time, the only additional verification needed is to check that each previous link leads to an entry whose sequence number is exactly one lower.
Not having the field be part of the signature enables backwards-compatibility with older clients that don't know how to handle versions yet.

With this field in place, one can traverse the history by following the previous links.
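
A rough sketch of that traversal, in Python for brevity. The `Entry` class, `entry_hash`, `publish` and the in-memory `store` are hypothetical stand-ins for the real protobuf, multihash and block store, and signature checks are left out:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Simplified stand-in for an IPNS entry (not the real protobuf)."""
    value: str               # path the name points to, e.g. "/ipfs/<hash>"
    sequence: int            # goes up by one on every publish
    previous: Optional[str]  # hash of the previous entry, None for the first


def entry_hash(entry: Entry) -> str:
    # ASSUMPTION: SHA-256 over a text encoding stands in for a multihash.
    data = f"{entry.value}|{entry.sequence}|{entry.previous}".encode()
    return hashlib.sha256(data).hexdigest()


def publish(store: Dict[str, Entry], value: str, prev: Optional[Entry]) -> Entry:
    """Create the next entry in the chain and save it in the store."""
    entry = Entry(
        value=value,
        sequence=0 if prev is None else prev.sequence + 1,
        previous=None if prev is None else entry_hash(prev),
    )
    store[entry_hash(entry)] = entry
    return entry


def history(latest: Entry, store: Dict[str, Entry]) -> List[Entry]:
    """Follow `previous` links from the latest entry back to the first."""
    chain = [latest]
    current = latest
    while current.previous is not None:
        older = store[current.previous]  # KeyError if the block is missing
        if older.sequence != current.sequence - 1:
            raise ValueError("previous entry must have sequence - 1")
        chain.append(older)
        current = older
    return chain
```

The result is ordered newest first, so `history(latest, store)[0]` is always the entry you started from.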

Syncing the version by connecting to peers instead of searching the DHT

Looking up values in the DHT has the benefit of not needing the publisher to be online all the time in order to enable peers to discover their data, as it will persist on other nodes in the DHT.
This doesn't hold up for long outages of the publisher since nothing else in the network will be posting to the DHT if they go down.
This means that once a peer goes down, unless somebody has already resolved their latest IPNS entry and pinned it locally, they won't be able to find their data anymore.
Not being able to find data once the publisher goes down makes the p2p aspect less useful and should be addressed.

The Dat project approaches this differently in that peers will discover other peers on the DHT, and establish connections to replicate the version metadata and actual file contents.
IPFS can do something similar by using the DHT to find peers for the given IPNS key and using a simple protocol to sync them together.

Peers will propagate any new changes that they encounter to their peers, which will quickly spread updates throughout the network.

This doesn't necessarily need to replace the existing DHT functionality, and it would be feasible to have both methods at once, with the same data.
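
To illustrate the propagation idea, here is a small in-memory simulation. The `Peer` class and the `(sequence, value)` tuple are made up for this sketch; there is no networking and no signature checking:

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (sequence, value); signature checks omitted


class Peer:
    """Hypothetical peer that floods newer records to its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbours: List["Peer"] = []
        self.latest: Record = (-1, "")

    def connect(self, other: "Peer") -> None:
        self.neighbours.append(other)
        other.neighbours.append(self)

    def receive(self, record: Record) -> None:
        # Ignore anything that is not newer than what is held locally.
        # This also stops the flood from bouncing between peers forever.
        if record[0] <= self.latest[0]:
            return
        self.latest = record
        for peer in self.neighbours:
            peer.receive(record)
```

A record handed to any one peer reaches every connected peer, and older records are dropped at the first hop.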

Treat IPNS histories as "archives"

At the moment IPNS acts as a pointer to some sort of opaque resource. This is in contrast to Dat, which uses a mutable filesystem as its base API.
One of the benefits of treating IPNS URLs as archives over raw pointers is that there's now a clear way to monitor whether the user is actively engaging with the contents.

For example, this enables the Beaker Browser to seed Dat archives while a user is actively engaging with a p2p website, and to stop seeding after they have left the page.
This provides a natural method of applications sharing the load for resources without having to explicitly code for it.

In addition to tracking the "active" state of an archive, it provides an easy way for authors to reason about updating its contents.
One could envision using an API similar to the Mutable File System (MFS) API (https://docs.ipfs.io/guides/concepts/mfs/) which is scoped to a specific IPNS URL.
Applications can use the same interfaces for reading and writing from published content and are able to pass references to these filesystems around.

The minimal approach that would help here is to add ipfs.name.history(multihash) to get the history and ipfs.name.seed(multihash) / unseed(multihash) to control when to share the history.
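
As a sketch of the bookkeeping that seed / unseed might need, assuming several pages or tabs can hold the same name open at once. The `SeedTracker` class, the reference counting and the `visiting` helper are my own assumptions, not part of any existing IPFS API:

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class SeedTracker:
    """Hypothetical bookkeeping behind the proposed seed / unseed calls."""

    def __init__(self) -> None:
        self._active: Dict[str, int] = {}  # name -> number of active users

    def seed(self, name: str) -> None:
        self._active[name] = self._active.get(name, 0) + 1

    def unseed(self, name: str) -> None:
        count = self._active.get(name, 0)
        if count <= 1:
            self._active.pop(name, None)  # last user left: stop seeding
        else:
            self._active[name] = count - 1

    def is_seeding(self, name: str) -> bool:
        return name in self._active

    @contextmanager
    def visiting(self, name: str) -> Iterator[None]:
        """Seed for as long as the caller is 'on the page'."""
        self.seed(name)
        try:
            yield
        finally:
            self.unseed(name)
```

With this shape, an application would wrap a page visit in `with tracker.visiting(name):` and seeding stops once the last visitor leaves.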

Sync protocol description

Lookup:

  • Client wants to look up the latest history for an IPNS key
  • Client uses contentRouting.findProviders for the IPNS key to find peers
  • Client connects to a random subset of peers that it discovers using the /ipns/2.0.0/${multihash} protocol
  • There will be a listener per IPNS key in order to differentiate between the connections without having to add more to the protocol.
  • If the client has loaded this key's history before, it will send over its latest IPNS entry
  • It will listen for incoming IPNS entries from the other side.
  • Once it gets a record
    • It will check the validity of the signature
    • It will check if the sequence number is greater than what it has locally
    • If it isn't greater, it should be ignored
    • If it is greater, the client will traverse the history until it reaches its latest sequence number
    • If the value in this chain isn't the same as what's stored locally, the connection will be closed and the block will be ignored
    • If it's the same, the latest version will be saved for reference
  • The client will keep the connection open in order to push any updates it receives to all peers
  • Something should hint to IPFS to attempt to load
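
The record-handling steps above can be sketched roughly as follows. Everything here is a simplification: `Record`, `record_id` and `handle_record` are hypothetical names, an HMAC with a shared key stands in for the publisher's public-key signature, and rejecting a badly signed record or accepting the first record ever seen are assumptions the list doesn't spell out:

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    value: str
    sequence: int
    previous: Optional[str]  # id of the previous record; not signed
    signature: bytes


def sign(key: bytes, value: str, sequence: int) -> bytes:
    # ASSUMPTION: HMAC stands in for the real public-key signature.
    return hmac.new(key, f"{value}|{sequence}".encode(), hashlib.sha256).digest()


def make_record(key: bytes, value: str, sequence: int,
                previous: Optional[str]) -> Record:
    return Record(value, sequence, previous, sign(key, value, sequence))


def record_id(record: Record) -> str:
    data = f"{record.value}|{record.sequence}|{record.previous}".encode()
    return hashlib.sha256(data).hexdigest()


def is_signed(record: Record, key: bytes) -> bool:
    expected = sign(key, record.value, record.sequence)
    return hmac.compare_digest(expected, record.signature)


def handle_record(incoming: Record, local: Optional[Record],
                  blocks: Dict[str, Record], key: bytes) -> str:
    """Return "accept", "ignore", or "reject" (close the connection)."""
    if not is_signed(incoming, key):
        return "reject"
    if local is None:
        return "accept"  # nothing stored yet for this key
    if incoming.sequence <= local.sequence:
        return "ignore"
    # Walk back through `previous` links until we reach our own sequence.
    current = incoming
    while current.sequence > local.sequence:
        older = blocks.get(current.previous) if current.previous else None
        if older is None or not is_signed(older, key):
            return "reject"
        if older.sequence != current.sequence - 1:
            return "reject"
        current = older
    # The chain must pass through exactly the record we already hold.
    return "accept" if current == local else "reject"
```

In a real node the `blocks` lookup would be a block fetch over the network rather than a dictionary access.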

Seeding:

  • IPFS should have some way of marking the IPNS resource as being "active" or not so that users loading a
  • They will start listening on /ipns/2.0.0/${multihash} and advertise themselves using contentRouting.provideKey
  • Once they get a connection, they will treat it the same way as an outgoing connection.
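
A minimal sketch of the "one listener per IPNS key" idea, assuming the protocol ID is simply the /ipns/2.0.0/ prefix followed by the key's multihash. The `Seeder` class and its `provided` set are hypothetical; the set only stands in for advertising on the content routing layer:

```python
from typing import Callable, Dict, Set

PROTOCOL_PREFIX = "/ipns/2.0.0/"


def protocol_id(multihash: str) -> str:
    """Build the per-key protocol ID used to tell connections apart."""
    return PROTOCOL_PREFIX + multihash


class Seeder:
    """Hypothetical registry of per-key listeners (no real networking)."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[bytes], None]] = {}
        self.provided: Set[str] = set()  # stand-in for provider records

    def seed(self, multihash: str, handler: Callable[[bytes], None]) -> None:
        self.handlers[protocol_id(multihash)] = handler
        self.provided.add(multihash)

    def unseed(self, multihash: str) -> None:
        self.handlers.pop(protocol_id(multihash), None)
        self.provided.discard(multihash)

    def dispatch(self, protocol: str, message: bytes) -> bool:
        """Route an incoming message; False means nobody is listening."""
        handler = self.handlers.get(protocol)
        if handler is None:
            return False
        handler(message)
        return True
```

Because the key is part of the protocol ID, the messages themselves don't need to say which name they belong to.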

Interesting properties / concerns

The replication protocol is super simple because most of the complicated work is done by the existing IPFS stack.
In most cases only a single message will be exchanged to see if they're up to date, and if they aren't, the DAG traversal of the versions will automatically download those blocks locally.

All connections between peers are secured by the existing encryption protocols used by libp2p between connections.

IPNS is used to look up the version history. Checking out a specific version of the history is as simple as using the IPFS hash linked to at that version instead, same as the non-versioned IPNS.

It's possible to build libraries to diff between versions of the FS by running diffs on the merkle dags for the folders.
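
A toy version of such a diff, assuming a simplified content-addressed store where a file is raw bytes and a directory is a mapping from names to child hashes. The `put` and `diff` helpers and the SHA-256 encoding are invented for this sketch and are not the real UnixFS format:

```python
import hashlib
import json
from typing import Dict, List, Union

Node = Union[bytes, Dict[str, str]]  # file contents, or name -> child hash


def put(store: Dict[str, Node], node: Node) -> str:
    """Store a node under the hash of its contents and return that hash."""
    if isinstance(node, bytes):
        data = b"file:" + node
    else:
        data = b"dir:" + json.dumps(node, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = node
    return digest


def diff(store: Dict[str, Node], old: str, new: str, path: str = "") -> List[str]:
    """List the paths that differ between two versions of a tree."""
    if old == new:
        return []  # same hash means the whole subtree is identical
    old_node, new_node = store[old], store[new]
    if isinstance(old_node, bytes) or isinstance(new_node, bytes):
        return [f"changed {path or '/'}"]
    changes: List[str] = []
    for name in sorted(set(old_node) | set(new_node)):
        child = f"{path}/{name}"
        if name not in new_node:
            changes.append(f"removed {child}")
        elif name not in old_node:
            changes.append(f"added {child}")
        else:
            changes.extend(diff(store, old_node[name], new_node[name], child))
    return changes
```

Unchanged subtrees are skipped after a single hash comparison, which is what makes diffing over a merkle DAG cheap.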

Unlike Dat, content is automatically deduplicated across versions and file names thanks to IPFS's existing file storage capabilities.

The privacy model isn't as robust as Dat since all the files you're downloading are being broadcast via content discovery.

(Sorry if the format is weird, this was originally a blog post)

@Gozala

Gozala commented Oct 31, 2018

👍 As I have brought this up on multiple occasions, I'd be very much interested in some way of having something like IPNS with a version log. It can be a different IPNS or an extension to it, but I find it's one of the pieces that is lacking in the IPFS space.

@Stebalien
Member

Due to the way it uses the DHT, it's slow to update, and because of the way the format works, there's no way to get the history for a given website.

To be pedantic, IPNS != DHT. We currently use it as a record store but we can use others.

We also use pubsub, experimentally, for push updates. Your suggested IPNS gossip protocol would be great!

FYI, the current pubsub protocol works by:

  1. Subscribing to the pubsub topic.
  2. Finding provider records for the pubsub topic and connecting to those peers.
  3. Waiting for new records from pubsub.

The addition here would be asking some of those peers for an initial record (unless I'm misunderstanding something).

This doesn't hold up for long outages of the publisher since nothing else in the network will be posting to the DHT if they go down.

So, that solves half of the problem but the other half is, unfortunately, replay of old records. That's really why IPNS records have embedded expiration dates. However, not all users will care (those can be set arbitrarily far in the future, so users can opt out of this protection).
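
As an illustration of why the expiration date matters for replay, here is a simplified check. The `Record` fields and `is_acceptable` are hypothetical names for this sketch, not the real IPNS record format, and signature checks are omitted:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    value: str
    sequence: int
    expires_at: datetime  # simplified stand-in for the embedded expiration


def is_acceptable(record: Record, now: datetime, known_sequence: int) -> bool:
    """Reject expired records and records older than what we already know.

    `known_sequence` is the highest sequence seen so far for this key,
    or -1 for a client that has never resolved the key before.
    """
    if record.expires_at <= now:
        return False
    return record.sequence >= known_sequence
```

A client that already knows a newer sequence number can drop a replayed record by itself, but a client with no prior state only has the expiration date to go on.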

So, this is doable, I just wanted to give you the full picture.

Active/Archive

I'm not sure how the archive part plays a role here but the "active" part sounds like IPNS over pubsub. That is, if the IPNS over pubsub experiment is enabled, go-ipfs will auto-subscribe to the IPNS key using pubsub allowing new records to be pushed to the node.

Is that what you're getting at?

Adding a link to the previous version (IPNS entry)

What about encoding this in IPFS itself? A proposal for UnixFS-2.0 is to allow for a "previous version" field pointing to the previous version of a file/directory (in the file/directory). This way, IPNS records are still just signed pointers and the archival information is stored in IPFS itself.

@RangerMauve
Author

The addition here would be asking some of those peers for an initial record

Instead of asking for the initial record, the peer would pre-emptively send it over. The idea there was to reduce the overhead in the protocol, but I could see a request working just as well.

Would you mind linking me to the details of where the pubsub usage is being developed?

replay of old records

Yeah, that's a really good point. The way Dat handles this is redundancy. You connect to multiple peers and take the latest record, and also keep track of the latest record data for future reference.
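
Roughly, and only as a sketch of the idea rather than Dat's actual code, with a made-up `pick_latest` helper and signature checks left out:

```python
from typing import Iterable, Optional, Tuple

Record = Tuple[int, str]  # (sequence, value)


def pick_latest(responses: Iterable[Record],
                remembered: Optional[Record] = None) -> Optional[Record]:
    """Take the highest-sequence record among peer responses and memory."""
    candidates = list(responses)
    if remembered is not None:
        candidates.append(remembered)  # never go backwards from what we saw
    if not candidates:
        return None
    return max(candidates, key=lambda record: record[0])
```

Keeping the remembered record in the candidate set means a peer serving an old record can't roll back a client that has already seen something newer.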

go-ipfs will auto-subscribe to the IPNS key using pubsub allowing new records to be pushed to the node

Yeah, that's pretty much the same as what I was advocating for.

A proposal for UnixFS-2.0 is to allow for a "previous version" field

One of the reasons I like having the version inside the pointer itself is that you get a history of what the pointer actually pointed to over time. I could see how handling this in the content would serve the same purpose, though.

What's the ETA on getting versioning in UnixFS-2.0, and will it be enabled by default?

IPNS is a relatively small API service compared to UnixFS, and I think it would be faster to add a version field there (and play around with it in the wild) than waiting for the new version stuff to be used in UnixFS everywhere. Plus, it would be backwards compatible to any other encodings.

If versioning is done in IPNS, then it doesn't need to be duplicated for each new data type that it's pointing to. Whereas if you rely on the contents to have their own ideas about versioning you now have a mix of IPNS-based resources that aren't versioned, or are versioned through one of several schemes.

@Stebalien
Member

Would you mind linking me to the details of where the pubsub usage is being developed?

The pubsub router lives here: https://github.com/libp2p/go-libp2p-pubsub-router

Yeah, that's a really good point. The way Dat handles this is redundancy. You connect to multiple peers and take the latest record, and also keep track of the latest record data for future reference.

That's fine if you don't care about guarantees but doesn't really help against malicious parties who can just pull off an eclipse attack.

What's the ETA on getting versioning in UnixFS-2.0, and will it be enabled by default?

It's still in planning but is a priority this quarter as it's blocking a bunch of stuff. However, I can't really give an ETA. Work happening here: https://github.com/ipfs/unixfs-v2

IPNS is a relatively small API service compared to UnixFS, and I think it would be faster to add a version field there (and play around with it in the wild) than waiting for the new version stuff to be used in UnixFS everywhere.

Yes, IPNS is a relatively small API and we'd like to keep it that way :). It does one thing and one thing only: map a key to a path.

If versioning is done in IPNS, then it doesn't need to be duplicated for each new data type that it's pointing to. Whereas if you rely on the contents to have their own ideas about versioning you now have a mix of IPNS-based resources that aren't versioned, or are versioned through one of several schemes.

Yeah but it's really inflexible. For example, I can't create a directory of other versioned directories.

Really, this kind of polymorphism needs to come from a type system (which is currently a ways off).

@RangerMauve
Author

K, it looks like you've got this stuff figured out so I should just sit tight and wait for the new pubsub and UnixFS stuff to roll out. 😅

@Stebalien
Member

@RangerMauve if you have time and patience, you should go over to that unixfs repo and help push it forward.
