Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify the IPNI federation protocol #27

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

masih
Copy link
Member

@masih masih commented Dec 19, 2023

Specify the initial IPNI federation protocol which aims to achieve eventually consistent index records across a collaborating set of nodes.

The federation protocol consists of four fundamental steps: Initialization, Periodic snapshot taking, Exchange of snapshots and Reconciliation. The protocol takes advantage of the immutability of advertisements exposed by each provider to resolve conflicts across indexers.

The specification lists a set of APIs exposed by a participating indexer in order to enable the implementation of the federation protocol.

See rendered document.

IPNI_FEDERATION.md Outdated Show resolved Hide resolved
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
Specify the initial IPNI federation protocol which aims to achieve
eventually consistent index records across a collaborating set of nodes.

The federation protocol consists of four fundamental steps:
Initialization, Periodic snapshot taking, Exchange of snapshots and
Reconciliation. The protocol takes advantage of the immutability of
advertisements exposed by each provider to resolve conflicts across
indexers.

The specification lists a set of APIs exposed by a participating indexer
in order to enable the implementation of the federation protocol.
Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic snapshot proposal here looks generally good with one comment

IPNI_FEDERATION.md Outdated Show resolved Hide resolved
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
@masih masih requested a review from gammazero December 20, 2023 12:54
IPNI_FEDERATION.md Outdated Show resolved Hide resolved
The Inter-Planetary Network Indexer (IPNI) offers a routing system that enables mass advertisement of content addressed
data and lookup performance in order of milliseconds.
This is achieved by a design where a single IPNI instance strives to maintain full network state knowledge.
Many IPNI instances can be instantiated across the globe to offer fast local ingestion and lookup of advertised content.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another goal of federation may be to allocate portions of the index content to different indexers. I think this should be talked about - even if only to say why we do not do it. People will want to index some stuff for some time, but not everything forever, and will want to be part of a federation for some of the other features.

Having local fast lookup is a good goal, but does that apply to everything? It may not be necessary to be fast and local for all storage providers, but just the highly accessed ones. This means we may want to consider a strategy that allows some subset of service providers, and even epochs within those providers' chains, to be assigned to various indexers. Infrequently accessed storage provider content can be indexed by fewer indexers.

Such a strategy may also be useful so that indexers can choose which providers and epochs are the best to index and not have to index everything, even from specific provider(s).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though useful, this feature seems sufficiently complicated to need its own specification?

Aside from that, I think what's documented in the current spec doesn't technically limit that behaviour. For example, there is nothing that would stop a group of collaborating indexers to agree on only indexing advertisements up to the depth 1000 within the last week. The federation protocol described here would still work as long as all the participating nodes only include providers with such criteria.

WDYT @gammazero ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that is the purpose of this spec - to make indexing consistent between indexers that cover identical index content. It means that if indexer-1 covers providers A and B, indexer-2 covers B and C, and indexer-3 covers C and A, these cannot be in the same federation even though they can check each other's consistency. Or, are they each part of 2 federations?

I can see index allocation, and incentivization as completely different specs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that is the purpose of this spec

It all boils down to the words "Collaborating IPNI indexers".
The current spec aggregates provider lists across all indexers. This means one indeed cannot have an indexer that only indexes a proportion of providers instead of all them.

In the case you described, at least 2 of those indexers need to agree on the heuristic for selecting providers. Right now that heuristic in cid.contact is simply "All that you can get your hands on but not the ones you have not seen for a week". If indexer-1, 2 and 3 follow the current heuristic and are part of the same federation, then the should all end up with A, B and C in their providers list.

The specification of what that heuristic is, IMO is beyond the scope of this spec. I think we can both imagine case where it can grow quite complicated, and could potentially lead into a lot of head scratching for users trying to look things up.

Considering the spec does not prohibit the federation of a set of indexers that agree upon the same heuristic to select providers, my vote would be to keep the scope small and aim for a federation of cid.contact-like indexers going first.

In that world, one could use DNS names to form separate federations across indexers that only care about "hot off the wire" records. For example, today.cid.contact could be a new indexer part of a separate federation of indexers that only index ads published within the last 24 hours.

for content propagation. As multiple IPNI instances exist across the globe, the potential for discrepancies in the state
of these chains is undeniable. It becomes essential, therefore, to devise strategies that can reconcile these
discrepancies effectively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use case to check that providers are indexing what they claim to index? An indexer may try to only index popular, dropping data it does not get any hits on after some amount of time. Another indexer (or index checker) could prove that an indexer is not indexing some content that it should be, and that could result in some penalty or reputation damage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there certainly is but checking indexers' behaviour is an independent problem: it applies to a single index too, not just a cluster of federated ones.

Adding that to the federation protocol itself seems like possible future extension. But I am not sure if it is a blocker for having a federation.

IPNI_FEDERATION.md Show resolved Hide resolved
advertisement head.

While traversing advertisement chains, instances should prefer pulling ad chain from eachother's mirror instead of going
directly to the provider where possible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unfair to the faster indexer, because all the other indexers will pull from the mirror of the fastest. Maybe we need a strategy that strongly encourages other indexers to add content from others' mirrors to their own.

For example, an indexer cannot pull advertisement data directly from another mirror, but instead must sync some portion of the mirror and then get the advertisements from the local mirror after syncing. This forces the receiving indexer to have a copy of the data in its own mirror.

The original source indexer may then redirect others, for a short time, to the destination indexer for the synced content. Or, maybe there is some check that if one indexer copies advertisement data from another indexer's mirror, the indexer that received the data subsequently serves it from its mirror.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In think in practice, the fastest indexer would/should have rate limiting in place, and other indexers should scatter traffic across multiple sources simply to pull ads from different providers in parallel.

The key point I am hoping to get across with this sentence is to say: try going to mirrors first before going to providers. That doesn't mean the same mirror all the time.

The "fastest indexer" would also depend on where the providers and indexers physically are. I expect that the indexer network would naturally grow around the organisational structure of its providers. For example, if there is a heavy presence of providers in Australia , it is likely that there would be more than one indexer in Australia to cope with the demand. At this point I find myself heavily speculating which is probably a hint that we might be over thinking it.

Another factor here is the loosely defined concept of Mirror itself: there is no specification yet that documents the API for a mirror. We are in the process of figuring out its economics, rate limits etc.

So i think we would probably be OK revisiting this once there is more data.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything that encourages one indexer to help another, when those indexers are not operated by the same organization? I guess just that it allows the other indexer to supply the ad data and reduces load on the first.

In other blockchain indexing ecosystems, the indexers are interested in handling as many queries as they can for the data they index - so they do not want to help other indexers. In the IPNI ecosystem, it appears that indexers are interested in doing as little of the required work as possible - so want other indexers to handle as many queries as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I am OK with allowing indexers to share their mirror as much or as little as possible, with whatever limits they see fit, and specifying this a some later time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captured this #29

Copy link

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see the interest here 🙏. Apologies for the late review and hope it's of use.

A snapshot consists of:

* **Epoch**: a monotonically increasing value that represents the time at which snapshot was formed.
* **Vector Clock**: the monotonically increasing vector clock value that corresponds to the IPNI instance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a vector clock since it's an integer and not a vector. Is the idea that despite there being no communication around the vector clock of all IPNI instances in the federation this is something nodes are tracking locally to understand how far behind they are?

**Resolution Strategy:**
This represents a divergence in the advertisement chain, and careful reconciliation is required:

1. Instance A and B determine which advertisement is the later one in the chain of advertisements provided by P1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like #30 would make life a lot easier here. However, IMO it's also worth calling out what should happen if the advertisement chains diverge (e.g. P1 forks their advertisement chain and give different information to A and B in order to mess with their reputations).

A couple ways to do this:

  • Nice: Just choose one based on some criteria (e.g. based on sequence number and then lexicographical multihash ... although if you're worrying about keeping history, since the best chain could flip-flop between multiple, it could be annoying)
  • Harsh: Keep both signed advertisements around and propagate to the network as a "every just delete this provider, they don't play by the rules". Might require keeping those two records around for a while to make sure all the providers know to remove them. Would require some support here in the sync protocol to enable communicating about this.

Comment on lines +84 to +86
* **Strong consistency**: While the IPNI federation protocol aims to ensure data consistency and availability, it does
not aim to atomically replicate every piece of indexed data across all nodes. For example, it is possible for some
records to remain partially available on one node until deletions are propagate through the network.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this statement the same as the "real-time synchronization" non-goal, or is this saying that it's actually ok if the federation never reaches consistency (e.g. it's ok for a single federation to include ipni.china that only hold Chinese providers, and ipni.us that only hold US providers)?

Note: I get that technically the HTTP APIs here don't really care, but in practice a federation (e.g. the /mainnet federation) would have to care about this.

}
```

### GET `/ipni/v1/fed/{CID}`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any expectations on how frequently this is meant to be called and in what scenarios? Given that calling this on one IPNI instance doesn't tell you anything about the others it's a little confusing why and how many other IPNIs you'd ask for /fed/{CID} vs asking the many providers for /ad/head (assuming /ad/head came with the sequence number)? I can see why you'd rather ask the federation nodes vs the providers, but if you have to ask multiple federation nodes to gain confidence then you're also adding overhead.

#### Case 3: A provider is known by both but with different head advertisements

**Scenario:**
Instance A and Instance B both know Provider P1, but they have different latest advertisements (heads) for P1's chain.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it today it's unspecified what the conditions are for keeping and service a provider record. IIUC cid.contact tries to contact providers once a day and being unresponsive for some number of those will result in eviction.

If those rules are inconsistent across the federation, or certain providers are simply inaccessible by certain IPNI instances (e.g. Chinese providers inaccessible by US IPNI instances), it could lead to unexpected results. Whether here or in a document describing the behavior for /mainnet this should be specified. Maybe it's as simple as saying a provider is banned unless any member of the federation vouches for them being reachable, or maybe some "vote" is required to ban them.

This section clarifies aspects that the IPNI federation protocol will not address in its current iteration.
Here are the non-goals for the IPNI federation protocol:

* **Permissionless Membership**: This specification assumes a permissioned network, where the membership is controlled

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is setting the stage for a permissioned network it seems like there should be a few more benefits to clients given they're easier to achieve than in a permissionless environment.

While it may be the case that this protocol can be meant for background sync (and helping fleets of nodes accumulate partial IPNI state such that an overlay network can compute the full state with a TBD protocol), my understanding from the motivation is that this is meant to help clients be able to use >1 indexer and evolve beyond the current state where in practice cid.contact is hard coded everywhere.

To that end some items that would be helpful include:

  1. How can I find the set of participants to query that are supposed to have the full IPNI state (given a root of trust like some bootstrap clients, a domain name, a blockchain identifier, ...)?
  2. At least for /mainnet some document that helps with rules/expectations for that network would be great. For example:
    • What are the expectations around initial ingestion time/rates (e.g. I understand there are concerns around people advertising long chains of tiny advertisements, here might be a place where those expectations can be set)
    • Expectations around whether every "full node" should actually reach the same state. e.g. is choosing to exclude a provider grounds for expulsion?
    • Expectations around how slow of a sync is slow enough to report an incident
  3. A description of how a basic client that's able to pull from multiple endpoints could work (if this is too rough to describe in text that's not great, but then a code reference implementation would be helpful)


* **Epoch**: a monotonically increasing value that represents the time at which snapshot was formed.
* **Vector Clock**: the monotonically increasing vector clock value that corresponds to the IPNI instance.
* **Provider to Ingest State Map**: a map of provider ID to the latest advertisement CID processed by the IPNI

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "processed" mean there's an expectation here that if a client see "height 10" that all of the advertisements up to height 10 will be queryable (by clients or other IPNI nodes), or does that mean they've simply heard about the latest CID/height. Similarly, must height have been validated?

Note: I get that you may not want to be prescriptive here around caching / loadbalancing strategies (e.g. could a client end up with inconsistent state due to a load balancer routing the data to different backends rather than being sticky), either way it'd probably be helpful to call this out for any client implementers.

consistently synchronized across all instances. Future enhancements may introduce additional layers of optimization or
fault-tolerance, but the essence will remain rooted in these foundational principles.

### Snapshot Exchange

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @hsanjuan mentioned elsewhere you might consider if a more prescriptive syncing scheme (e.g. having IPNI nodes post CRDT updates to a shared kv-store like ipfs-cluster uses https://github.com/ipfs/go-ds-crdt) would help you here.

This could reduce the amount of data emitted by snapshots as well as help with the synchronization among many nodes rather than leaving the "how to find the latest state" work up to the individual instances.

@hsanjuan
Copy link

Notes from going over:

  • instances can use metadata or summary vectors of their snapshots. These summaries allow instances to quickly ascertain if they need the full snapshot or if they're already up-to-date : I don't see how the state is represented or how "summaries" are represented. I don't think you can have a syncing mechanism that involves exchanging full snapshots. That doesn't scale with huge indexes nor indexes that are updated very frequently by several parties. There needs to be a notion of "delta", and how the state is represented is important to have a way to create those deltas. ie. perhaps representing the state as a merkle-dag can facilitate diffing and snapshot sharing: https://ieeexplore.ieee.org/document/9049566 (a perhaps better alternative to merkle-crdts from go-ds-crdt mentioned above).
  • Broadcast mechanisms are not detailed: how does a broadcast here work? Careless broadcast will become the first bottleneck on a federated system. Is that an HTTP POST endpoint? Is that other protocol?
  • How are snapshots authenticated? Is the CID a hash of the full snapshot? What happens when snapshots are huge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants