Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sybil defence #4769

Open
dirvine opened this issue Oct 30, 2023 · 7 comments
Open

Sybil defence #4769

dirvine opened this issue Oct 30, 2023 · 7 comments

Comments

@dirvine
Copy link

dirvine commented Oct 30, 2023

Description

Sorry to bother you. I am wondering how much, if any, of this paper https://ssg.lancs.ac.uk/wp-content/uploads/ndss_preprint.pdf is implememented in this crate?

Much of the paper seems to make sense and I belive the go impl has made strides in this direction.

Motivation

Sybil defence at the network layer would benefit every decentralised project and allow a focussed approach.

Current Implementation

I am not familiar enough with the codebase to infer any insights here.

Are you planning to do it yourself in a pull request ?

No

@drHuangMHT
Copy link
Contributor

I don't think it is a valid topic here in libp2p. libp2p is built heavily around connections, not the application and network built on top of it.

@thomaseizinger
Copy link
Contributor

Thanks for posting this. Can you summarise what the mitigation strategy presented in the paper is? I tried to skim it but couldn't find a concise summary.

@dirvine
Copy link
Author

dirvine commented Nov 1, 2023

@drHuangMHT I understand your position, however this approach provides an API addition that upper layers can use for effective sybil defence. If we make this happen, the man projects could benefit from a network layer approach.

@thomaseizinger I have used Claude2 to help me summarise this. I find it more effective than a human.

Here is a summary of the Sybil mitigation strategy described in the paper:

  • The attack exploits a vulnerability in the Kademlia DHT used for content resolution in IPFS. An attacker can generate multiple Sybil identities that are closer to a target content ID (CID) than any honest peers. This allows the attacker to control the resolvers for that CID and prevent discovery of the content.

  • The mitigation strategy involves using region-based queries instead of contacting only the k closest peers to a CID. The region size is calculated to contain k peers on average, based on an estimate of the total network size.

  • With region queries, providers store records on and requesters retrieve records from approximately k honest resolvers, even if the attacker adds Sybil nodes.

  • The mitigation mechanism is only activated when the attack is detected using a statistical test based on the KL divergence between actual and expected peer ID distributions. This avoids overhead when no attack is present.

  • Detection uses the peer IDs returned for a CID to check if they appear suspiciously close to the CID, indicating potential Sybil nodes. No additional messages are required.

  • The mitigation technique ensures content discovery with high probability for any CID under attack. It increases costs for attackers linearly with the number of Sybils they introduce.

  • The solution can be deployed incrementally, requires no changes to the core DHT protocol, and maintains interoperability with existing IPFS nodes.

To implement the Sybil mitigation strategy in rust-libp2p, the main changes would be:

  • Add support for region-based queries in the Kademlia DHT implementation. This involves finding all peers within a region of a certain distance from a key, rather than just the k closest peers.

  • Implement the network size estimation mechanism. This is used to dynamically calculate the region size for the number of expected honest peers k.

  • Add attack detection using KL divergence of peer IDs. Can be implemented as part of the lookup for provider records.

  • Activate region queries only when attack is detected. Use normal lookup otherwise.

  • Provider and requester logic to use region queries when mitigation is active.

Some specifics:

  • The region query can be implemented via iterative calls to get_closest_peers() to find peers matching more prefix bits.

  • Network size estimation calculates model distribution of distances to estimate size.

  • KL divergence computes distance between empirical and model peer ID distributions.

  • Detection runs on providers and requesters. Can use a shared threshold.

  • Mitigation queries contact more peers, increasing costs for attackers.

This achieves censorship resistance for target CIDs under attack by ensuring providers and requesters communicate through regions guaranteed to have honest peers. The changes maintain compatibility with existing peers. Region queries are only used when needed to limit overhead. Overall, this provides an effective and practical solution to the DHT vulnerability.

@dirvine
Copy link
Author

dirvine commented Nov 1, 2023

As a follow up on KL divergance, I hope this is useful

The Kullback-Leibler (KL) divergence is a statistical measure used to quantify the difference between two probability distributions. It can be used to detect if an empirical distribution of samples differs significantly from an expected theoretical distribution.

In the context of detecting Sybil attacks in DHTs, here is an explanation of how KL divergence is used:

  • Let p(x) be the theoretical probability distribution of common prefix lengths between peer IDs and a target CID, assuming uniform random peer IDs. This can be calculated based on the estimated network size.

  • Let q(x) be the observed empirical distribution of common prefix lengths from the peers returned for a lookup for that CID.

  • If there is no attack, q(x) should closely match p(x). But under an attack with Sybils, q(x) will diverge from p(x).

  • The KL divergence D(q||p) quantifies the difference between the two distributions:

D(q||p) = Σx q(x) * log(q(x) / p(x))

  • It measures the information lost if p is used to approximate q. Higher values indicate larger divergence.

  • If D(q||p) is greater than a chosen detection threshold, we conclude the distributions are significantly different, indicating a likely Sybil attack.

  • The threshold balances false positives and negatives. A higher threshold leads to fewer false positives but more missed attacks.

So in summary, KL divergence provides a principled statistical test to detect Sybil attacks by quantifying the mismatch between the actual and expected peer ID distributions, without needing direct labels about which peers are Sybils.

@thomaseizinger
Copy link
Contributor

Thank you for this!

Region-based queries are not implemented in libp2p-kad and I don't know of anybody that is planning on implementing them. Happy to mentor you on the codebase though if you want to take a stab at it :)

@mxinden as the original author of iibp2p-kad should likely weigh in on this too.

From what I understand, this is entirely an implementation-detail (modulo perhaps a config parameter for the divergence threshold).

@dirvine
Copy link
Author

dirvine commented Nov 1, 2023

It's a pleasure. Thank you for the mentor offer. I am time limited, but I will ask the guys (maidsafe) and see if we can line up some resource when we see how Max feels about this as well. It also may be a nice one to fund as a grant of some kind. I think the freenet guys would also benefit. @sanity may also be interested in this approach.

@mxinden
Copy link
Member

mxinden commented Nov 20, 2023

@dirvine sorry for the delay. Thanks for starting this conversation.

Much of the paper seems to make sense and I belive the go impl has made strides in this direction.

@dennis-tra given your familiarity with the Go implementation and your recent work on it, can you add more details here?

I am wondering how much, if any, of this paper https://ssg.lancs.ac.uk/wp-content/uploads/ndss_preprint.pdf is implememented in this crate?

As Thomas said above, today neither the detection nor the mitigation mechanism is implemented in rust-libp2p. Unfortunately I am also not aware of any out-of-tree implementation in Rust.

this is entirely an implementation-detail

Agreed. This simplifies the implementation process significantly, given that no coordination with other peers is needed.

I am time limited, but I will ask the guys (maidsafe) and see if we can line up some resource when we see how Max feels about this as well. It also may be a nice one to fund as a grant of some kind.

Open for contributions! Would be great to see an implementation in Rust. Note however that I would only consider merging it when the gained security is worth the complexity it comes along with.

(@dennis-tra since you know the authors, they might enjoy the interest here.)

mergify bot added a commit that referenced this issue Aug 27, 2024
…5555)

## Description

This PR is to expose a kad query facility that allowing specify
num_results dynamically.
It is related to the [Sybil Defence
issue](#4769),
that during the attempt of implementation on higher level code, it is
find will be useful if libp2p-kad can expose such facility.

The PR try not to cause any interference to the existing work flow, only
introduce an `extra exposal`.

## Change checklist

<!-- Please add a Changelog entry in the appropriate crates and bump the
crate versions if needed. See
<https://github.com/libp2p/rust-libp2p/blob/master/docs/release.md#development-between-releases>-->

- [x] I have performed a self-review of my own code
- [x] I have made corresponding changes to the documentation
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] A changelog entry has been made in the appropriate crates

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
TimTinkers pushed a commit to unattended-backpack/rust-libp2p that referenced this issue Sep 14, 2024
…ibp2p#5555)

## Description

This PR is to expose a kad query facility that allowing specify
num_results dynamically.
It is related to the [Sybil Defence
issue](libp2p#4769),
that during the attempt of implementation on higher level code, it is
find will be useful if libp2p-kad can expose such facility.

The PR try not to cause any interference to the existing work flow, only
introduce an `extra exposal`.

## Change checklist

<!-- Please add a Changelog entry in the appropriate crates and bump the
crate versions if needed. See
<https://github.com/libp2p/rust-libp2p/blob/master/docs/release.md#development-between-releases>-->

- [x] I have performed a self-review of my own code
- [x] I have made corresponding changes to the documentation
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] A changelog entry has been made in the appropriate crates

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants