
Remote denylists and watching system (proposal) #19

Open
hsanjuan opened this issue Oct 2, 2023 · 3 comments

Comments

@hsanjuan
Collaborator

hsanjuan commented Oct 2, 2023

The following are my thoughts on how to provide denylists so that they can be subscribed to.

Server

  • Using HTTP (polling), essentially a file served by an IPFS gateway:
    • Lists are made available over an HTTP endpoint.
    • Range requests are supported and accepted.
    • The ETag is set to the CID, and caching headers are set.
    • I'd like to extend this to use Server Push / notifications, but it is simpler, and good enough, for clients to poll regularly and check the ETag to see whether the content changed.
  • Using IPFS:
    • The server hosting the denylist on IPFS publishes to the pubsub topic denylist/<name>. Messages must be signed.
    • The message includes the CID of the latest version of the list. It is published every minute, and also whenever the list is updated.
    • That CID is the CID of the denylist, which is a normal UnixFS file (balanced chunking).
    • UnixFS files have support for seeking out of the box, and only the necessary blocks are downloaded when looking for specific bytes.
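The HTTP side of the server can be sketched as a pure function, which makes the Range/ETag behaviour easy to test without a web server. This is a minimal sketch, not how any gateway implements it: the name `respond`, the 60-second `Cache-Control` value, the exact-match `If-None-Match` comparison and the restriction to open-ended `bytes=N-` ranges (the form a tailing client sends) are all assumptions for illustration.

```python
from typing import Optional


def respond(data: bytes, cid: str, range_header: Optional[str] = None,
            if_none_match: Optional[str] = None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of an append-only denylist.

    Hypothetical helper: the ETag is the list's CID (quoted, as HTTP requires)
    and only open-ended "bytes=N-" ranges are honoured.
    """
    etag = f'"{cid}"'
    headers = {
        "ETag": etag,
        "Accept-Ranges": "bytes",
        "Cache-Control": "max-age=60",  # assumed: matches a one-minute poll
    }
    if if_none_match == etag:
        return 304, headers, b""  # unchanged since the client's last fetch
    if range_header and range_header.startswith("bytes=") and range_header.endswith("-"):
        start_text = range_header[len("bytes="):-1]
        if start_text.isdigit():
            start = int(start_text)
            if start >= len(data):
                # Range begins at (or past) the end: nothing new to send.
                headers["Content-Range"] = f"bytes */{len(data)}"
                return 416, headers, b""
            headers["Content-Range"] = f"bytes {start}-{len(data) - 1}/{len(data)}"
            return 206, headers, data[start:]
    # No usable Range header: send the whole list.
    return 200, headers, data
```

Using the CID as the ETag means a client can detect a new version with a single header comparison, and because the list only grows, any byte offset the client already holds stays valid across versions.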

Client

  • HTTP: The client polls for the file every minute using a range request starting at the last byte read. A HEAD request can be made first to check the ETag and decide whether a GET request is needed. New bytes are appended to the file on disk.
  • IPFS: The client subscribes to the pubsub topic. When a new CID arrives, it uses UnixFS to seek to the last byte read and appends the new bytes to the file on disk. The pubsub message can carry more than the CID, for example a field indicating that the full file must be re-downloaded and processed from the beginning.
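The client's decision on each pubsub message can be sketched as below. The payload fields `cid` and `reset`, and the names `ListState`, `Update`, `plan_fetch` and `commit`, are hypothetical; signature verification and the actual UnixFS read are left out. Since the same CID is re-announced every minute, a repeated announcement is ignored even if it still carries the reset flag.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ListState:
    """What the client remembers about one subscribed denylist."""
    cid: str = ""    # CID of the last version processed
    offset: int = 0  # bytes of the list already appended to the local file


@dataclass
class Update:
    """Hypothetical decoded pubsub payload (signature assumed already checked)."""
    cid: str
    reset: bool = False  # publisher asks clients to reprocess from byte 0


def plan_fetch(state: ListState, update: Update) -> Optional[Tuple[int, bool]]:
    """Return None if there is nothing to do, else (seek_offset, truncate_local)."""
    if update.cid == state.cid:
        return None  # periodic re-announcement of a version we already have
    if update.reset:
        return 0, True  # drop the local copy and re-read the whole file
    return state.offset, False  # seek past the known bytes, append the rest


def commit(state: ListState, update: Update, n_new: int, truncated: bool) -> ListState:
    """State after the bytes selected by plan_fetch have been written locally."""
    base = 0 if truncated else state.offset
    return ListState(cid=update.cid, offset=base + n_new)
```

Keeping the offset alongside the CID is what lets the client seek instead of re-downloading: only the blocks covering bytes after `offset` need to be fetched.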
@hsanjuan
Collaborator Author

HTTP polling has been introduced at #22

@lidel
Member

lidel commented Apr 22, 2024

We want to leverage this and switch ipfs.io and dweb.link to use RAINBOW_DENYLISTS=https://badbits.dwebops.pub/badbits.deny.

Did some initial triage today:

@hsanjuan
Collaborator Author

Hey, nopfs watches denylists and reads any new lines appended to them. Publishing updates in an append-only fashion lets it do this without re-reading the whole file.
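The core of "read only the new lines" can be sketched in a few lines. This is an illustration, not nopfs's code (which is Go): the hypothetical `LineTail` helper turns appended byte chunks into complete lines and holds back a trailing fragment until its newline arrives, since a chunk may end mid-line.

```python
class LineTail:
    """Turn appended byte chunks into complete, non-blank lines (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending = b""  # bytes after the last newline seen so far

    def feed(self, chunk: bytes) -> list[str]:
        data = self._pending + chunk
        # Everything before the last b"\n" is complete; the rest waits.
        *complete, self._pending = data.split(b"\n")
        return [line.decode("utf-8") for line in complete if line.strip()]
```

Each call costs work proportional to the new bytes only, which is the property that append-only publishing buys.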

I don't think #38 is a must. If the upstream list is append-only, you:

  • Download it
  • For every update, request the bytes from offset len(downloaded_file) onward using the Range header. If no updates happened, there are no new bytes to read (servers typically answer 416 Range Not Satisfiable), so the check is essentially a no-op, equivalent to an If-Modified-Since check. Otherwise you receive only the new content, which you append to the local copy, and nopfs processes it accordingly.
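The two steps above can be sketched with the standard library. This is a minimal sketch assuming the remote list is strictly append-only; `poll` is a hypothetical name, and the optional HEAD/ETag check is omitted because the ranged GET already doubles as the change check. A range starting at the end of the file is treated as "nothing new" (416), and a server that ignores Range and returns the full list causes the local copy to be rewritten rather than corrupted by a double append.

```python
import os
import urllib.error
import urllib.request


def poll(url: str, path: str) -> int:
    """Append bytes added to `url` since the last poll to `path`.

    Returns how many new bytes the local copy gained. Assumes the remote
    list is append-only, so the local file is always a prefix of it.
    """
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status = resp.getcode()
            body = resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 416:
            return 0  # range starts at end of file: nothing was appended
        raise
    if status == 206:
        # Server honoured the range: the body is only the new tail.
        with open(path, "ab") as f:
            f.write(body)
        return len(body)
    # Range ignored (e.g. a 200 with the full list): rewrite the local copy.
    with open(path, "wb") as f:
        f.write(body)
    return max(len(body) - offset, 0)
```

Calling `poll(url, path)` on a timer, for example once a minute, keeps the local file in sync; a watcher only has to process whatever lies beyond its previous read position.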

I don't know if you saw, but the badbits list is published in append-only format here: https://denyli.st/badbits.deny.txt and that is what I used for my defunct gateway.

I have a GitHub Action that reads https://badbits.dwebops.pub/badbits.deny, finds any new lines and appends them to https://denyli.st/badbits.deny... So far so good; it has been running for months.
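The Action itself isn't shown in the thread. A minimal sketch of its core step, assuming the mirror should only ever grow: collect the upstream lines not yet present in the mirror, keep upstream order, and append them. `new_lines` and `append_new` are hypothetical names; downloading the upstream list and committing the result are left out.

```python
def new_lines(upstream: str, mirror: str) -> list[str]:
    """Lines present in `upstream` but not yet in the append-only `mirror`.

    Upstream order is preserved. Lines removed upstream are ignored, since an
    append-only mirror can only grow.
    """
    seen = set(mirror.splitlines())
    out = []
    for line in upstream.splitlines():
        if line and line not in seen:
            out.append(line)
            seen.add(line)  # avoid appending the same line twice in one run
    return out


def append_new(upstream: str, mirror_path: str) -> int:
    """Append unseen upstream lines to the mirror file; return how many."""
    try:
        with open(mirror_path, encoding="utf-8") as f:
            mirror = f.read()
    except FileNotFoundError:
        mirror = ""
    lines = new_lines(upstream, mirror)
    if lines:
        with open(mirror_path, "a", encoding="utf-8") as f:
            if mirror and not mirror.endswith("\n"):
                f.write("\n")  # never glue a new entry onto a partial line
            f.write("".join(line + "\n" for line in lines))
    return len(lines)
```

Diffing by line set rather than by byte offset tolerates an upstream that rewrites or reorders its file, which is exactly the case an append-only mirror is meant to shield consumers from.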

So you can already use RAINBOW_DENYLISTS=https://denyli.st/badbits.deny.txt. In the meantime, I would make badbits append-only so there is no need to rely on a third party.
