Support website hints to reroute HTTP query to IPFS #16

Closed
4 tasks done
lidel opened this issue Apr 11, 2015 · 30 comments
Labels
kind/discussion Topical discussion; usually not changes to codebase status/blocked/missing-api Blocked by missing API

Comments

@lidel
Member

lidel commented Apr 11, 2015

Summary of Open Ideas

Semi-related:

@lidel lidel added kind/enhancement A net-new feature or improvement to an existing feature kind/discussion Topical discussion; usually not changes to codebase labels Apr 11, 2015
@Fil

Fil commented Apr 11, 2015

What about URLs like (whatever)/resource?hash::HASH

browsers (or proxies) that don't know about ?hash:: would do the usual thing, others would be able to use ipfs, or download and check the signature, or even "seed" the resource, which means that a webmaster could get all the benefits of ipfs for their resources w/o installing ipfs themselves, just by signing them

@lidel
Member Author

lidel commented Apr 11, 2015

Yes, that also sounds good!
To make adoption easy, it should be self-explanatory and minimalistic, and ?ipfs:<path> would be a perfect alternative for people who can't use /.well-known/ipfs (shared hosting etc).

Cool idea by _fil_ from #ipfs (https://botbot.me/freenode/ipfs/msg/36341155/)

A well-known URI is a URI [RFC3986] whose path component begins with the characters "/.well-known/", and whose scheme is "HTTP", "HTTPS", or another scheme that has explicitly been specified to use well-known URIs.
RFC-5785: Defining Well-Known Uniform Resource Identifiers (URIs)

Right now I plan to support in this add-on (redirect request to a custom gateway, if enabled):

  • /.well-known/ipfs/<path>
  • /.well-known/ipns/<path>
  • <url>?ipfs:<path>
  • <url>?ipns:<path>

This add-on already supports (v1.0.1) ipfs:<path> and ipns:<path> in address bar, so it would follow a single naming convention ✨

@lidel lidel changed the title Reroute any .well-known/ipfs/<path> query to a local server Reroute any /.well-known/ipfs/* query to a custom gateway Apr 11, 2015
@Fil

Fil commented Apr 11, 2015

Wait. Reading https://tools.ietf.org/html/rfc5785#page-3 we need to make sure that we follow the rules

well-known URIs are not intended for general information retrieval or establishment of large URI namespaces on the Web. Rather, they are designed to facilitate discovery of information on a site when it isn't practical to use other mechanisms

(...)

Applications that wish to mint new well-known URIs MUST register them, following the procedures in Section 5.1.

@lidel
Member Author

lidel commented Apr 11, 2015

Ouch.. I am a bit torn now.
On one hand “not intended for general information retrieval or establishment of large URI namespaces”, on the other “they are designed to facilitate discovery of information on a site when it isn't practical to use other mechanisms”.

I guess you could say ?ipfs:<path> parameter hints are those other mechanisms.
Anyway, I really like the idea of path-based hint, so if not /.well-known/, what other convention could make sense?

  • /gateway/ip(f|n)s
  • /cdn/ip(f|n)s

?

@Fil

Fil commented Apr 11, 2015

I think something like ?ipfs:hash would be most useful: it does not disrupt your usual website functionality, and allows anyone to try and switch to /ipfs/

  • the CMS could just compute the hash and send it as a hint (seems easy to make a WP plugin that does this, for example); it could also add the resource to ipfs, but it seems like a lot more work.
  • web server, server-side proxies, user-side proxies could validate the hash, and themselves add the resource to /ipfs/ if it's not already there
  • browsers could validate hash, switch to file:///ipfs/ or localhost:8080/ipfs/

To make it unique and 'fs'-ish, we could use ?/ipfs/hash

I'm not so keen about /gateway/ip(f|n)s because it means a lot of changes to the website's structure.

@lidel
Member Author

lidel commented Apr 11, 2015

You are right. Forcing URL convention would probably fail.

One way would be to provide a hint for browser add-ons without need to change already existing URLs. I thought about it and a special HTTP header sent by server could be a way to do it.

For example CMS, web server or even load-balancing proxy could add this header to HTTP response:

X-IPFS-Root: /ipfs/QmSsNVuALPa1TW1GDahup8fFDqo95iFyPE7E6HpqDivw3p/

Does it sound ok?

@lidel lidel changed the title Reroute any /.well-known/ipfs/* query to a custom gateway Support website hints to reroute HTTP query to IPFS Apr 11, 2015
@Fil

Fil commented Apr 11, 2015

IMO having a response header makes the information come “too late”: we don't even want to query the web server if we can ask ipfs.

@lidel
Member Author

lidel commented Apr 11, 2015

Yes, my thinking was that the special header would be returned only for the root document (I updated sample above), giving a hint that all page assets are under specific IPFS hash, eg.

  • /ipfs/<hash>/style.css
  • /ipfs/<hash>/logo.png

Not sure how useful it would be tho..

@dylanPowers
Member

I agree with @Fil, an HTTP header would be too late and ruin the decentralized nature of ipfs. I think a better solution would be for the ipfs gateway to expose an api for doing dns queries. That would enable our extensions to check if a requested domain has a dns txt record for ipfs set and redirect to that if available. Then instead of typing /ipns/ipfs.git.sexy/ you could simply type ipfs.git.sexy/. Those without the extension would visit the centralized site, and those with the extension would visit the decentralized version.

@lidel
Member Author

lidel commented Apr 11, 2015

I like this idea: detecting IPFS hint at the DNS level is early enough to avoid any HTTP request to original server.

I quickly glanced over Firefox APIs and it seems add-ons can only read A records (nsIDNSRecord).
To check TXT ones we will need help from local gateway, just like you suggested.

@jbenet
Member

jbenet commented Apr 13, 2015

I want to remind everyone here that we're not actually limited by any rules. It is of course convenient and nice to work productively with everybody else, but there are certain mistakes we should not continue to make.

I'll give you an example of a break from tradition (or rather... a return to even older tradition). it is a strong goal to mend the rift between UNIX and the Web. That is, "ipfs links" should be exactly the same in both the Web, and UNIX. meaning: /ipfs/<hash>/<path>, NOT ipfs://<hash>/<path> (explicitly disobeying the scheme that the W3C insists on). Luckily for us, this is technically feasible, though it does have its bumps to work around. (and easiest done with a TLD :) ). That's fine for us as the upside of mending part of this awful rift is worth a lot.


@lidel

ipfs:<path> and ipns:<path>

please don't do this. please please please have identifiers exactly how we have them, everywhere. Simply /ipfs/... and /ipns/....

browsers (or proxies) that don't know about ?hash:: would do the usual thing, others would be able to use ipfs, or download and check the signature, or even "seed" the resource, which means that a webmaster could get all the benefits of ipfs for their resources w/o installing ipfs themselves, just by signing them

I suggest: ?altref=<path> or a key that would not be likely to clash.

Also, I'm not sure this is the right approach. Instead of forcing people to change links, it may be easier to implement both an http proxy and an extension that checks a { url : path } map. if there is a hit, it can be served instead.


X-IPFS-Root: /ipfs/QmSsNVuALPa1TW1GDahup8fFDqo95iFyPE7E6HpqDivw3p/

much better idea, though note at this point the request already happened (as @Fil and @dylanPowers note). Also, we already return this:

X-IPFS-Path: /ipfs/QmSsNVuALPa1TW1GDahup8fFDqo95iFyPE7E6HpqDivw3p/

I think a better solution would be for the ipfs gateway to expose an api for doing dns queries.

I like this idea: detecting IPFS hint at the DNS level is early enough to avoid any HTTP request to original server.

Are you thinking of anything beyond the DNS TXT records we already use for resolving dns names in the ipfs namesys? (i.e. resolving TXT records to either ipfs-paths or ipns-paths).


Good discussion! 👍 👍

@dylanPowers
Member

@jbenet nothing beyond what's already in the dns. We simply need access to dns and there aren't any browser APIs that can do that

@lidel
Member Author

lidel commented Apr 13, 2015

please have identifiers exactly how we have them, everywhere.
Simply /ipfs/... and /ipns/....

Keeping 'canonical' URI consistent across applications is a very good point.
I will track support for this kind of URI format in #13.

I suggest: ?altref=<path> or a key that would not be likely to clash.

Yeah, I could not find any pre-existing convention for this kind of hint.
I guess altref is generic enough, and add-on would explicitly look for [?&]altref=\/ip(f|n)s\/ just to be sure we don't break any legacy webapp using the key.
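
A minimal sketch of that check, assuming the hint arrives as a plain, unencoded `altref` query parameter. `extractAltref` is a hypothetical helper name for illustration, not something the add-on ships:

```javascript
// Hypothetical helper: returns the path hinted via ?altref=, or null.
// Only values that start with /ipfs/ or /ipns/ are accepted, so a legacy
// web app that already uses an "altref" key for something else is ignored.
function extractAltref(url) {
  const match = /[?&]altref=(\/ip[fn]s\/[^&#]+)/.exec(url);
  return match ? match[1] : null;
}

console.log(extractAltref("http://example.com/?altref=homepage"));
console.log(extractAltref("http://example.com/?a=1&altref=/ipns/ipfs.git.sexy/#top"));
```

The capture stops at `&` and `#`, so other query parameters and the fragment are not mistaken for part of the path.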

anything beyond the DNS TXT records we already use for resolving dns names in the ipfs namesys?

At this point we only need a JSON API that consumes an HTTP URL (or just a domain in FQDN format) and returns 404 (IPFS hash not found in DNS TXT) or 200 with an IPFS URI in the JSON body.

URI Example:

Request:

{ "uri": "http://ipfs.git.sexy/index.html" }

Response:

{ "uri": "/ipns/ipfs.git.sexy/index.html" }

FQDN Example:

Request:

{ "fqdn": "ipfs.git.sexy" }

I think that instead of POST, a simple HTTP GET http://127.0.0.1:8080/dns/ipfs.git.sexy could be used for FQDN.

Response:

{ "uri": "/ipns/ipfs.git.sexy/" }

Not sure which one is more practical, maybe the FQDN one would be enough?
DNS TXT is the same for the entire domain, so response could be cached for some time and IPFS root could be reused for all subsequent requests to the same hostname.
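
To illustrate how a per-hostname root could be reused for later requests, here is a rough sketch. It assumes the FQDN lookup returns a root such as `/ipns/ipfs.git.sexy/`; `toIpfsPath` is a hypothetical name, and the exact joining rules are an assumption:

```javascript
// Hypothetical helper: combines the root returned by the proposed /dns/ lookup
// (for example "/ipns/ipfs.git.sexy/") with the path of the original HTTP URL.
function toIpfsPath(httpUrl, root) {
  const { pathname, search, hash } = new URL(httpUrl);
  // Drop the trailing slash of the root so the parts join with a single "/".
  return root.replace(/\/$/, "") + pathname + search + hash;
}

console.log(toIpfsPath("http://ipfs.git.sexy/index.html", "/ipns/ipfs.git.sexy/"));
```

With this shape, only one lookup per hostname is needed and every later URL on the same host can be mapped locally.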

What are your thoughts on this?

@jbenet
Member

jbenet commented Apr 13, 2015

@lidel i like this a lot. On URI vs FQDN, i think we could support both. i think for the exact apis, we'll want a few different formatted ones, too. For example:

  • it should be possible to HTTP GET http://127.0.0.1:8080/dns/ipfs.git.sexy and get 302-ed to the other url.
  • the json api 👍

@lidel
Member Author

lidel commented Apr 14, 2015

Just got this idea: it would not hurt to return both /ipns/ and /ipfs/ in /dns/ API's responses.

Sample response for HTTP GET /dns/ipfs.git.sexy:

{
  "ipns": "/ipns/ipfs.git.sexy/",
  "ipfs": "/ipfs/QmVyS3iAy7mvDA2HqQWm2aqZDcGDH3bCRLFkEutfBWNBqN/"
}

Use case: one wants to share a link to a specific revision of IPNS resource.
Solution: add-on provides both "Copy /ipns/ Link" and "Copy /ipfs/ Link" in browser's contextual menu.

@jbenet
Member

jbenet commented Apr 17, 2015

@lidel yep, that lgtm.

@jbenet
Member

jbenet commented Apr 17, 2015

@lidel we should probably move this discussion to go-ipfs or somewhere to implement this.

@lidel
Member Author

lidel commented Apr 17, 2015

Yes, we should continue DNS talk in ipfs/kubo#1054 (I've put a summary of this discussion there).

@Mithgol

Mithgol commented Oct 27, 2015

Another possible hint from a web site is hosting IPFS content simply on /ipfs/… path from its own root, without any preceding /gateway or /cdn.

Example: http://ipfs.pics/ipfs/Qmen4mpfUAnsh98hCZPr1DKUjJFuuAYfsAzjNdYY9R637i

@lidel
Member Author

lidel commented Dec 2, 2015

Related discussions: ipfs/notes#73 / ipfs/notes#92

@lidel
Member Author

lidel commented Jan 13, 2016

@Mithgol regex-based URL matching will not work – it produces false-positives such as the one described in #43 (good thing that functionality is disabled by default and marked as experimental).

As a half-measure it is now possible to manually specify multiple "public gateway" hosts in the Properties screen:

[Screenshot: 2016-01-13-214617_509x83_scrot]

If we want automatic detection of resources that can be redirected as-is, I think something like ipfs/notes#92 could be the solution for this particular use case.

Until then, we can implement dnslink support (I created #44 for this work).

@Mithgol

Mithgol commented Jan 14, 2016

What if a regular expression is made very specific to prevent false positives?

For example, the current IPFS multihash (the one that uses SHA2-256) seems to be always 46 characters long, and starts with Qm, and all the characters are in base58.

It means that, while a simple regex (such as /\/ipfs\// for example) might pick false positives (such as https://github.com/ipfs/ in #43), a very specific regex (such as /\/ipfs\/Qm[1-9A-HJ-NP-Za-km-z]{44}/ for example) should prevent false positives and just work as expected.
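
The difference between the two patterns can be checked directly against the URLs mentioned in this thread. This is only an illustration of the regexes quoted above:

```javascript
// The two patterns from the comment above.
const simple = /\/ipfs\//;
const strict = /\/ipfs\/Qm[1-9A-HJ-NP-Za-km-z]{44}/;

const repoUrl = "https://github.com/ipfs/";
const contentUrl = "http://ipfs.pics/ipfs/Qmen4mpfUAnsh98hCZPr1DKUjJFuuAYfsAzjNdYY9R637i";

console.log(simple.test(repoUrl), strict.test(repoUrl));       // true false
console.log(simple.test(contentUrl), strict.test(contentUrl)); // true true
```

Note that the strict pattern only checks the shape of the string; it does not prove that the string decodes to a valid multihash.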

@Kubuxu
Member

Kubuxu commented Jan 14, 2016

Problem with rerouting is that some sites need special treatment, see: ipfs/notes#92 (comment) (I think I identified all cases (Edit 3)).

Also #45.

@Mithgol

Mithgol commented Jan 16, 2016

Thank you for explaining, now I understand that some sites might really not want to be redirected to a local gateway.

However, I still have some arguments for most sites to be served from local IPFS gateways by default, while the suffering minority should be given an interface to opt out of such rerouting. See ipfs/notes#92 (comment) for details of these arguments.

@lidel lidel removed the kind/enhancement A net-new feature or improvement to an existing feature label Jan 31, 2016
@lidel
Member Author

lidel commented Feb 5, 2016

FYI the latest version (1.5.0 at AMO) provides an experimental feature related to this discussion:

  • Detect host names with dnslink and redirect requests to IPFS (/ipns/<fqdn>)

It is disabled by default and needs to be enabled in Preferences:

[Screenshot: 2016-02-05-215508_341x89_scrot]

Lookups are done via XHR requests to /api/v0/dns/<fqdn>
Addon uses in-memory cache of lookups to minimize performance hit.

This is just a PoC. Current performance is quite poor and it may slow down overall browsing experience.
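
For readers curious what such an in-memory cache might look like, here is a rough sketch, not the add-on's actual code. The `lookup` argument stands in for the XHR to /api/v0/dns/<fqdn>; the names, the TTL and the shape of the result are all assumptions:

```javascript
// Hypothetical in-memory cache in front of an async dnslink lookup.
// `lookup(fqdn)` stands in for the XHR to /api/v0/dns/<fqdn> and is assumed
// to resolve to an IPFS path string, or to null when no dnslink exists.
function createCachedLookup(lookup, ttlMs = 60000, now = Date.now) {
  const cache = new Map(); // fqdn -> { value, expires }
  return async function cachedLookup(fqdn) {
    const hit = cache.get(fqdn);
    if (hit && hit.expires > now()) return hit.value;
    const value = await lookup(fqdn);
    cache.set(fqdn, { value, expires: now() + ttlMs });
    return value;
  };
}
```

Negative results (null) are cached as well in this sketch, so hosts without dnslink do not trigger a lookup on every request until the entry expires.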

@lidel
Member Author

lidel commented Feb 10, 2016

FYI we now have a handy library for detecting IPFS urls and paths: is-ipfs.
It performs real multihash validation via js-multihash.
I switched from regex to it in v1.5.3

@lidel
Member Author

lidel commented Sep 14, 2017

Let's revisit this 🌵🤔

Now that we have CID support in is-ipfs the browser extension can detect HTTP requests to URLs with paths starting with /ipfs/{CID} and be sure these are valid IPFS resources (each CID is validated).
Performance concerns (dylanPowers/ipfs-chrome-extension#12 (comment)) were addressed by checking if path "looks like" IPFS path before performing costly CID validation (IsIpfs.url(request.url)).

This means we don't need the list of "known public gateways" anymore and can redirect all /ipfs/{CID} requests to custom gateway (usually locally running daemon).
The only requirement is that website exposes correct CID under /ipfs/ path root.
In fact IPFS detection works this way in ipfs-companion v2.0.9beta1. Globally. On any site.
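
The "looks like" pre-check can be sketched roughly as below. The `validate` argument stands in for the costly CID validation, and the helper names and the exact pre-check pattern are assumptions made for illustration:

```javascript
// Cheap string test: does the URL path start with /ipfs/ followed by something?
function looksLikeIpfsPath(url) {
  return /^https?:\/\/[^/]+\/ipfs\/[^/?#]+/.test(url);
}

// `validate` stands in for the expensive check (real CID validation);
// it only runs for URLs that pass the cheap test above.
function shouldRedirect(url, validate) {
  return looksLikeIpfsPath(url) && validate(url);
}
```

Since most requests on the web do not contain /ipfs/ at the start of the path, the expensive validator is skipped for nearly all of them.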

/ipns/{stuff} is a bit more problematic: is-ipfs does not reliably validate IPNS paths, as those can include either a PeerID (multihash?) or a string with a DNS hostname. Only a simple regex is used.
This means https://github.com/ipns/some-repository produces false-positive IPNS path: /ipns/some-repository.

Given all of this I would appreciate some thoughts and ideas on:

  1. Are there any cons of redirecting all /ipfs/{CID}/* globally by default and removing the whitelist of "known public gateways" from Preferences screen?
    My thinking is that UX is much better if browser extension opportunistically detects IPFS resources on the internet and acts accordingly.

  2. Should we (a) remove IPNS support, (b) keep a list of "known public gateways" solely as "IPNS whitelist" or (c) is there a way of reducing the number of false-positives for /ipns/?
    I was thinking about checking if "foo" in /ipns/foo is PeerID (multihash?) or a valid (resolving) DNS domain.
    Would that be enough (at least for now)?

@lockedshadow

This means we don't need the list of "known public gateways" anymore and can redirect all /ipfs/{CID} requests to custom gateway (usually locally running daemon).
The only requirement is that website exposes correct CID under /ipfs/ path root.
In fact IPFS detection works this way in ipfs-companion v2.0.9beta1. Globally. On any site.

Seems like pretty good news! Especially for those who want to maintain a public, but not wide public gateway.

Are there any cons of redirecting all /ipfs/{CID}/* globally by default and removing the whitelist of "known public gateways" from Preferences screen?
My thinking is that UX is much better if browser extension opportunistically detects IPFS resources on the internet and acts accordingly.

I suppose that this behaviour is preferable for nearly all cases (except cases described in ipfs/notes#104). But the list of known public gateways is still useful in some use cases, at least for the Copy public gateway URL function. It is still necessary as long as there are those who have not yet joined the distributed web! Maybe it's possible to add an option to select which public gateway to substitute in the generated URL, if there is more than one in Preferences? For now it seems to be hardcoded to ipfs.io.

Should we (a) remove IPNS support, (b) keep a list of "known public gateways" solely as "IPNS whitelist" or (c) is there a way of reducing the number of false-positives for /ipns/?

Definitely not (a).
And it is not strictly related to (c), but maybe it makes sense to keep not [only] an "IPNS whitelist", but a Blacklist too? And an option for manually adding some un-gateways to it on the fly, in the same manner as, for example, Smart HTTPS does. This blacklist should be empty at start and gradually fill with domains that look like gateways but are not recognised as gateways on closer examination.

Of course, this is just a suggestion.

I was thinking about checking if "foo" in /ipns/foo is PeerID (multihash?) or a valid (resolving) DNS domain.

With this solution, will the final stage of checking be delegated to the IPFS daemon, the same way the addon acts now with DNSLINK Support enabled? And does "resolving DNS domain" mean that it returns just a generic A record, or an IPFS-specific TXT record with dnslink and/or _dnslink. too?

Anyway, for now this solution seems to be enough in any of these cases, I think.

@lidel
Member Author

lidel commented Sep 21, 2017

Good ideas!

  • I created "Option to customize Default Public Gateway" (#284) to track customization of "default public gateway"
  • IPNS path validation
    • Blacklisting is tricky, you can't block someone forever (what if they want to run real IPFS tomorrow?).
      It requires orchestration that removes stale records after some time etc. Not sure if this increased complexity is worth the effort.
      • but.. we should cache positive results instead (if dns or peerId was valid once, chances are it will always pass :-)
    • If foo in /ipns/foo is a PeerID, then we could check if it is a valid multihash (in pure javascript alone).
    • If it is not a multihash, then we assume it is a domain name, so we delegate FQDN DNS validation in /ipns/{fqdn} paths to the IPFS daemon, then we can check for both A and TXT records. Local gateway exposes an API for reading dnslink at /api/v0/dns/{fqdn}.
      • Alternatively, we could just make a quick HTTP HEAD request to http://fqdn/ to confirm if fqdn is a real domain. This may be faster/easier, but does not validate presence of dnslink.
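
The validation order in the list above can be sketched as follows. Both checks are injected, so `isMultihash` and `hasDnslink` are placeholders (assumptions) for the pure-JavaScript multihash check and the daemon-backed DNS lookup:

```javascript
// Hypothetical validator for "foo" in /ipns/foo.
// `isMultihash(root)` is a synchronous check; `hasDnslink(fqdn)` is an async
// lookup (for example via the local gateway). Both are injected assumptions.
function createIpnsRootValidator(isMultihash, hasDnslink) {
  const knownGood = new Set(); // only positive results are remembered
  return async function isValidIpnsRoot(root) {
    if (knownGood.has(root)) return true;
    // Cheap local check first, network lookup only as a fallback.
    const ok = isMultihash(root) || (await hasDnslink(root));
    if (ok) knownGood.add(root);
    return ok;
  };
}
```

Caching only positive results avoids the stale-blacklist problem: a host that fails today is simply checked again the next time it shows up.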

lidel added a commit that referenced this issue Oct 6, 2017
This commit removes false-positive redirects for paths
that start with /ipns/{ipnsRoot} by following these steps:
1. is-ipfs test (may produce false-positives)
2. remove false-positives by checking if ipnsRoot is:
   - a valid CID (we check this first as it's faster/cheaper)
   - or FQDN with a valid dnslink in DNS TXT record
    (expensive, but we reuse caching mechanism from dnslink experiment)

This means we now _automagically_ detect valid IPFS resources on any
website as long as path starts with /ipfs/ or /ipns/, removing problems
described in
#16 (comment)

This commit also closes #69 -- initial load is suspended until dnslink
is read via API, then it is cached so that all subsequent requests are
very fast.
lidel added a commit that referenced this issue Oct 8, 2017
@lidel
Member Author

lidel commented Oct 14, 2017

We are finally able to detect IPFS paths without obvious false-positives! 🎉

Latest version of our browser extension (v2.0.13) performs path validation on every request:

  • For ^/ipfs/ paths, CID must be valid (CIDv0 or CIDv1)
  • For ^/ipns/ paths, the root needs to be a valid CID or an FQDN with dnslink in its TXT record

If a path starts with a valid IPFS-enabled root, then request is redirected to a local gateway.
This is enabled by default and works on every website, without need for header-signalling or wasted round-trips.
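
As an illustration of the redirect step only, here is a rough sketch. `isValidPath` stands in for the path validation described above, and the function name and the gateway address `http://127.0.0.1:8080` are assumptions, not the extension's actual code:

```javascript
// Hypothetical redirect helper: keeps path, query and fragment, swaps the origin.
// `gateway` is an assumed local gateway address such as http://127.0.0.1:8080.
function redirectToGateway(requestUrl, gateway, isValidPath) {
  const url = new URL(requestUrl);
  if (!isValidPath(url.pathname)) return null; // leave the request untouched
  return new URL(url.pathname + url.search + url.hash, gateway).toString();
}

console.log(
  redirectToGateway(
    "http://ipfs.pics/ipfs/Qmen4mpfUAnsh98hCZPr1DKUjJFuuAYfsAzjNdYY9R637i",
    "http://127.0.0.1:8080",
    (path) => path.startsWith("/ipfs/") // placeholder for real validation
  )
);
```

Returning null for non-matching paths lets the caller pass the original request through unchanged.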

We also kinda-support custom protocols thanks to #289:
(if someone wants to experiment with IPFS-only page)

Content script detects IPFS-related protocols in href and src attributes of Elements such as <a> or <img> and replaces them with URL at the user-specified public HTTP gateway.
If IPFS API is online, HTTP request will be redirected to custom gateway.

This normalization logic is enabled by "Catch Unhandled IPFS Protocols" experiment.
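
A minimal sketch of what such normalization could look like for a single attribute value. It assumes only `ipfs://` and `ipns://` style references and a public gateway URL such as `https://ipfs.io`; the set of protocols actually handled by the extension and the helper name are assumptions:

```javascript
// Hypothetical normalizer: maps ipfs:// and ipns:// style references
// to paths on a public HTTP gateway (gateway URL is an assumption).
function normalizeProtocolUrl(value, publicGateway) {
  const match = /^(ipfs|ipns):\/\/(.+)$/.exec(value);
  if (!match) return value; // not an IPFS-related protocol, keep as-is
  return publicGateway.replace(/\/$/, "") + "/" + match[1] + "/" + match[2];
}

console.log(
  normalizeProtocolUrl(
    "ipfs://QmSsNVuALPa1TW1GDahup8fFDqo95iFyPE7E6HpqDivw3p/logo.png",
    "https://ipfs.io"
  )
);
```

A content script would apply a function like this to the href and src attributes it finds; ordinary http(s) values are returned unchanged.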
