Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing / for UnixFS directory normalization is missing #62

Closed
achingbrain opened this issue Feb 28, 2024 · 11 comments · Fixed by #156
Closed

Trailing / for UnixFS directory normalization is missing #62

achingbrain opened this issue Feb 28, 2024 · 11 comments · Fixed by #156
Labels
bug Something isn't working

Comments

@achingbrain
Copy link
Member

achingbrain commented Feb 28, 2024

Trying to click a footnote takes you back to the Wikipedia homepage instead of the footer of the current page:

image

This is because the footnote href is a relative link up one directory:

image

All local links seem to have this problem:

image
@achingbrain achingbrain changed the title Wikipedia footnotes are broken Wikipedia links are broken Feb 28, 2024
@lidel
Copy link
Member

lidel commented Feb 28, 2024

@achingbrain do you have specific article to reproduce? Check if this was also broken when loaded via dweb.link gateway.

iirc this ancient wikipedia mirror has JS that fixes relative links post-load, so if that JS file failed to load via SW, links will be broken.

@achingbrain
Copy link
Member Author

Sure, I posted a bunch in #61

E.g. click any linked text here: https://bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze.ipfs.sw.sgtpooki.com/wiki/Archaeology

iirc this ancient wikipedia mirror has JS that fixes relative links post-load

Ah, ok - there are some js errors in the console - a syntax error in the file, maybe that's it. Is there a later CID for the mirror?

@lidel
Copy link
Member

lidel commented Feb 28, 2024

Ok, I know what is the problem.
Yet another signal we should set up https://github.com/ipfs/gateway-conformance with proper coverage.

SW gateway does not normalize directory URLs. They must end with /. Gateway ensure that by redirecting UnixFS directory paths to URL that ends with trailing /.

Example:
https://bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze.ipfs.dweb.link/wiki/Archaeology returns HTTP 302 to https://bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze.ipfs.dweb.link/wiki/Archaeology/ and that makes relative links work correctly.

This is not specific to Wikipedia, this is a bug on all UnixFS directories (with index.html)

I suspect we need to fix this inside of verified-fetch because we need to know if the terminal element of content path is a directory. Support https:// URLs, and return HTTP 302 with fixed URL when Accept header in request includes text/html.

@lidel lidel added the bug Something isn't working label Feb 28, 2024
@lidel lidel changed the title Wikipedia links are broken Trailing / for UnixFS directory normalization is missing Feb 28, 2024
@achingbrain
Copy link
Member Author

achingbrain commented Feb 29, 2024

They must end with /

Why is this the case?

I think we should be able to serve a directory with or without a trailing slash. My thinking is that it's not a requirement for apache/nginx/etc so people will author their websites with either style, plus forcing redirects for (potentially) every directory request is expensive.

If this is necessary, a 301 would better so browsers don't re-request the invalid URL?

we need to know if the terminal element of content path is a directory

Unless I'm missing something we can detect this as we traverse the DAG - we need the terminal node to display the content, at which point we can interrogate the UnixFS metadata to find out if it's a directory or not, no need to guess from the path.

@lidel
Copy link
Member

lidel commented Feb 29, 2024

[..] that it's not a requirement for apache/nginx/etc so people will author their websites with either style

Correct, if one only cares about their single website, loaded from a single URL, this does not matter. If we want to build a reliable ecosystem where people can publish in permissionless fashion, and we have 3 addressing schemes, we need to normalize, otherwise we end up with bugs like the one described in this issue.

Trailing slash decides the anchor of relative links, and how things like service worker scoping works, has both practical and security ramifications.

This is also to ensure relative links work the same way on legacy path gateways: /ipfs/cid/ (where you technically can skip trailing /), and origin-isolated contexts: https://cid.ipfs.dweb.link (path is implicitly /) and ipfs://cid (where path is implicitly / and can't be skipped and turned to `` empty)

We have it as explicit part of gateway specs and conformance tests for this very reason:

Added reasoning why we do this in ipfs/specs#464.

we need the terminal node to display the content, at which point we can interrogate the UnixFS metadata to find out if it's a directory or not, no need to guess from the path.

Yes, in go-ipfs code we made the decision to redirect after we've read terminal element, one important thing is to preserve any query params:
https://github.com/ipfs/boxo/blob/89248079aaf6689e010d5ea1dd689ed717f0c8b6/gateway/handler_unixfs_dir.go#L52-L55 (ignore ?go-get, it is a legacy Kubo thing)

And yes, we want 301 as the final one, just easier to test with 302 so I always start with that :)

@achingbrain
Copy link
Member Author

Related: ipfs/helia-verified-fetch#5

@lidel
Copy link
Member

lidel commented Mar 4, 2024

Quick update on the state of things

Took {redirect:'manual'} from ipfs/helia-verified-fetch#5 for a spin in #82 but it turned out to be not enough because verifid-fetch does not support http(s):// requests yet.

Service worker gateway needs to convert http(s):// URLs to ip[fn]s:// before passing to verified-fetch.

What happens

With ipfs/helia-verified-fetch#5, resolving http://cid.ipfs.localhost:8080/dir can

  1. be requested as verifiedFetch('ipfs://cid/dir', {redirect:'manual'})
  2. but that produces Response with status 301 and header Location: ipfs://cid/dir/ (which is not supported by browser).

Ideal state

I believe we should not spend time on temporary translation glue code in this project, instead spend time on implementing support for http(s):// URLs upstream, in verified-fetch.

What we want:

  1. Request URL as-is: verifiedFetch('http://cid.ipfs.localhost:8080/dir', {redirect:'manual'})
  2. produces Response with status 301 and header Location: http://cid.ipfs.localhost:8080/dir/ and that is all.

I think we want to do this right, in a way that is future-proof.
This means fixing this bug depends on ipfs/helia-verified-fetch#11 landing upstream first, and then re-enabling {redirect:'manual'}

https://github.com/ipfs-shipyard/helia-service-worker-gateway/blob/440bf9152d3410ef90e69154c63f32778acb7e31/src/lib/heliaFetch.ts#L162

lidel added a commit that referenced this issue Mar 4, 2024
lidel added a commit that referenced this issue Mar 4, 2024
* chore: upgrade helia verified fetch
* chore: note on manual redirects
#62 (comment)

---------

Co-authored-by: Daniel N <2color@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
@achingbrain
Copy link
Member Author

achingbrain commented Mar 6, 2024

If the user does verifiedFetch('ipfs://cid/dir', {redirect:'manual'}) would we want the redirect location to be the browser-incompatible ipfs://cid/dir/ or the slightly fictitious https://cid.ipfs.local/dir/?

At the moment ipfs://cid/dir will redirect to ipfs://cid/dir/ and https://cid.ipfs.foo/dir will redirect to https://cid.ipfs.foo/dir/

Tests to prevent regressions are at ipfs/helia-verified-fetch#15

@SgtPooki
Copy link
Member

SgtPooki commented Mar 7, 2024

this should be done, the verified-fetch version isn't updated in helia-service-gateway yet though?

@lidel
Copy link
Member

lidel commented Mar 7, 2024

Just to confirm, @achingbrain this is in https://www.npmjs.com/package/@helia/verified-fetch 1.1.0?

If so, we need to refactor this repo to pass original request URL (http(s)://)

ps. ah, yes, ipfs/helia-verified-fetch#15 is adding test to guard the behavior we need, thanks!

@SgtPooki
Copy link
Member

this will be fully resolved once #151 and ipfs/helia-verified-fetch#33 are merged and we update to the latest verified-fetch version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants