Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

root-relative links ignored with default config #1265

Open
tgaff opened this issue Oct 9, 2023 · 6 comments
Open

root-relative links ignored with default config #1265

tgaff opened this issue Oct 9, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@tgaff
Copy link

tgaff commented Oct 9, 2023

Root-relative links in markdown like [link](/some/dir/file.md) don't get tested as expected, with the default (alternative) config (as opposed to [link](some/dir/file.md). By root-relative I mean those that begin with a / indicating the root of the domain. It appears to ignore those links altogether neither flagging them as good or bad.

I have a page like this:

- [bad link 1 - relative same directory](bad-link-1.md)
- [good link - ../ up to parent and down again](../nested/page.md)
- [bad link 2 - ../ up to parent and down again](../nested/bad-link-2.md)
- [good(ish) link - slash](/bad-links/nested/page.md)
  - note: this is a bad link if the site is *hosted* in a sub-directory like this one is
- [bad link 3 - slash](/bad-links/nested/bad-link-3.md)

I expect 3 failures on this page, but only get:

✗ [ERR] file:///home/runner/work/some-gh-pages-site/some-gh-pages-site/bad-links/nested/bad-link-1.md | Failed: Cannot find file
✗ [ERR] file:///home/runner/work/some-gh-pages-site/some-gh-pages-site/bad-links/nested/bad-link-2.md | Failed: Cannot find file
✔ [200] file:///home/runner/work/some-gh-pages-site/some-gh-pages-site/bad-links/nested/page.md

As you can see link-3 starts with a /.

reproduction repo

I setup a separate repo to test this. It has 4 bad links.

I'm also testing it in two ways. (1) the suggested alternate config from lychee-action, (2) with the addition of a --base . param.

I've found that setting --base . solves the problem. It finds the rest of the files which the default ignores. I think this should probably be the default / advertised config.

I have 3 test setups in my test repo.

  1. PR testing via default suggested settings ❌
  2. PR testing with --base . added and original args copied ✅
  3. a post-deploy test of the same code deployed via GitHub pages default deployment agent 🚱 (ignore this)
test 1: PRs as markdown with the default suggested config

Using config:

      - name: Link Checker
        uses: lycheeverse/lychee-action@v1.8.0
        with:
          fail: true  

Detects only some bad links

https://github.com/tgaff/some-gh-pages-site/actions/runs/6453247265/job/17516481918?pr=6

test 2: slightly altered config to specify --base .

All 4 bad links detected including bad-link-3:

✗ [ERR] file:///home/runner/work/some-gh-pages-site/some-gh-pages-site/bad-links/nested/bad-link-3.md | Failed: Cannot find file

https://github.com/tgaff/some-gh-pages-site/actions/runs/6453247263/job/17516481903?pr=6

Additional commentary

Yes I could adjust every link to not begin with slash and it would have zero ill-effects. However it seems odd that its necessary - especially given that the links are correct when clicking in the markdown on GitHub and also correct as part of the GitHub pages site. There's no guarantee that someone won't include such a link in their branch in the future - and it would silently slip through.

Would a PR adding --base . as part of the default be accepted?

@mre mre transferred this issue from lycheeverse/lychee-action Oct 10, 2023
@mre
Copy link
Member

mre commented Oct 10, 2023

FYI: moved the issue to the lychee main repo, because it's a better fit there.

@mre
Copy link
Member

mre commented Oct 10, 2023

Thanks for the thorough analysis.

Yes, --base . is currently the suggested way. Making it the default would make your case work, but break another case:
If the input is https://example.com and we find a link such as /about on that page, we assume it is pointing to https://example.com/about, which is a sensible assumption.

However, setting --base to . by default conflicts with that.
That's because we set the base from the source URL if --base is not set.
#358

If we change that behavior, we have to track (or infer) whether the source is a directory path or a URL. Sounds straightforward, but there are edge-cases like /about, which could be either a path or a URL relative to the domain root — depending on the context.

I do think that this can be resolved, it just requires more research and thorough testing to avoid any future surprises. I would love to get some help on this by mapping out all cases and maybe writing some tests.
For now, --base . avoids any ambiguities, at the cost of false negatives if the parameter isn't used.

@tgaff
Copy link
Author

tgaff commented Oct 10, 2023

Thanks for triaging this mre.
It makes sense that completely ignoring the link is a bug for this repo.

Would it make sense to add some small description to the README of the lychee-action repo so that others are aware they should add --base . when testing against local files rather than a server? It can be pretty non-obvious when it checks a few hundred links that a few were quietly skipped over.

@mre
Copy link
Member

mre commented Oct 17, 2023

Yes, absolutely. Pull requests to lychee-action (and possibly lychee) to clarify the behavior would be very welcome. I'm too involved in the tool creation to make this an easy task for me, as it's kinda obvious by now, but of course that's just Stockholm syndrome. 😆

@chipzoller
Copy link

To add my two cents here, although I think a README change is probably sufficient, it seems like there could be some type of enhancement that would at least let users know that root-level files were discovered and skipped. Even if that were only shown in a debug or verbose mode, that'd be fine and probably enough to point users in the right direction. My expectation as a user of lychee is that there's some setting which will tell me absolutely every link lychee "saw" and what its disposition is. Even if the --base . flag isn't set, it's still a genuine link. I'd want to be told what it did or didn't do about it.

@mre
Copy link
Member

mre commented Oct 27, 2023

Fair. As a first step we could print the ignored potential links in verbose mode.

@mre mre added the enhancement New feature or request label Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants