Bug: lychee can not detect error relative url #1480

Closed
awang-01 opened this issue Aug 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

awang-01 commented Aug 7, 2024

On this site, https://awang-01.github.io/testing/, there is an image with src="testing/images/lychee.png" that I expected to fail, but lychee reports it as OK:

lychee -v https://awang-01.github.io/testing/
✔ [200] https://awang-01.github.io/testing/images/lychee.png

🔍 1 Total (in 0s) ✅ 1 OK 🚫 0 Errors

The URL https://awang-01.github.iotesting/images/lychee.png should fail, but lychee automatically adds a / before the src.

awang-01 changed the title from "lychee can not detect error relative url" to "bug: lychee can not detect error relative url" Aug 7, 2024
awang-01 changed the title from "bug: lychee can not detect error relative url" to "Bug: lychee can not detect error relative url" Aug 7, 2024

mre commented Aug 7, 2024

Just checked out your sample page, and it seems to work as expected.

When clicking on the link, it brings me to https://awang-01.github.io/testing/images/lychee.png, which seems to be the correct URL.

The page responds with a 404, though.

However, when I open it on the command-line with curl, it returns a 200:

 curl -vvv https://awang-01.github.io/testing/images/lychee.png

It gives me:

 < HTTP/2 200
< server: GitHub.com
< content-type: image/png
< permissions-policy: interest-cohort=()
< last-modified: Wed, 07 Aug 2024 00:08:14 GMT
< access-control-allow-origin: *
< strict-transport-security: max-age=31556952
< etag: "66b2baee-176be"
< expires: Wed, 07 Aug 2024 00:33:29 GMT
< cache-control: max-age=600
< x-proxy-cache: MISS
< x-github-request-id: C8F3:2D8599:2BDA2D6:2D0C67A:66B2BE81
< accept-ranges: bytes
< age: 0
< date: Wed, 07 Aug 2024 00:23:29 GMT
< via: 1.1 varnish
< x-served-by: cache-fra-etou8220027-FRA
< x-cache: MISS
< x-cache-hits: 0
< x-timer: S1722990209.287670,VS0,VE101
< vary: Accept-Encoding
< x-fastly-request-id: 68cf52bf04fa7335da5ee09db44282dbdfce6794
< content-length: 95934
<
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
* Failure writing output to destination
* Connection #0 to host awang-01.github.io left intact

That's probably very similar to what lychee sees.
Note that lychee doesn't differentiate between images and other content when resolving links.
For lychee, all that matters is the response code, and in this case the page returns a 200.
I don't know why it returns a 200 on the CLI. Maybe GitHub Pages has some kind of bot detection, but I'm not aware of such a mechanism.

nobkd commented Aug 7, 2024

No, mre. You got that wrong.

The image on https://awang-01.github.io/testing/ has a relative src of testing/images/lychee.png, which should resolve to the absolute URL https://awang-01.github.io/testing/testing/images/lychee.png (note the duplicated testing). That URL would fail because it does not exist, which is the expected result here.

But lychee uses https://awang-01.github.io/testing/images/lychee.png, which is the correct path to the image but wrong in this context: I think the relative image URL was resolved relative to the hostname instead of the current location.
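
As a rough illustration of the difference described above, the two resolutions can be reproduced with Python's standard `urllib.parse.urljoin`, which follows RFC 3986 reference resolution. This is only a sketch: it does not show lychee's own code, and the variable names are made up for the example.

```python
from urllib.parse import urljoin

# Page URL and image src taken from the report above
page = "https://awang-01.github.io/testing/"
src = "testing/images/lychee.png"

# Standard resolution: relative to the current location (the /testing/ directory)
relative_to_page = urljoin(page, src)

# Resolution against the host root only, emulated here by prefixing "/"
# (an assumption used to reproduce the URL lychee reported)
relative_to_host = urljoin(page, "/" + src)

print(relative_to_page)
print(relative_to_host)
```

The first value contains the duplicated `testing` segment, which is the URL a browser would request; the second matches the URL that lychee 0.15.1 checked.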

I think this could be a duplicate of #1296


Edit: Simple reproduction:

(This shows the other way around: pages reported as missing when they are actually there.)

File tree:

root
└── test
    ├── index.html
    └── next.html

root/test/index.html:

<a href="next.html">next</a>

root/test/next.html:

just needs to exist.

Serve a site from root (e.g. python3 -m http.server -d . 3000)

lychee http://localhost:3000/test/
# or
lychee http://localhost:3000/test/index.html

Results in:

> lychee http://localhost:3000/test/index.html
  1/1 ━━━━━━━━━━━━━━━━━━━━ Finished extracting links
Issues found in 1 input. Find details below.

[http://localhost:3000/test/index.html]:
✗ [404] http://localhost:3000/next.html | Failed: Network error: Not Found

🔍 1 Total (in 0s) ✅ 0 OK 🚫 1 Error

As you can see, the relative link is not resolved correctly by lychee.
You can open the entry page in a browser of your choice and see that you can access the next page.
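
For comparison, here is what standard reference resolution gives for this reproduction, sketched with Python's `urllib.parse.urljoin`. The third base (without a trailing slash) is a hypothetical added for illustration: it happens to yield the same URL as the failing output above, but whether that is what went wrong inside lychee is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urljoin

href = "next.html"  # the link in root/test/index.html

# Both inputs from the reproduction resolve inside /test/
from_dir = urljoin("http://localhost:3000/test/", href)
from_file = urljoin("http://localhost:3000/test/index.html", href)

# Hypothetical base without the trailing slash: the last path segment is dropped
from_no_slash = urljoin("http://localhost:3000/test", href)

print(from_dir)
print(from_file)
print(from_no_slash)
```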


Just as a note:

> lychee --version
lychee 0.15.1

Note

Also, running the above example as lychee . (where . is root), that is, checking on the file system instead of over http(s), works correctly.

mre added the bug (Something isn't working) label Sep 27, 2024
mre added a commit that referenced this issue Oct 26, 2024
This commit introduces several improvements to the file checking process and URI handling:

- Extract file checking logic into separate `Checker` structs (`FileChecker`, `WebsiteChecker`, `MailChecker`)
- Improve handling of relative and absolute file paths
- Enhance URI parsing and creation from file paths
- Refactor `create_request` function for better clarity and error handling

These changes provide better support for resolving relative links, handling different base URLs, and working with file paths.

Fixes #1296 and #1480

mre commented Oct 26, 2024

It's fixed now. 🎉

lychee -v https://awang-01.github.io/testing/
     [404] https://awang-01.github.io/testing/testing/images/lychee.png
     [200] https://awang-01.github.io/testing/assets/styles.css

Issues found in 1 input. Find details below.

[https://awang-01.github.io/testing/]:
     [404] https://awang-01.github.io/testing/testing/images/lychee.png

🔍 2 Total (in 0s) ✅ 1 OK 🚫 1 Error

mre closed this as completed Oct 26, 2024

mre commented Oct 26, 2024

Forgot to mention that it's fixed in master only for now and will be shipped in our next release, 0.17.0.

mre added two more commits that referenced this issue Oct 27, 2024, each with the same commit message as above.