Parameter --max-redirects did not take effect #715

Closed
Jiahui-Gu opened this issue Aug 3, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@Jiahui-Gu

I set --max-redirects to 30 and tested a URL that redirects 7 times, but lychee still reports too many redirects.

My command is:
lychee /path/to/links.html --max-redirects 30 --format detailed

The result is:

📝 Summary
---------------------
🔍 Total............1
✅ Successful.......0
⏳ Timeouts.........0
🔀 Redirected.......0
👻 Excluded.........0
❓ Unknown..........0
🚫 Errors...........1

Errors in /path/to/links.html
✗ [ERR] https://account.microsoft.com/family | Network error: error following redirect for url (https://account.microsoft.com/auth/complete-silent-signin?ru=https%3A%2F%2Faccount.microsoft.com%2Ffamily%3Frefd%3Daccount.microsoft.com): too many redirects

links.html content is:
<a href="https://account.microsoft.com/family"></a>

@lebensterben
Member

𝛌> lychee test.html --max-redirects 10000
Issues found in 1 input. Find details below.

[test.html]:
⧖ [TIMEOUT] https://account.microsoft.com/family | Timeout

🔍 1 Total ✅ 0 OK 🚫 0 Errors

I don't think this is lychee's issue.

You can also try this with curl

𝛌> curl -iL https://account.microsoft.com/family --verbose

@Jiahui-Gu
Author

Jiahui-Gu commented Aug 3, 2022

But in fact this URL does not time out, yet lychee still reports an error. You can click the link to see for yourself.

I think the root cause may be this: the actual URL is only redirected 7 times, but for some reason lychee follows so many redirects that it either exceeds the configured --max-redirects or times out.

@lebensterben
Member

> the actual URL is only redirected 7 times

How do you know this?

@lebensterben
Member

There are definitely more than 30 redirects.

I've managed to get a TIMEOUT with 100 redirects and Firefox's user agent:

𝛌> lychee test.html --max-redirects 100 --user-agent "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0"
Issues found in 1 input. Find details below.

[test.html]:
⧖ [TIMEOUT] https://account.microsoft.com/family | Timeout

🔍 1 Total ✅ 0 OK 🚫 0 Errors

@lebensterben
Member

It turns out that without a user agent lychee returns a too many redirects error; with a user agent it times out.

@Jiahui-Gu
Author

Jiahui-Gu commented Aug 3, 2022

> > the actual URL is only redirected 7 times
>
> How do you know this?

linkchecker told me that:

"Redirected to `https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&checkda=1&ct=1659358278&rver=7.0.6738.0&wp=MBI_SSL_SHORT&wreply=https:%2F%2Faccount.microsoft.com%2Fauth%2Fcomplete-silent-signin%3Fru%3Dhttps%253A%252F%252Faccount.microsoft.com%252Ffamily&lc=1033&id=292666'.

Redirected to `https://account.microsoft.com/auth/complete-silent-signin?ru=https%3A%2F%2Faccount.microsoft.com%2Ffamily'.

Redirected to `https://account.microsoft.com/family'.

Redirected to `https://account.microsoft.com/family/about?ru=https%3A%2F%2Faccount.microsoft.com%2Ffamily'.

Redirected to `https://www.microsoft.com/microsoft-365/family-safety?ocid=family_signin'.

Redirected to `https://www.microsoft.com/en-sg/microsoft-365/family-safety?ocid=family_signin&rtc=1'."

Then it returns 200.

linkchecker's user-agent is Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36. But I still get too many redirects after adding the same user-agent to lychee.
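
To make the difference between a short redirect chain and a redirect loop concrete, here is a minimal Python sketch of a redirect cap. It is a toy model, not lychee's implementation: the follow function, the TooManyRedirects exception, and the /hopN, /family and /login paths are all made up for illustration. It only shows that a finite chain passes under any generous limit, while a loop fails no matter how high the limit is set.

```python
class TooManyRedirects(Exception):
    """Raised when the hop limit is exhausted (hypothetical name)."""


def follow(start, redirects, max_redirects):
    """Return the list of visited URLs, or raise once the limit is exceeded.

    `redirects` maps a URL to the URL it redirects to. A URL that is absent
    from the mapping is treated as the final, non-redirecting response.
    """
    hops = [start]
    url = start
    while url in redirects:
        # len(hops) - 1 is the number of redirects followed so far.
        if len(hops) - 1 >= max_redirects:
            raise TooManyRedirects(f"gave up after {max_redirects} redirects")
        url = redirects[url]
        hops.append(url)
    return hops


# A finite chain with six redirects: /hop0 -> /hop1 -> ... -> /hop6.
chain = {f"/hop{i}": f"/hop{i + 1}" for i in range(6)}

# A loop, as a server might produce when a required cookie never arrives.
loop = {"/family": "/login", "/login": "/family"}

if __name__ == "__main__":
    print(follow("/hop0", chain, max_redirects=30))
    for limit in (30, 10000):
        try:
            follow("/family", loop, max_redirects=limit)
        except TooManyRedirects as err:
            print(f"limit {limit}: {err}")
```

If the server keeps bouncing the client between the same URLs, raising the limit only delays the failure, which would be consistent with the reports above of errors at 30 and timeouts at 100 or 10000.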

@mre
Member

mre commented Aug 11, 2022

Very weird, I get the same error.

echo 'https://account.microsoft.com/family' | lychee --user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36' --max-redirects 100 -
Issues found in 1 input. Find details below.

[stdin]:
⧖ [TIMEOUT] https://account.microsoft.com/family | Timeout

🔍 1 Total ✅ 0 OK 🚫 0 Errors

I don't know what could be causing this. If someone wants to experiment with it, feel free to add some logs/printlns or step through it with a debugger to find the root cause.

@mre added the bug label Aug 11, 2022
@lebensterben
Copy link
Member

I've no idea why browsers are able to get the correct HTTP response, but not curl or lychee.

@mre
Member

mre commented Aug 11, 2022

Apparently it works for linkchecker, though, and that's also a cli tool? 🤔

@mre
Member

mre commented Oct 24, 2022

Differences I could see for linkchecker:

  • Opens a session, arguably for cookie support (?)
  • Sets the Referer (sic) header.

I tried to emulate this behavior with curl...

curl --location-trusted -c cookie-jar.txt --max-redirs 0 -L --referer ';auto' -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36' 'https://account.microsoft.com/family'

... but I run into a timeout without any output.

(The ;auto suffix makes curl set the Referer header to the previous URL each time it follows a redirect.)

@mre mentioned this issue Jul 8, 2023
@mre
Member

mre commented Jul 8, 2023

It works with cookie support; no user agent needed.

This is the output with #1146:

echo 'https://account.microsoft.com/family' | lychee --cookie-jar cookies.json -vvv --no-progress --max-redirects 100 -
[DEBUG] Redirecting to https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=15&checkda=1&ct=1688856366&rver=7.0.6738.0&wp=MBI_SSL_SHORT&wreply=https:%2F%2Faccount.microsoft.com%2Fauth%2Fcomplete-silent-signin%3Fru%3Dhttps%253A%252F%252Faccount.microsoft.com%252Ffamily&lc=1033&id=292666
[DEBUG] Redirecting to https://account.microsoft.com/auth/complete-silent-signin?ru=https%3A%2F%2Faccount.microsoft.com%2Ffamily
[DEBUG] Redirecting to https://account.microsoft.com/family
[DEBUG] Redirecting to https://account.microsoft.com/family/about?ru=https%3A%2F%2Faccount.microsoft.com%2Ffamily
[DEBUG] Redirecting to https://www.microsoft.com/microsoft-365/family-safety?ocid=family_signin
[DEBUG] Redirecting to https://www.microsoft.com/de-de/microsoft-365/family-safety?ocid=family_signin&rtc=1
✔ [200] https://account.microsoft.com/family

🔍 1 Total (in 1s) ✅ 1 OK 🚫 0 Errors
[INFO ] Saving cookie jar

If you want to test it, you have to compile lychee from this branch until the changes get released.

@mre closed this as completed Jul 8, 2023
@mre
Member

mre commented Jul 8, 2023

As a side note, cookies are also why it works in linkchecker.
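
For anyone who wants to see the effect in isolation, the following Python sketch reproduces the pattern against a local toy server. Everything in it is an assumption for illustration: the LoopUntilCookie handler, the /family and /login paths, and the session=ok cookie are made up, and it says nothing about how Microsoft's sign-in flow or lychee's cookie support actually work. It only demonstrates that a client without a cookie store can get stuck in a redirect loop that a client with a cookie store walks straight through.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LoopUntilCookie(BaseHTTPRequestHandler):
    """Hypothetical sign-in flow: /family answers 200 only after /login set a cookie."""

    def do_GET(self):
        has_session = "session=ok" in self.headers.get("Cookie", "")
        if self.path == "/family" and has_session:
            body = b"family page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        elif self.path == "/family":
            self._redirect("/login")
        else:
            # /login hands out the cookie and bounces back to /family.
            self._redirect("/family", set_cookie="session=ok; Path=/")

    def _redirect(self, location, set_cookie=None):
        self.send_response(302)
        self.send_header("Location", location)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, use_cookies):
    """Return the final status code, or a short error string on a redirect loop."""
    handlers = [urllib.request.ProxyHandler({})]  # ignore any proxy env vars
    if use_cookies:
        jar = http.cookiejar.CookieJar()
        handlers.append(urllib.request.HTTPCookieProcessor(jar))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError once it detects that the redirects repeat.
        return f"failed with HTTP {err.code}"


def run_demo():
    """Start the toy server, fetch /family without and with cookies, then stop."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), LoopUntilCookie)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/family"
    try:
        return fetch(url, use_cookies=False), fetch(url, use_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Against this toy server, the cookie-less fetch ends in urllib's redirect-loop error while the fetch with a cookie jar reaches the 200 response, which mirrors the before and after behavior reported in this thread.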

mre added a commit that referenced this issue Jul 13, 2023
This is a very conservative and limited implementation of cookie support.

The goal is to ship an MVP, which covers 80% of the use-cases.
When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable.

Fixes: #645, #715
Partially fixes: #1108