lychee shows network error: forbidden for valid links #733

Closed
Rizwan-Hasan opened this issue Aug 14, 2022 · 41 comments

@Rizwan-Hasan

Rizwan-Hasan commented Aug 14, 2022

Suddenly lychee shows this error for valid links. But these links are valid and also accessible from the browser.

❯ lychee --max-concurrency 1 --no-progress --verbose "work/ok.txt"
✗ [403] https://catboost.ai/ | Network error: Forbidden
✗ [403] https://catboost.ai/en/docs/concepts/python-reference_datasets_msrank | Network error: Forbidden

Issues found in 1 input. Find details below.

[work/ok.txt]:
✗ [403] https://catboost.ai/ | Network error: Forbidden
✗ [403] https://catboost.ai/en/docs/concepts/python-reference_datasets_msrank | Network error: Forbidden

🔍 2 Total ✅ 0 OK 🚫 2 Errors (HTTP:2)

Contents of work/ok.txt

https://catboost.ai/
https://catboost.ai/en/docs/concepts/python-reference_datasets_msrank

Lychee version

❯ lychee --version
lychee 0.10.1
@mre
Member

mre commented Aug 14, 2022

Hum, looks like it's an issue on your end. 🤔 At least it works over here:

❯❯❯ lychee --max-concurrency 1 --no-progress --verbose "work/ok.txt"
✔ [200] https://catboost.ai/
✔ [200] https://catboost.ai/en/docs/concepts/python-reference_datasets_msrank

🔍 2 Total ✅ 2 OK 🚫 0 Errors

Can you try from a different network, or reconnect to your Wi-Fi? Maybe it was just a temporary hiccup?

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 14, 2022

Hi @mre ,

It originally happened on GitHub Actions. The same happened when I tried it on my local PC, and I ran it several times on GitHub Actions with the same result.
Here's the most recent GitHub Actions log: https://github.com/Rizwan-Hasan/clearml-docs/runs/7825833483
And this one is from 3 days earlier: https://github.com/Rizwan-Hasan/clearml-docs/runs/7787255012

We need it to work on GitHub Actions.

@mre
Member

mre commented Aug 14, 2022

Oh I see. No clue what's going on there yet.

@Rizwan-Hasan
Author

Any idea how to make it work?

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 14, 2022

I just tried it on another cloud server and same error.

@mre
Member

mre commented Aug 14, 2022

What's strange is that it works for me. So it must be something on the client side.
Some thoughts:

  • Did you try a different user-agent?
  • Has any configuration been changed for catboost.ai recently? A firewall rule or a server config for example?
  • If the catboost.ai docs aren't self-hosted, maybe your hosting provider changed a setting.
  • Can you check the catboost.ai logs if you have access?
  • When was the last time it worked? Was it a permanent error from then on or did it occasionally work for a while?

I'm afraid there ain't much I can do on my end other than asking questions. 😅

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 14, 2022

On your thoughts,

  • Tried with a different user-agent (used the same user-agent my browser has) and the error continues.
  • We don't have any access to catboost.ai as it's not ours. So, can't tell if anything has changed there.
  • I've just checked a GitHub Actions log from 2 months ago, and the error on these two links was already present there.

@Rizwan-Hasan
Author

I've created a new Git repository to run a mock test, and the links fail there as well.
Repository: https://github.com/Rizwan-Hasan/test-lychee-links
Logs: https://github.com/Rizwan-Hasan/test-lychee-links/runs/7826437363

@Rizwan-Hasan
Author

Hi @mre ,

Any update?

@mre
Member

mre commented Aug 16, 2022

Nope, no update. I don't think I can help much here. 🙁 It works over here, so it's definitely a problem with the server preventing certain clients from getting access. It's not something we can fix in lychee.

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 16, 2022

Understood. But it's not working on my local PC or on a Linode server either. I'm confused: that's three different places where it fails.

@mre
Member

mre commented Aug 16, 2022

Yeah, that's definitely weird. If anyone can help narrow this down, please run the following command on your machine and report back the result.

echo 'https://catboost.ai' | lychee -

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 16, 2022

With lychee it's not working

❯ echo 'https://catboost.ai' | lychee -
Issues found in 1 input. Find details below.

[stdin]:
✗ [403] https://catboost.ai/ | Network error: Forbidden

🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)

On the other hand, it works with curl:

❯ curl -I 'https://catboost.ai'        
HTTP/1.1 200 OK
Content-Length: 79368
Content-Security-Policy: default-src 'none'; script-src 'unsafe-eval' 'unsafe-inline' 'nonce-EzZqyXy1OG/i2ADhnTKeSw==' mc.yandex.ru social.yandex.ru yastatic.net; style-src 'unsafe-inline' mc.yandex.ru yastatic.net; img-src 'self' data: avatars.yandex.net avatars.mds.yandex.net avatars.mdst.yandex.net mc.yandex.ru ext.captcha.yandex.net yastatic.net; connect-src 'self' mc.yandex.ru; frame-src www.youtube.com video.yandex.ru player.video.yandex.net; media-src ext.captcha.yandex.net; font-src yastatic.net; report-uri https://csp.yandex.net/csp?from=promo-catboost-2017&yandex_login=undefined&yandexuid=undefined;
Content-Type: text/html; charset=utf-8
Date: Tue, 16 Aug 2022 09:35:03 GMT
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

@mre
Member

mre commented Aug 16, 2022

Does this work for you?

 echo 'https://catboost.ai' | lychee --user-agent 'curl/7.79.1' -

@Rizwan-Hasan
Author

Not working

❯  echo 'https://catboost.ai' | lychee --user-agent 'curl/7.79.1' -
Issues found in 1 input. Find details below.

[stdin]:
✗ [403] https://catboost.ai/ | Network error: Forbidden

🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)

@mre
Member

mre commented Aug 16, 2022

No clue, really. The last thing that comes to mind is:

echo 'https://catboost.ai' | lychee --user-agent 'curl/7.79.1' --headers 'Accept=*/*' -

That's the only other thing I can see when I inspect the curl request.
You can compare which headers curl sends in your case with curl -v https://catboost.ai.
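To make that comparison mechanical, one can pull the request headers out of curl -v output and diff them against the headers another client is configured to send. The std-only Rust sketch below assumes only that curl's verbose output marks the lines it sends with "> "; the function names, the sample output, and the second header set are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Collects the request headers from `curl -v` output.
/// curl prefixes every line it sends with "> "; the request line itself
/// ("GET / HTTP/1.1") has no ": " separator and is therefore skipped.
fn sent_headers(verbose_output: &str) -> BTreeMap<String, String> {
    let mut headers = BTreeMap::new();
    for line in verbose_output.lines() {
        if let Some(sent) = line.strip_prefix("> ") {
            if let Some((name, value)) = sent.split_once(": ") {
                // Header names are case-insensitive, so store them lowercased.
                headers.insert(name.to_ascii_lowercase(), value.trim().to_string());
            }
        }
    }
    headers
}

/// Names of headers that `reference` sends but `candidate` either omits
/// or sends with a different value.
fn differing_headers(
    reference: &BTreeMap<String, String>,
    candidate: &BTreeMap<String, String>,
) -> Vec<String> {
    reference
        .iter()
        .filter(|&(name, value)| candidate.get(name) != Some(value))
        .map(|(name, _)| name.clone())
        .collect()
}

fn main() {
    // Made-up excerpt shaped like `curl -v` output (not a captured log).
    let curl_verbose = "> GET / HTTP/1.1\n\
                        > Host: example.org\n\
                        > User-Agent: curl/7.84.0\n\
                        > Accept: */*\n\
                        < HTTP/1.1 200 OK\n";
    let curl = sent_headers(curl_verbose);

    // Hypothetical headers another client is configured to send.
    let mut other = BTreeMap::new();
    other.insert("host".to_string(), "example.org".to_string());
    other.insert("user-agent".to_string(), "curl/7.84.0".to_string());

    println!("{:?}", differing_headers(&curl, &other)); // prints ["accept"]
}
```

In this thread, matching the headers alone did not settle it: lychee still received a 403 with curl's User-Agent and Accept: */*, which hints (but does not prove) that the difference lies below the HTTP headers.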

@Rizwan-Hasan
Author

Rizwan-Hasan commented Aug 16, 2022

❯ echo 'https://catboost.ai' | lychee --headers 'Accept=*/*' --user-agent 'curl/7.79.1' -
Issues found in 1 input. Find details below.

[stdin]:
✗ [403] https://catboost.ai/ | Network error: Forbidden

🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)

With my curl version

❯ echo 'https://catboost.ai' | lychee --headers 'Accept=*/*' --user-agent 'curl/7.84.0' - 
Issues found in 1 input. Find details below.

[stdin]:
✗ [403] https://catboost.ai/ | Network error: Forbidden

🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)
❯ curl --version
curl 7.84.0 (x86_64-pc-linux-gnu) libcurl/7.84.0 OpenSSL/1.1.1q zlib/1.2.12 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.0) libssh2/1.10.0 nghttp2/1.48.0
Release-Date: 2022-06-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

@mre
Member

mre commented Aug 16, 2022

@lebensterben could you run a test on your side if you find the time?

@lebensterben
Member

@mre

𝛌> echo 'https://catboost.ai' | ./target/debug/lychee --user-agent 'curl/7.79.1' --headers 'Accept=*/*' -
Issues found in 1 input. Find details below.

[stdin]:
✗ [403] https://catboost.ai/ | Failed: Network error: Forbidden

@lebensterben
Member

This issue is similar to #715

We need to build a minimal curl-like client with reqwest to check whether the problem is in reqwest or lychee.
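The reasoning behind a bare reqwest client is a process of elimination: lychee is built on top of reqwest, while curl uses libcurl, an independent HTTP/TLS stack. A small std-only Rust sketch of that decision table follows; the Outcome enum and suspect_layer function are hypothetical names, and the verdict strings are hedged guesses rather than diagnoses.

```rust
/// Result of fetching the same URL with one tool.
#[derive(Clone, Copy, Debug)]
enum Outcome {
    Works,
    Forbidden,
}

/// Narrows down which layer most likely causes the 403, given the results
/// of lychee, a bare reqwest client, and curl (libcurl) for the same URL.
fn suspect_layer(lychee: Outcome, bare_reqwest: Outcome, curl: Outcome) -> &'static str {
    match (lychee, bare_reqwest, curl) {
        // lychee succeeds: nothing to investigate on this machine.
        (Outcome::Works, _, _) => "no problem reproduced",
        // Only lychee fails: its own request setup (headers, options) differs.
        (Outcome::Forbidden, Outcome::Works, _) => "lychee's own request setup",
        // reqwest fails too, but libcurl gets through: the shared stack is suspect.
        (Outcome::Forbidden, Outcome::Forbidden, Outcome::Works) => {
            "reqwest or the HTTP/TLS stack below it"
        }
        // Everything fails: likely the server or the network, not the client.
        (Outcome::Forbidden, Outcome::Forbidden, Outcome::Forbidden) => {
            "likely the server or the network"
        }
    }
}

fn main() {
    // Results reported in this thread on an affected machine:
    // lychee 403, bare reqwest (geturl) 403, curl 200.
    let verdict = suspect_layer(Outcome::Forbidden, Outcome::Forbidden, Outcome::Works);
    println!("{}", verdict); // prints "reqwest or the HTTP/TLS stack below it"
}
```

Fed with the results reported in this thread on affected machines, the table points at reqwest or the stack underneath it rather than at lychee's own configuration.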

@mre
Member

mre commented Aug 16, 2022

The strange thing is that it works for me with lychee. 🤔

I sort of hacked together a curl/reqwest client here: https://github.com/lycheeverse/geturl-test
Can you folks test again?

# reqwest (the default)
~/C/s/r/geturl ❯❯❯ cargo run -- 'https://catboost.ai'
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/geturl 'https://catboost.ai'`
200 OK

# libcurl
~/C/s/r/geturl ❯❯❯ cargo run --features curl -- 'https://catboost.ai'
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/geturl 'https://catboost.ai'`
200 OK
~/C/s/r/geturl ❯❯❯ 

@lebensterben
Member

𝛌> cargo run -- 'https://catboost.ai'
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/geturl 'https://catboost.ai'`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: unexpected HTTP response code 403 Forbidden for URL https://catboost.ai', src/bin/geturl.rs:9:59
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@lebensterben
Member

@mre
Member

mre commented Oct 23, 2022

I've added support for it to geturl-test, and it indeed works:

> cargo run --no-default-features --features curl -- 'https://catboost.ai'
get_url: https://catboost.ai
response: Response { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("catboost.ai")), port: None, path: "/", query: None, fragment: None }, status: 200, headers: {"content-length": "79366", "content-security-policy": "default-src 'none'; script-src 'unsafe-eval' 'unsafe-inline' 'nonce-06bmt7YDganaJlR0CEne6Q==' mc.yandex.ru social.yandex.ru yastatic.net; style-src 'unsafe-inline' mc.yandex.ru yastatic.net; img-src 'self' data: avatars.yandex.net avatars.mds.yandex.net avatars.mdst.yandex.net mc.yandex.ru ext.captcha.yandex.net yastatic.net; connect-src 'self' mc.yandex.ru; frame-src www.youtube.com video.yandex.ru player.video.yandex.net; media-src ext.captcha.yandex.net; font-src yastatic.net; report-uri https://csp.yandex.net/csp?from=promo-catboost-2017&yandex_login=undefined&yandexuid=undefined;", "content-type": "text/html; charset=utf-8", "date": "Sun, 23 Oct 2022 23:21:38 GMT", "x-content-type-options": "nosniff", "x-frame-options": "DENY", "x-xss-protection": "1; mode=block"} }
status: 200 OK

Tested locally and inside a GitHub Codespace. Can you both test it on your machines as well?
Just clone the project and run the command above.

Even if it works, I really don't know whether we should add reqwest-impersonate to the project. It might become a maintenance issue down the road, as it could diverge from reqwest and is maintained by a single (albeit awesome) person.

@lebensterben
Member

I won't be able to test it for a while because I got covid (again).

@mre
Member

mre commented Oct 23, 2022

Get well soon. 🤗

@mre
Member

mre commented Oct 24, 2022

@Rizwan-Hasan in case you find the time, maybe you can test this: #733 (comment)

@Rizwan-Hasan
Author

Rizwan-Hasan commented Oct 28, 2022

@mre Hi, of course.

I pulled the Docker image with docker pull rust, cloned the geturl-test repository, and ran it as the README describes. Here's the log:

root@22c7cf489de3:/geturl-test# cargo run -- 'https://catboost.ai'
    Updating git repository `https://github.com/4JX/h2.git`
    Updating git repository `https://github.com/4JX/hyper.git`
    Updating crates.io index
    Updating git repository `https://github.com/4JX/reqwest-impersonate.git`
    Updating git repository `https://github.com/4JX/boring`
    Updating git submodule `https://github.com/google/boringssl.git`
    Updating git submodule `https://github.com/google/boringssl.git`
  Downloaded bitflags v1.3.2
  Downloaded memchr v2.5.0
  Downloaded quote v1.0.21
  Downloaded mio v0.8.4
  Downloaded serde_urlencoded v0.7.1
  Downloaded reqwest v0.11.12
  Downloaded syn v1.0.103
  Downloaded unicode-ident v1.0.5
  Downloaded ipnet v2.5.0
  Downloaded hyper-tls v0.5.0
  Downloaded httparse v1.8.0
  Downloaded http-body v0.4.5
  Downloaded foreign-types-shared v0.1.1
  Downloaded foreign-types v0.3.2
  Downloaded tower-service v0.3.2
  Downloaded tokio-util v0.7.4
  Downloaded openssl v0.10.42
  Downloaded tokio-native-tls v0.3.0
  Downloaded tracing v0.1.37
  Downloaded fnv v1.0.7
  Downloaded num_cpus v1.13.1
  Downloaded pin-utils v0.1.0
  Downloaded native-tls v0.2.10
  Downloaded tracing-core v0.1.30
  Downloaded want v0.3.0
  Downloaded try-lock v0.2.3
  Downloaded bytes v1.2.1
  Downloaded log v0.4.17
  Downloaded pin-project-lite v0.2.9
  Downloaded indexmap v1.9.1
  Downloaded once_cell v1.15.0
  Downloaded url v2.3.1
  Downloaded slab v0.4.7
  Downloaded form_urlencoded v1.1.0
  Downloaded ryu v1.0.11
  Downloaded httpdate v1.0.2
  Downloaded itoa v1.0.4
  Downloaded unicode-bidi v0.3.8
  Downloaded mime v0.3.16
  Downloaded percent-encoding v2.2.0
  Downloaded tinyvec_macros v0.1.0
  Downloaded serde v1.0.147
  Downloaded http v0.2.8
  Downloaded tinyvec v1.6.0
  Downloaded proc-macro2 v1.0.47
  Downloaded openssl-macros v0.1.0
  Downloaded anyhow v1.0.66
  Downloaded base64 v0.13.1
  Downloaded autocfg v1.1.0
  Downloaded cfg-if v1.0.0
  Downloaded cc v1.0.73
  Downloaded futures-core v0.3.25
  Downloaded idna v0.3.0
  Downloaded futures-task v0.3.25
  Downloaded futures-sink v0.3.25
  Downloaded futures-io v0.3.25
  Downloaded pkg-config v0.3.25
  Downloaded openssl-probe v0.1.5
  Downloaded futures-channel v0.3.25
  Downloaded futures-util v0.3.25
  Downloaded socket2 v0.4.7
  Downloaded openssl-sys v0.9.77
  Downloaded hashbrown v0.12.3
  Downloaded unicode-normalization v0.1.22
  Downloaded tokio v1.21.2
  Downloaded libc v0.2.135
  Downloaded encoding_rs v0.8.31
  Downloaded 67 crates (5.6 MB) in 2.05s (largest was `encoding_rs` at 1.4 MB)
   Compiling autocfg v1.1.0
   Compiling libc v0.2.135
   Compiling cfg-if v1.0.0
   Compiling log v0.4.17
   Compiling proc-macro2 v1.0.47
   Compiling pin-project-lite v0.2.9
   Compiling cc v1.0.73
   Compiling once_cell v1.15.0
   Compiling memchr v2.5.0
   Compiling pkg-config v0.3.25
   Compiling quote v1.0.21
   Compiling futures-core v0.3.25
   Compiling bytes v1.2.1
   Compiling unicode-ident v1.0.5
   Compiling syn v1.0.103
   Compiling futures-task v0.3.25
   Compiling itoa v1.0.4
   Compiling openssl v0.10.42
   Compiling fnv v1.0.7
   Compiling foreign-types-shared v0.1.1
   Compiling futures-util v0.3.25
   Compiling futures-io v0.3.25
   Compiling native-tls v0.2.10
   Compiling pin-utils v0.1.0
   Compiling tinyvec_macros v0.1.0
   Compiling httparse v1.8.0
   Compiling futures-channel v0.3.25
   Compiling hashbrown v0.12.3
   Compiling bitflags v1.3.2
   Compiling futures-sink v0.3.25
   Compiling percent-encoding v2.2.0
   Compiling openssl-probe v0.1.5
   Compiling try-lock v0.2.3
   Compiling serde v1.0.147
   Compiling encoding_rs v0.8.31
   Compiling httpdate v1.0.2
   Compiling tower-service v0.3.2
   Compiling unicode-bidi v0.3.8
   Compiling ryu v1.0.11
   Compiling anyhow v1.0.66
   Compiling mime v0.3.16
   Compiling ipnet v2.5.0
   Compiling base64 v0.13.1
   Compiling tracing-core v0.1.30
   Compiling tokio v1.21.2
   Compiling slab v0.4.7
   Compiling indexmap v1.9.1
   Compiling http v0.2.8
   Compiling foreign-types v0.3.2
   Compiling tinyvec v1.6.0
   Compiling openssl-sys v0.9.77
   Compiling form_urlencoded v1.1.0
   Compiling tracing v0.1.37
   Compiling want v0.3.0
   Compiling http-body v0.4.5
   Compiling unicode-normalization v0.1.22
   Compiling num_cpus v1.13.1
   Compiling mio v0.8.4
   Compiling socket2 v0.4.7
   Compiling idna v0.3.0
   Compiling url v2.3.1
   Compiling serde_urlencoded v0.7.1
   Compiling tokio-util v0.7.4
   Compiling h2 v0.3.14 (https://github.com/4JX/h2.git?branch=imp#90af7b9d)
   Compiling openssl-macros v0.1.0
   Compiling hyper v0.14.18 (https://github.com/4JX/hyper.git?branch=v0.14.18-patched#abf28d87)
   Compiling tokio-native-tls v0.3.0
   Compiling hyper-tls v0.5.0
   Compiling reqwest v0.11.12
   Compiling tectonic_geturl v0.0.0-dev.0 (/geturl-test)
warning: variable does not need to be mutable
 --> src/bin/geturl.rs:9:13
  |
9 |         let mut response = backend.get_url(&url, &mut status).unwrap();
  |             ----^^^^^^^^
  |             |
  |             help: remove this `mut`
  |
  = note: `#[warn(unused_mut)]` on by default

warning: `tectonic_geturl` (bin "geturl") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 2m 59s
     Running `target/debug/geturl 'https://catboost.ai'`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: unexpected HTTP response code 403 Forbidden for URL https://catboost.ai', src/bin/geturl.rs:9:63
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
root@22c7cf489de3:/geturl-test# 
root@22c7cf489de3:/geturl-test# cargo run --no-default-features --features curl -- 'https://catboost.ai'
  Downloaded curl v0.4.44
  Downloaded libz-sys v1.1.8
  Downloaded curl-sys v0.4.56+curl-7.83.1
  Downloaded 3 crates (5.5 MB) in 2.91s (largest was `curl-sys` at 3.0 MB)
   Compiling curl v0.4.44
   Compiling libz-sys v1.1.8
   Compiling curl-sys v0.4.56+curl-7.83.1
   Compiling socket2 v0.4.7
   Compiling tectonic_geturl v0.0.0-dev.0 (/geturl-test)
error[E0599]: no method named `status_code` found for struct `std::io::Cursor` in the current scope
  --> src/bin/geturl.rs:21:40
   |
21 |             let status_code = response.status_code();
   |                                        ^^^^^^^^^^^ method not found in `std::io::Cursor<Vec<u8>>`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `tectonic_geturl` due to previous error
root@22c7cf489de3:/geturl-test# 
root@22c7cf489de3:/geturl-test# cargo run --no-default-features --features reqwest_impersonate -- 'https://catboost.ai'
  Downloaded libloading v0.7.3
  Downloaded clang-sys v1.4.0
  Downloaded cexpr v0.6.0
  Downloaded cmake v0.1.48
  Downloaded brotli-decompressor v2.3.2
  Downloaded glob v0.3.0
  Downloaded foreign-types-shared v0.3.1
  Downloaded linked_hash_set v0.1.4
  Downloaded foreign-types-macros v0.2.2
  Downloaded crc32fast v1.3.2
  Downloaded flate2 v1.0.24
  Downloaded linked-hash-map v0.5.6
  Downloaded alloc-stdlib v0.2.2
  Downloaded adler v1.0.2
  Downloaded async-compression v0.3.15
  Downloaded nom v7.1.1
  Downloaded antidote v1.0.0
  Downloaded lazycell v1.3.0
  Downloaded regex-syntax v0.6.27
  Downloaded tower-layer v0.3.2
  Downloaded peeking_take_while v0.1.2
  Downloaded alloc-no-stdlib v2.0.4
  Downloaded regex v1.6.0
  Downloaded minimal-lexical v0.2.1
  Downloaded rustc-hash v1.1.0
  Downloaded foreign-types v0.5.0
  Downloaded bindgen v0.60.1
  Downloaded lazy_static v1.4.0
  Downloaded shlex v1.1.0
  Downloaded miniz_oxide v0.5.4
  Downloaded brotli v3.3.4
  Downloaded 31 crates (3.0 MB) in 1.39s (largest was `brotli` at 1.4 MB)
   Compiling glob v0.3.0
   Compiling minimal-lexical v0.2.1
   Compiling bindgen v0.60.1
   Compiling regex-syntax v0.6.27
   Compiling lazy_static v1.4.0
   Compiling lazycell v1.3.0
   Compiling rustc-hash v1.1.0
   Compiling peeking_take_while v0.1.2
   Compiling shlex v1.1.0
   Compiling alloc-no-stdlib v2.0.4
   Compiling foreign-types-shared v0.3.1
   Compiling crc32fast v1.3.2
   Compiling adler v1.0.2
   Compiling linked-hash-map v0.5.6
   Compiling tower-layer v0.3.2
   Compiling antidote v1.0.0
   Compiling libloading v0.7.3
   Compiling cmake v0.1.48
   Compiling alloc-stdlib v0.2.2
   Compiling miniz_oxide v0.5.4
   Compiling linked_hash_set v0.1.4
   Compiling clang-sys v1.4.0
   Compiling brotli-decompressor v2.3.2
   Compiling nom v7.1.1
   Compiling foreign-types-macros v0.2.2
   Compiling tokio-util v0.7.4
   Compiling flate2 v1.0.24
   Compiling regex v1.6.0
   Compiling h2 v0.3.14 (https://github.com/4JX/h2.git?branch=imp#90af7b9d)
   Compiling brotli v3.3.4
   Compiling foreign-types v0.5.0
   Compiling cexpr v0.6.0
   Compiling hyper v0.14.18 (https://github.com/4JX/hyper.git?branch=v0.14.18-patched#abf28d87)
   Compiling async-compression v0.3.15
   Compiling boring-sys v2.0.0 (https://github.com/4JX/boring?rev=2a7463a#2a7463aa)
error: failed to run custom build command for `boring-sys v2.0.0 (https://github.com/4JX/boring?rev=2a7463a#2a7463aa)`

Caused by:
  process didn't exit successfully: `/geturl-test/target/debug/build/boring-sys-0708443bc13991d2/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-env-changed=BORING_BSSL_PATH
  CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
  CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_TOOLCHAIN_FILE = None
  CMAKE_TOOLCHAIN_FILE = None
  CMAKE_GENERATOR_x86_64-unknown-linux-gnu = None
  CMAKE_GENERATOR_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_GENERATOR = None
  CMAKE_GENERATOR = None
  CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
  CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
  HOST_CMAKE_PREFIX_PATH = None
  CMAKE_PREFIX_PATH = None
  CMAKE_x86_64-unknown-linux-gnu = None
  CMAKE_x86_64_unknown_linux_gnu = None
  HOST_CMAKE = None
  CMAKE = None
  running: "cmake" "/usr/local/cargo/git/checkouts/boring-b574da2d10d0f762/2a7463a/boring-sys/deps/boringssl" "-DCMAKE_INSTALL_PREFIX=/geturl-test/target/debug/build/boring-sys-6cc3a56ed015afe6/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/usr/bin/cc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/usr/bin/c++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/usr/bin/cc" "-DCMAKE_BUILD_TYPE=Debug"

  --- stderr
  thread 'main' panicked at '
  failed to execute command: No such file or directory (os error 2)
  is `cmake` not installed?

  build script failed, must exit now', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.48/src/lib.rs:975:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
root@22c7cf489de3:/geturl-test# 

@mre
Member

mre commented Oct 28, 2022

Okay, thanks. The second one should not have failed to compile; that's an error on my end. However, I do expect it to fail just like the first test with reqwest.
At least the results have always been consistent between them on my end.

For the last one, which is the most promising, you need to install boringssl first.

@mre
Member

mre commented Oct 28, 2022

...and for boringssl you need to install cmake first. https://cmake.org/install/

@Rizwan-Hasan
Author

I had to install both cmake and clang to get it to work. Here's the log:

root@e4e91d5d237d:/geturl-test# cargo run --no-default-features --features reqwest_impersonate -- 'https://catboost.ai'
warning: unused variable: `redirect_policy`
  --> src/reqwest_impersonate.rs:59:13
   |
59 |         let redirect_policy = Policy::custom(move |attempt| {
   |             ^^^^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_redirect_policy`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: `tectonic_geturl` (lib) generated 1 warning
warning: variable does not need to be mutable
 --> src/bin/geturl.rs:9:13
  |
9 |         let mut response = backend.get_url(&url, &mut status).unwrap();
  |             ----^^^^^^^^
  |             |
  |             help: remove this `mut`
  |
  = note: `#[warn(unused_mut)]` on by default

warning: `tectonic_geturl` (bin "geturl") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/geturl 'https://catboost.ai'`
get_url: https://catboost.ai
response: Response { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("catboost.ai")), port: None, path: "/", query: None, fragment: None }, status: 200, headers: {"content-length": "79367", "content-security-policy": "default-src 'none'; script-src 'unsafe-eval' 'unsafe-inline' 'nonce-/lIFeLdebbwR139V24X0Xg==' mc.yandex.ru social.yandex.ru yastatic.net; style-src 'unsafe-inline' mc.yandex.ru yastatic.net; img-src 'self' data: avatars.yandex.net avatars.mds.yandex.net avatars.mdst.yandex.net mc.yandex.ru ext.captcha.yandex.net yastatic.net; connect-src 'self' mc.yandex.ru; frame-src www.youtube.com video.yandex.ru player.video.yandex.net; media-src ext.captcha.yandex.net; font-src yastatic.net; report-uri https://csp.yandex.net/csp?from=promo-catboost-2017&yandex_login=undefined&yandexuid=undefined;", "content-type": "text/html; charset=utf-8", "date": "Fri, 28 Oct 2022 19:51:08 GMT", "x-content-type-options": "nosniff", "x-frame-options": "DENY", "x-xss-protection": "1; mode=block"} }
status: 200 OK
root@e4e91d5d237d:/geturl-test# 

@mre
Member

mre commented Oct 28, 2022

Dang. It works. 😞

That means if we integrate that backend into lychee it would solve your issue.
Two questions (@lebensterben):

  • do we want to do this?
  • why did it work on my machine before (with the normal backend)?

@lebensterben
Member

@mre

I don't know the answer for the second question.

For the first one, I suggest testing it on other related issues where browsers can open a URL but curl and lychee cannot.

  • If those similar issues can be fixed as well, then it's definitely worth integrating despite the cost of an additional dependency.

@mre
Member

mre commented Nov 5, 2022

Good idea. Did some tests but haven't found other use-cases yet where this fixes the problem. I'll keep this on hold until then.

@mre
Member

mre commented Nov 11, 2022

Similar case that got resolved with browser impersonation:
#819

Thinking about adding reqwest-impersonate as a fallback to handle such cases.

@Rizwan-Hasan
Author

Rizwan-Hasan commented Jan 11, 2023

Hi @mre,

Is there any update regarding the issue❔
We're still facing it. 😥

@mre
Member

mre commented Jan 11, 2023

No updates, but if I find the time I will create a pull request to integrate reqwest impersonate as a fallback backend. It will be an optional library feature, but it will be enabled by default in the binary.
It's a great match because I want to refactor the client code anyway soon.
Thanks for the reminder.
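The fallback idea can be sketched as a wrapper that only consults a second backend when the first one gets a 403. This is a hypothetical, std-only illustration: the Backend trait, WithFallback, and the Fixed test double are invented here and are not lychee's client API. In lychee, the impersonating backend would additionally sit behind an optional cargo feature, as described in the comment above.

```rust
/// Minimal stand-in for an HTTP backend: returns a status code for a URL.
/// (Hypothetical trait, not lychee's actual client API.)
trait Backend {
    fn status(&self, url: &str) -> u16;
}

/// Tries `primary` first and only asks `fallback` when `primary` gets a 403.
struct WithFallback<P, F> {
    primary: P,
    fallback: F,
}

impl<P: Backend, F: Backend> Backend for WithFallback<P, F> {
    fn status(&self, url: &str) -> u16 {
        let status = self.primary.status(url);
        if status == 403 {
            // Only "Forbidden" responses trigger the second attempt.
            self.fallback.status(url)
        } else {
            status
        }
    }
}

/// Test double that always answers with the same status code.
struct Fixed(u16);

impl Backend for Fixed {
    fn status(&self, _url: &str) -> u16 {
        self.0
    }
}

fn main() {
    // Default client is blocked, impersonating client gets through.
    let client = WithFallback { primary: Fixed(403), fallback: Fixed(200) };
    println!("{}", client.status("https://catboost.ai")); // prints 200
}
```

Retrying only on 403 keeps the second backend out of the way for links the default client can already reach, at the cost of one extra request per blocked link.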

@Rizwan-Hasan
Author

Rizwan-Hasan commented Jan 11, 2023

That's great. Looking forward to the release then. 🙂

@mre
Member

mre commented Mar 3, 2023

Bad news. I wanted to integrate this, but I don't think it's possible right now.
reqwest-impersonate patches some dependencies (e.g. hyper) and therefore cannot be published on crates.io. If we integrate it into lychee, that means we couldn't publish the library on crates.io either even if we put reqwest-impersonate behind a feature flag, which is disabled by default.
See rust-lang/cargo#6738.
Is there a possibility that I don't see right now?

@mre
Member

mre commented Mar 16, 2023

As much as I would like this to be part of lychee, I don't think there is an easy solution right now.
Unless the upstream crate can get published on crates.io we don't stand a chance at integrating it.
I've created an upstream issue at 4JX/reqwest-impersonate#2.

@mre
Member

mre commented Apr 18, 2023

Seems like there isn't much upstream traction, and it's not something we can fix on our side, so I'm gonna go ahead and close this. If the upstream issue gets fixed, we can reopen and integrate reqwest-impersonate.
Apologies if this is not the outcome y'all were hoping for, but I think we need to find another way.

Projects
None yet
Development

No branches or pull requests

3 participants