-
-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lychee don't panic, invalid URL's are fine. #539
Comments
Go for it! Just post questions here and we'll try to help. I like the bonus task, but don't stress it too much if it takes longer than expected. We can always to a follow-up PR. 😉 |
If the cause of panic is known (reqwest), we can use catch-unwind somewhere in the code. https://doc.rust-lang.org/stable/std/panic/fn.catch_unwind.html |
Oh that's a really good idea! There's probably no performance overhead except in case of panic, which is perfect for our use-case. @vipulgupta2048 I don't know how far you got already, but could you try wrapping the client call in a catch_unwind as @lebensterben mentioned? 🥺 |
I will give it a shot soon, haven't checked it out yet. |
Heya folks @mre @vipulgupta2048 - any luck with the changes? Looks like the panic is still a thing (and the PR is taking some time to be reviewed). Is there a temporary workaround one can apply? |
I don't mean to hold this up, I was expecting to get some time to write a patch for this but haven't been lucky enough. I will unassign this from me in order to unblock it for others to try out till then. |
@vipulgupta2048, no worries and thanks for looking into it. Originally I thought this was possible after the discussion with @lebensterben above: let result = panic::catch_unwind(|| self.reqwest_client.execute(request));
let fut = match result {
Ok(s) => s,
Err(_) => return Status::Error(ErrorKind::Panic),
};
match fut.await {
Ok(ref response) => Status::new(response, self.accepted.clone()),
Err(e) => e.into(),
} but I'm getting an error: --> lychee-lib/src/client.rs:482:22
|
482 | let result = panic::catch_unwind(|| self.reqwest_client.execute(request));
| ^^^^^^^^^^^^^^^^^^^ `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
| So I searched around a bit and this was recommended: match self.reqwest_client.execute(request).catch_unwind().await? {
Ok(ref response) => Status::new(response, self.accepted.clone()),
Err(e) => e.into(),
} It polls the future in the background. error[E0277]: the type `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
--> lychee-lib/src/client.rs:482:22
|
482 | let result = panic::catch_unwind(|| self.reqwest_client.execute(request));
| ^^^^^^^^^^^^^^^^^^^ `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary There is comment in the original thread:
(emphasis mine) No clue how to resolve this. :/ |
@dend, the only workaround we have at the moment is to exclude the panicking URLs. 😞 |
At this moment we need to wait for the upstream to merge the fix or use a patched dependency. |
ACK - thank you so much, @mre and @lebensterben. There are hundreds of URLs that panic for me (mostly ones from Archive.org that are of the form |
Well, the other alternative is to parse the URIs like reqwest would do and return an error before passing it to reqwest. That would be a waste of resources and be useless for most URLs, but at least it wouldn't panic. That was our plan in the beginning. |
I've created a PR to work around the panics. Would love to get some feedback. @dend, |
@mre here is an example that I saw pop up today: thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(InvalidUriChar)', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.11.8/src/into_url.rs:70:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Cannot send request: SendError(Ok(Request { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("cdn.den.dev")), port: None, path: "/images/postmedia/intercepting-ios-traffic/trust-warning.png", query: None, fragment: None } }, source: FsPath("./dennisdel/content/blog/intercepting-iphone-traffic-mac-for-free.md"), element: Some("img"), attribute: Some("src") }))', lychee-bin/src/commands/check.rs:85:14 Or another one: thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(InvalidUriChar)', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.11.8/src/into_url.rs:70:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Cannot send request: SendError(Ok(Request { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("docs.microsoft.com")), port: None, path: "/dotnet/api/system.diagnostics.debug.print", query: None, fragment: None } }, source: FsPath("./dennisdel/content/blog/using-the-google-weather-api-pros-and-cons-so-far.md"), element: Some("a"), attribute: Some("href") }))', lychee-bin/src/commands/check.rs:85:14 |
I don't understand why the URLs are invalid. |
Absolutely the same - they are resolving properly and work with no issue in the browser. |
Yeah they look fine. Maybe the issue is with a different link that was sent before. It panics and then Tokio closes the channel and after that it tries to send the legit links above but the channel is already closed. |
Here's an example of how this is panicking with an Archive.org URL today: thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(InvalidUriChar)', /cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.11.8/src/into_url.rs:70:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Cannot send request: SendError(Ok(Request { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("web.archive.org")), port: None, path: "/web/20070312014327/https://developer.yahoo.com/weather/", query: None, fragment: None } }, source: FsPath("./dennisdel/content/blog/using-the-google-weather-api-pros-and-cons-so-far.md"), element: Some("a"), attribute: Some("href") }))', lychee-bin/src/commands/check.rs:85:14
+ exit_code=101 |
Weird, it works for me.
Can you try that? |
btw, this reminds us we should no longer ignore panic docs. tbh I don't even realize it could panic here. |
@lebensterben do you mean like this? #561 |
@mre |
BTW, it's entirely possible that my GitHub action is just not configured properly because I am trying to exclude a folder from being checked: - name: Load Excludes
run: |
LYCHEE_FILE_STACK=$(find . -not \( -path ./dennisdel/content/blog/drafts -prune \) -name '*.md' -print0 |xargs --null)
echo "LYCHEE_FILE_STACK=$LYCHEE_FILE_STACK" >> $GITHUB_ENV
- name: Link Checker
uses: lycheeverse/lychee-action@master
with:
args: --verbose --accept=200,429 --format markdown --exclude-file .lycheeignore --exclude-mail -- ${{ env.LYCHEE_FILE_STACK }}
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} I don't think that's the root cause, but maybe it's part of the challenge? My exclude file: https://gist.github.com/dend/5da053ec400004776f24b6d2872f8475 |
I don't think that's the cause. The panics consistently results from making a invalid request. though I've no idea why they are invalid... |
@dend do you still encounter this issue with the latest |
Closing. Feel free to reopen if the problem still exists. |
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing issue, when Urls could not be parsed to Uris. Previously, we would panic, but we can now handle that situation gracefully and return an error instead. I've also renamed `Status::is_failure` to `Status::is_error`, because the notion of failures no longer exists in the codebase and we use the term "error" consistently throughout the codebase instead. This is technically a breaking change in the API, but it's fine since we have not released a stable version yet. More information about the URI parsing issue: - #539 - seanmonstar/reqwest#668
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing issue, when Urls could not be parsed to Uris. Previously, we would panic, but we can now handle that situation gracefully and return an error instead. I've also renamed `Status::is_failure` to `Status::is_error`, because the notion of failures no longer exists in the codebase and we use the term "error" consistently throughout the codebase instead. This is technically a breaking change in the API, but it's fine since we have not released a stable version yet. More information about the URI parsing issue: - #539 - seanmonstar/reqwest#668
From the pleasant discussion with @mre here lycheeverse/lychee-action#82, I intend to work on trying to parse the input URL before passing it to
reqwest
in order to prevent lychee from panicking, skip the invalid URL and move on to the next one.Bonus:
Looking for feedback and @vipulgupta2048 will work on this soon (The email will remind me to work on this)
The text was updated successfully, but these errors were encountered: