Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to lychee for link checking #291

Merged
merged 29 commits into from
Jun 20, 2021
Merged

Migrate to lychee for link checking #291

merged 29 commits into from
Jun 20, 2021

Conversation

MichaIng
Copy link
Owner

@MichaIng MichaIng commented Dec 19, 2020

as liche has been deprecated by its developer. But there are a few issues that need to be addressed first:

as liche has been deprecated by its developer
@MichaIng MichaIng added the meta Topics that do not deal with the actual docs content, e.g. GitHub actions label Dec 19, 2020
since Rust applications are by default placed into a custom directory outside of PATH.
since those required an external base URL currently, local file existence instead is not possible. But we do not want to burst our server with hundreds of concurrent requests for now. Let's see of this feature is implemented soon, until then we stay with still functional liche.
as building it takes too long

Also fix URL glob, as "build/docs/**.html" does not match html files in sub directories, it seems.
as the previous Rust setup has been removed
@ravenclaw900
Copy link
Collaborator

Thoughts on using continue-on-error: true instead of if: always() && steps.etc.outcome == 'success'?

@MichaIng
Copy link
Owner Author

MichaIng commented Feb 3, 2021

Good to know about that option. However, often one step is required for the following step but not for the step afterwards, e.g. checking spelling can only work if pyspelling got successfully build, but link checking can still be done. I think this can currently only be correctly handled the way it is done now.

@ravenclaw900
Copy link
Collaborator

Also, (more related to this PR), why not use the actual lychee action? https://github.com/lycheeverse/lychee-action

@MichaIng
Copy link
Owner Author

MichaIng commented Feb 4, 2021

The question is why should we? 🙂
It adds more complexity outside of our control and implies some overhead to download and start the Docker container. I generally like to use the actual tools directly to have full control over the environment, version and command options.

Those GitHub action containers are generally nice to get started fast and easy without doing things like manual builds (or download and extract in case of lychee), but finally it does nothing more than executing the same tool, so from that state on has no benefit.

@MichaIng
Copy link
Owner Author

MichaIng commented Jun 2, 2021

Since currently local file checks seems to be no prioritised feature in lychee, an alternative is to use mkdocs serve to run a local webserver. Then for local links, we can set a base URL to http://127.0.0.1/. I'm a bit annoyed by the constantly failing tests because of liche timeouts and hope this gets better with lychee.

... since it does not support local file checking for internal links yet: lycheeverse/lychee#21
@MichaIng MichaIng marked this pull request as ready for review June 9, 2021 18:17
The base URL is not suffixed with the file path, hence it is wrong as fast as sub directories are processed. It is hence required to loop through all directories and check those individually with the directory as base path each.

Remove verbose flag. It is ignored when STDOUT and STDERR are redirected.

Do not serve redirects, so we can find outdated links.
as otherwise there are too many excludes required. lychee GET requests are quite efficient, so that is not an issue.

Add further excludes, required mainly because lychee checks also URLs in code tags :(.
@MichaIng
Copy link
Owner Author

It required some more changes and more complex integration, but now it works fine. It's worth it to reduce checks process time and timeout errors that liche suffered from. Positive side-effects are:

  • that "mkdocs serve" does not do automatic redirects from URLs without trailing / to with trailing /, so it that all URLs are given a way that no redirect is required (slightly reduced overhead, for checks and also for search engine crawlers etc)
  • that now outdated URLs which only worked because of our migration redirects are failing the test, so we see and can update them

Negative is that URLs in code blocks are checked as well, which required some additional exclusions when we used things like http://192.168.0.100:8123. But it forces us to consequently use http://<your.IP>:8123 as a standard, which is good. There are a few cases left where we use localhost or loopback IP correctly to explain how to access InfluxDB via CLI or a Syncthing instance running on a Windows machine from the browser. Excludes based on tags is a feature that would be great for lychee. liche didn't have that issue since it only checked URLs in <a tags and href= attributes of <link tags, or so, not every URL found anywhere in the document.

Since we need to check every directory with a separate call, we have many summary prints in the checks output. Not awesome, but acceptable.
One failing check yesterday is solved now: https://detechter.com/the-battle-of-the-web-servers-apache-vs-Nginx-vs-lighttpd-2/
The last one is for https://www.audiophonics.fr, where OpenSSL reports "Verification error: unable to verify the first certificate", while my browsers don't see any issue and the certificate + chain looks all correct. I'll exclude it for now so that we get green checks, finally 😄.

@MichaIng MichaIng added the correction Content, spelling or syntax corrections label Jun 10, 2021
OpenSSL sees a problem with their certificate, browsers on Windows not. For now simply exclude it.
also to force another checks run, as I want to have it green :).
@MichaIng
Copy link
Owner Author

It's not as fast as on the main website, I guess mostly since we need to do a single call for each HTML to have internal links checked correctly. This also means that same URLs are checked doubled. And it's about 3000 overall links, so MUCH more 😄.

Ready from my end.

@MichaIng MichaIng merged commit 9576054 into dev Jun 20, 2021
@MichaIng MichaIng deleted the dev-MichaIng-lychee branch June 20, 2021 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
correction Content, spelling or syntax corrections meta Topics that do not deal with the actual docs content, e.g. GitHub actions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants