Caching mdbook links? #66355

Closed
mark-i-m opened this issue Nov 13, 2019 · 8 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@mark-i-m
Member

if it is possible to take advantage of mdbook-linkcheck caching now?

Azure has a beta preview of caching, but the rust repo does not use it. Typically S3 buckets are used, but I do not know anything about them or whether it would be appropriate to use them.

Originally posted by @ehuss in #66338 (comment)

Splitting this off from the other thread so as not to clutter it.

@pietroalbini Is something like the above possible? Basically, we just want to cache the src/doc/rustc-guide/book/linkcheck directory between builds. This directory is generated by the linkchecker. Doing this would significantly reduce timeouts and other networking issues when linkchecking... We already use this functionality on the rustc-guide repo with Travis caching...

@csmoe csmoe added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Nov 13, 2019
@jonas-schievink jonas-schievink added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Nov 13, 2019
@pietroalbini
Member

Wouldn't this defeat the purpose of linkcheck?

@mark-i-m
Member Author

Hmm, that's a valid point. The main reason for putting linkcheck in this CI was to find the PRs that break links in the first place.

If the timeout for cached links is set to ~12 hours and a build takes 4 hours, then we would reduce the chance of spurious failures by about 2/3, but would still know to within roughly 3 PRs when a link broke.
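
The arithmetic behind that estimate can be sketched as follows. This is only an illustration under simplifying assumptions that are not stated in the thread: builds run back to back, each build lands one PR, and a cached link is re-fetched exactly once per cache lifetime. The function name `cache_tradeoff` is hypothetical.

```python
from fractions import Fraction


def cache_tradeoff(ttl_hours, build_hours):
    """Estimate how much network checking a link cache saves.

    Returns (builds_per_ttl, fraction_skipped):
    - builds_per_ttl: how many consecutive builds share one cached result,
      which is also the number of PRs a newly broken link could hide among.
    - fraction_skipped: share of builds that never hit the network for a
      given link, and so cannot fail spuriously on it.
    """
    builds_per_ttl = max(1, ttl_hours // build_hours)
    fraction_skipped = Fraction(builds_per_ttl - 1, builds_per_ttl)
    return builds_per_ttl, fraction_skipped


builds, skipped = cache_tradeoff(ttl_hours=12, build_hours=4)
print(builds, skipped)  # 3 2/3
```

With a 12-hour timeout and 4-hour builds, only one build in three actually fetches a given link, which is where the "about 2/3" and "roughly 3 PRs" figures come from.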

@mark-i-m
Member Author

cc @rust-lang/wg-learning

@spastorino
Member

I think caching, and maybe having to figure out which of a collection of PRs broke a link, is fine. It shouldn't be that hard.

@Michael-F-Bryan

> I think caching and maybe having to figure out over a collection of PRs is fine. Shouldn't be that hard.

90% of the time you should be able to do a git blame on the offending line and that'll tell you when it was broken.

I think the biggest benefit of caching results is to avoid GitHub's rate limiting. GitHub seems to limit you to, say, a hundred GET requests per minute, whereas the rustc-dev-guide might have several hundred links to GitHub pages. That means you often need to run mdbook-linkcheck to mark+cache the first hundred links as valid and get 429's for the rest, then run it again to mark the next hundred as valid, and so on.
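
The run-again-until-everything-is-cached pattern described above can be sketched like this. It is a minimal illustration, not mdbook-linkcheck's actual implementation: `check_links`, `make_fake_server`, the cache layout (URL to timestamp of last success), and the limit of 100 requests per run are all assumptions made for the example, and the HTTP fetch is replaced by an injected function so the sketch runs without network access.

```python
import time

OK, RATE_LIMITED = 200, 429


def check_links(links, cache, fetch_status, ttl_seconds, now=None):
    """Check links, reusing cached successes younger than ttl_seconds.

    cache maps URL -> timestamp of the last successful check.
    fetch_status is a caller-supplied function: URL -> HTTP status code.
    Returns (broken, deferred): links that failed outright, and links that
    were not verified because the server started answering 429.
    """
    now = time.time() if now is None else now
    broken, deferred = [], []
    rate_limited = False
    for url in links:
        checked_at = cache.get(url)
        if checked_at is not None and now - checked_at < ttl_seconds:
            continue  # cached result is still fresh, no request needed
        if rate_limited:
            deferred.append(url)  # do not keep hammering the server
            continue
        status = fetch_status(url)
        if status == OK:
            cache[url] = now
        elif status == RATE_LIMITED:
            rate_limited = True
            deferred.append(url)
        else:
            broken.append(url)
    return broken, deferred


def make_fake_server(limit):
    """Hypothetical stand-in for a host that allows `limit` requests per run."""
    calls = {"n": 0}

    def fetch_status(url):
        calls["n"] += 1
        return OK if calls["n"] <= limit else RATE_LIMITED

    return fetch_status


links = [f"https://example.invalid/{i}" for i in range(250)]
cache = {}
runs = 0
while True:
    runs += 1
    broken, deferred = check_links(
        links, cache, make_fake_server(100), ttl_seconds=12 * 3600, now=0.0
    )
    if not deferred:
        break
print(runs)  # 3
```

Each run validates and caches the next hundred links, so 250 links take three runs here. If the cache persists between CI builds, later builds skip the links that are already marked valid and stay under the limit.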

@mark-i-m
Member Author

@spastorino @JohnTitor Is there a reason to keep this issue open?

@spastorino
Member

To be honest, I'm not sure what the current status of link checking and caching is.

@mark-i-m
Member Author

I don't either... I haven't really kept up with it. But given that nothing has happened here for a few years, it seems like this issue is not doing much good by staying open, right?

6 participants