Lots of broken links, invalid HTML and so on (checklink) #561

Closed
AlexDaniel opened this issue May 31, 2016 · 49 comments
Labels
big Issue consisting of many subissues docs Documentation issue (primary issue type)

@AlexDaniel
Member

AlexDaniel commented May 31, 2016

I've run the checklink utility on doc.perl6.org. Here is the output. I believe that all of these issues have to be fixed.

Some of the errors are repeated many times, which is why the output is so large.

Current status

| Number of lines in the output | Date |
|---|---|
| 126591 | 2016-05-31 |
| 128555 | 2016-06-03 |
| 132005 | 2016-06-09 |
| 128771 | 2016-06-11 |
| 126707 | 2016-06-12 |
| 130491 | 2016-06-16 |
| 134073 | 2016-06-23 |
| 131489 | 2016-06-25 |
| 134542 | 2016-07-06 |
| 35561 | 2016-07-13 |
| 35005 | 2016-08-01 |
| 33926 | 2016-08-02 |
| 20540 | 2016-08-03 |
| 17432 | 2016-08-04 |
| 17002 | 2016-08-05 |
| 8253 | 2016-08-06 |
| 8615 | 2016-08-10 |
| 7374 | 2016-11-12 |
| 8157 | 2016-12-20 |
| 208521 | 2017-06-30 |
| 207664 | 2017-07-01 |
| 211899 | 2017-11-04 |
| 213547 | 2018-01-05 |
| 184904 | 2018-03-03 |
| 55393 | 2018-05-05 |
| 54232 | 2018-05-06 |
| 57626 | 2018-11-04 |
| 39726 | 2020-07-26 |

[Graph: checklink_graph]

@zoffixznet
Contributor

Are you sure you ran the program correctly? In the first item, http://doc.perl6.org/type/X::TypeCheck::Splice, the output says

x:/js/main.js   
  Line: 703
  Code: 501 Protocol scheme 'x' is not supported
 To do: Could not check this link: method not implemented or scheme not
    supported.

Yet, no scheme x: is in use on that line: <script type="text/javascript" src="/js/main.js"></script>

I checked a couple of other errors, and they similarly are false positives.

@zoffixznet
Contributor

zoffixznet commented May 31, 2016

@dogbert17

I’ll take a look at the 404s; that should keep me occupied for a while. I suspect some of them will not be easily fixed, e.g. #155

From: Zoffix Znet [mailto:notifications@github.com]
Sent: 31 May 2016 17:52
To: perl6/doc
Subject: Re: [perl6/doc] Lots of broken links, invalid HTML and so on (checklink) (#561)

There's a bazillion markup errors though: https://validator.w3.org/nu/?doc=http%3A%2F%2Fdoc.perl6.org%2Ftype%2FX%3A%3ATypeCheck%3A%3ASplice



@AlexDaniel
Member Author

AlexDaniel commented May 31, 2016

I checked a couple of other errors, and they similarly are false positives.

Saying that these are false positives is very optimistic. Sometimes the error message is LTA; other times the tool gets confused because of markup errors. I'm not sure what the issue is in this particular case, but if you think that it is working incorrectly (that is, if it gets confused by correct HTML) then please file a bug report here.

@dogbert17

I have fixed quite a few broken links. Unless I have missed some, which I probably have, the ones that remain point to the following, undocumented, entities:
NaN
HOW
mro

@AlexDaniel
Member Author

I've added a graph so that we can track the progress. However, until someone fixes the HTML problems we are not going to get any meaningful data there.

@zoffixznet
Contributor

If it helps anyone, I tried to debug the double links on headers and got only this far: the three %routines-by-type{*}.list in this section stuff links into headers, and then, when the pod is generated, the headers get wrapped into links by Pod::To::Html. Since the complete content is included anyway, the %routines-by-type{*}.list needn't have links in the content.

passes the torch

AlexDaniel added a commit that referenced this issue Jun 11, 2016
Wikipedia is now redirecting all http requests to https. This means that any
current link causes an unnecessary (permanent) redirect when the user clicks
on it.

The link checker also complains about it, so it is going to help with #561
a bit.
AlexDaniel added a commit that referenced this issue Jun 11, 2016
This is going to help with #561 a bit.

There are probably better ways to fix these links, but that's good enough.
@AlexDaniel
Member Author

AlexDaniel commented Jun 12, 2016

Actually, I think that most problems are now either fixed or have a corresponding issue filed. We need #584 to be fixed.

@coke coke added the site label Jun 24, 2016
@AlexDaniel AlexDaniel added docs Documentation issue (primary issue type) big Issue consisting of many subissues labels Jun 24, 2016
@coke
Collaborator

coke commented Dec 19, 2016

@AlexDaniel - looks like #584 might be fixed; can you re-run this scan?

@AlexDaniel
Member Author

@coke Just did.

@Altai-man
Member

Altai-man commented Dec 20, 2016

Mostly there are issues caused by "broken" method names.

The whole thing is like this:

  1. We attach a type name to the method name to create a unique link on a page, like this:
    #(Metamodel::AttributeContainer)_method_rw
  2. But when we make a link to method/routine/sub, it becomes just method_rw - so the link is invalid.

Without a type name there will be duplication of ids with routine/method/sub pages.

The priority task is to find a way to avoid duplication and to avoid breaking fragments. About 3 months ago, there was a talk with gfldex++ about how one can use number suffixes to avoid duplication in Pod::To::Html, but this part is hard.

There are also a few easier issues, such as links to items removed from the Glossary. They can just be removed or fixed and it will be okay, I think.
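
The number-suffix idea mentioned above can be sketched roughly as follows. This is only an illustration in Python, not what Pod::To::Html does: the function name `unique_ids` and the `-1`, `-2` suffix format are assumptions made for the example. The first occurrence keeps the plain id, so a plain `#method_rw` link still resolves, and only later repeats get a suffix.

```python
from collections import Counter


def unique_ids(raw_ids):
    """Return ids in order, appending -1, -2, ... to repeats of an earlier id.

    Hypothetical helper: the suffix format is an assumption for this sketch.
    """
    seen = Counter()
    result = []
    for raw in raw_ids:
        count = seen[raw]
        # The first occurrence keeps the plain id; repeats get a numeric suffix.
        result.append(raw if count == 0 else f"{raw}-{count}")
        seen[raw] += 1
    return result


headings = ["method_rw", "method_rw", "method_name", "method_rw"]
print(unique_ids(headings))
```

One limitation of this naive version: a heading whose raw id already looks like `method_rw-1` could still collide with a generated one, which may be part of why this was described as the hard part.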

@Altai-man
Member

Altai-man commented Dec 21, 2016

Some further investigation notes:

@AlexDaniel
Member Author

Still looks pretty bad.

@AlexDaniel
Member Author

Updated the graph again.

@AlexDaniel
Member Author

FWIW this is the command I'm using:

checklink -b -D 25 -q doc.perl6.org | tee "$(date '+%F')"
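
Since that command writes each run to a file named after the date, the "Current status" numbers can be regenerated from a directory of such files. A minimal sketch, assuming the files sit in one directory and are named exactly `YYYY-MM-DD`; the helper name `line_counts` is made up for this example.

```python
import re
from pathlib import Path

# File names produced by `date '+%F'`, e.g. 2016-05-31.
DATE_NAME = re.compile(r"\d{4}-\d{2}-\d{2}")


def line_counts(directory):
    """Return (line_count, date) pairs for files named YYYY-MM-DD, oldest first."""
    rows = []
    # ISO dates sort chronologically when sorted as plain strings.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and DATE_NAME.fullmatch(path.name):
            with path.open(encoding="utf-8", errors="replace") as handle:
                rows.append((sum(1 for _ in handle), path.name))
    return rows


if __name__ == "__main__":
    for count, date in line_counts("."):
        print(count, date)
```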

@JJ
Contributor

JJ commented Feb 10, 2018

What about listing broken links by file and going at them, one by one? Some links might be errors, some of them might actually have disappeared due to bitrot...

@AlexDaniel
Member Author

@JJ well, you can just search the output for the “404 Not Found” error and you'll find all the broken links. Everything else is probably broken HTML. The goal here is to get the output to 0 lines (we were almost there in 2016).
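
For anyone who prefers a list over searching by hand, here is a rough Python sketch of that search. It assumes the report layout seen in the excerpts in this thread (a URL line, then `Line:` and `Code:` lines); the function name `broken_links` is hypothetical, and the real checklink output may contain cases this does not handle.

```python
import re

# A line that starts with "scheme:" immediately followed by non-space text.
URL_LINE = re.compile(r"^[a-z][a-z0-9+.-]*:\S+", re.IGNORECASE)


def broken_links(report_text):
    """Return URLs whose 'Code:' line mentions '404 Not Found', in report order."""
    current_url = None
    found = []
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Code:"):
            if "404 Not Found" in stripped and current_url is not None:
                found.append(current_url)
        elif URL_LINE.match(stripped):
            # Redirect targets start with "->" and are skipped, so the
            # URL as written in the page is what gets reported.
            current_url = stripped
    return found


sample = """\
https://github.com/perl6/doc/blob/master/doc/Type/X::Proc::Async::AlreadyStarted.pod6
-> https://github.com/Raku/doc/blob/master/doc/Type/X::Proc::Async::AlreadyStarted.pod6
  Line: 224
  Code: 301 -> 404 Not Found

x:/js/main.js
  Line: 703
  Code: 501 Protocol scheme 'x' is not supported
"""
print(broken_links(sample))
```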

P.S. Updated the graph and the output again.

@AlexDaniel AlexDaniel assigned JJ and unassigned JJ May 5, 2018
@JJ
Contributor

JJ commented May 7, 2018

@abraxxa thanks a lot for the report, but there are so many that it's difficult to know where to start...

@JJ JJ removed the JJ TPF Grant label May 14, 2018
@JJ
Contributor

JJ commented Aug 13, 2018

Most 404s are now fixed, but heading titles have since drifted from their links, as in #2146. Whenever you're working on a page, please check outgoing anchors for correctness.

JJ added a commit that referenced this issue Aug 13, 2018
JJ added a commit that referenced this issue Aug 28, 2018
Refs #561 #1838, but mainly it's been a very extensive reflow, there were very long lines here. Main intention was to work towards #2283
@JJ JJ mentioned this issue Sep 4, 2018
JJ added a commit that referenced this issue Sep 4, 2018
Which will close #2296, and also refers to #561. There might be other links in the same page, will have to revise it.
@JJ
Contributor

JJ commented Nov 4, 2018

@AlexDaniel can you run this again?

@AlexDaniel
Member Author

@JJ thanks for pinging, updated.

@JJ
Contributor

JJ commented Nov 5, 2018

So it got worse...

antoniogamiz added a commit that referenced this issue Jul 30, 2019
antoniogamiz added a commit that referenced this issue Aug 1, 2019
antoniogamiz added a commit that referenced this issue Aug 1, 2019
@JJ
Contributor

JJ commented Jul 26, 2020

Maybe we should check this again...

@AlexDaniel
Member Author

@JJ updated. But there's not much use in updating it unless somebody wants to actively work on fixing the issues.

@AlexDaniel
Member Author

A lot of broken links seem to be caused by a different directory structure:

Things like:

https://github.com/perl6/doc/blob/master/doc/Type/X::Proc::Async::AlreadyStarted.pod6
-> https://github.com/Raku/doc/blob/master/doc/Type/X::Proc::Async::AlreadyStarted.pod6
Line: 224
Code: 301 -> 404 Not Found

It doesn't mind the redirect (although we should avoid it anyway), but the link should have Type/X/Proc/Async/AlreadyStarted.pod6 (without ::).
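
The mapping described above, from a type name to the nested source path, can be sketched in a few lines of Python. This is not how the site generator builds its edit links; the helper name `source_path` and its default `root` and `extension` values are assumptions taken from the example URL.

```python
def source_path(type_name, root="doc/Type", extension=".pod6"):
    """Map a type name such as 'X::Proc::Async' to a nested source path."""
    # Each '::' separator becomes a directory separator.
    return f"{root}/{'/'.join(type_name.split('::'))}{extension}"


print(source_path("X::Proc::Async::AlreadyStarted"))
# doc/Type/X/Proc/Async/AlreadyStarted.pod6
```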

@antoniogamiz
Contributor

Where have you found that link?

@AlexDaniel
Member Author

@antoniogamiz you can see it in the output yourself.

In this case it's the 🖉 link (Edit this page) from https://docs.raku.org/type/X::Proc::Async::AlreadyStarted.

@JJ
Contributor

JJ commented Jul 26, 2020 via email

@antoniogamiz
Contributor

In reality, output is a little bit smaller, because if you take a look at:

Processing	https://docs.raku.org/type.html


List of broken links and other issues:
https://webchat.freenode.net/?channels=	
  Line: 50
  Code: 200 OK
 To do: Some of the links to this resource point to broken URI fragments
	(such as index.html#fragment).
The following fragments need to be fixed:
	raku                          	Line: 50

https://github.com/Raku/doc/blob/master/CONTRIBUTING.md	
  Line: 78
  Code: 200 OK
 To do: Some of the links to this resource point to broken URI fragments
	(such as index.html#fragment).
The following fragments need to be fixed:
	reporting-bugs                	Line: 78

You can see that it is reporting broken fragments, but these links work perfectly. I do not know why they appear as broken fragments.

@JJ
Contributor

JJ commented Jul 27, 2020 via email

@antoniogamiz
Contributor

antoniogamiz commented Jul 27, 2020

Hum, that's a weird behavior. Anyway, I'm using this command

 checklink -s -b -r --exclude "(webchat|CONTRIBUTING|perl6\.html)" docs.raku.org

And errors in edit links will be solved in the next Documentable release.

@AlexDaniel
Member Author

Yeah, I'm not sure about these particular ones. The link to webchat can be changed to have %23raku instead; it will both work and be a bit more correct, I think.
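
A small Python illustration of why `%23raku` is the more correct form: a generic URL parser treats a literal `#` as the start of a fragment, so `raku` ends up outside the query string. That checklink reports the webchat link for this exact reason is an assumption, though it matches the `raku` fragment shown in the output above.

```python
from urllib.parse import quote, urlsplit

# With a literal '#', everything after it is parsed as a fragment.
raw = "https://webchat.freenode.net/?channels=#raku"
parts = urlsplit(raw)
print(repr(parts.query), repr(parts.fragment))

# Percent-encoding the '#' keeps the channel name inside the query string.
encoded = "https://webchat.freenode.net/?channels=" + quote("#raku", safe="")
print(encoded)
print(urlsplit(encoded).fragment == "")
```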

@Altai-man Altai-man self-assigned this Nov 7, 2020
@coke
Collaborator

coke commented Feb 3, 2023

Will track in Raku/doc-website#72 on new site.

@coke coke closed this as completed Feb 3, 2023