Useful HTML validators
What counts here: how many useful errors are reported, how many false positives are reported, and how much time is needed to use these tools.
I want to find all broken links on my website. I also want to detect orphaned pages (pages not reachable from my main page by following links).
Note that unlinked orphaned pages will not be checked by these tools, unless noted otherwise.
I want a script that
- runs locally
- can validate locally present files - not only online websites (I want to check for dead links before publishing)
- but also can validate published websites
- works without hangups/crashes/mishandling UTF8
- can be used to detect orphaned pages
- detects dead links in `<a href=`, `<img src=`, linked CSS/HTML files and more
- can be used as is or is easily modifiable by me
I made a project full of test cases for easy testing of potential tools.
Note that local HTML files can be served on localhost in a relatively simple way; then any link checker running on your computer can check them, without any extra support for reading local files.
One solution is below. I am not entirely happy with it, but it works:
sudo npm install http-server -g
BTW, is there a way to install global node modules without sudo and have them within PATH? If yes, please open an issue in this project or let me know some other way.
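One approach that should answer this (an assumption based on general npm usage, not something verified within this project) is to point npm's global prefix at a directory inside your home folder and put its bin directory on PATH:

```bash
# tell npm to install global packages under your home directory
npm config set prefix ~/.npm-global
# add this line to ~/.bashrc (or your shell's equivalent) so the installed binaries are found
export PATH="$HOME/.npm-global/bin:$PATH"
# after that, global installs no longer need sudo
npm install -g http-server
```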
http-server
in the directory with the HTML files, then use for example the site-graph tool:
cd site-graph
python3 site_graph.py http://127.0.0.1:8080/ --visit-external --force
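If you would rather avoid npm entirely, Python's built-in web server can be used instead of http-server (a minimal sketch; the port number is arbitrary and any link checker can then crawl http://127.0.0.1:8080/):

```bash
# serve the current directory over HTTP, nothing to install beyond Python 3
cd directory_with_html_files
python3 -m http.server 8080
```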
- html-proofer by gjtorikian
- Link check fails when `example` is linked instead of `example.html`, while it works at GitHub Pages. Requires an extra parameter (`--assume-extension`) to stop requiring an explicit `.html`:
htmlproofer /home/path_to_entire_folder/ --assume-extension --check-html --check-favicon --log-level warn
htmlproofer ../test_cases_for_detecting_link_rot/ --assume-extension --check-html --check-favicon --log-level warn
- this site-graph tool is promising as a base, I am contributing to it
- remember to use `--visit-external` - it is disabled by default!
- link-checker
- linkchecker works very nicely
linkchecker https://matkoniecz.github.io/dead_links_testing_site/
- outputting a site graph is one of its listed features! So detecting orphaned pages should be feasible...
linkchecker https://matkoniecz.github.io/dead_links_testing_site/ --verbose -o csv
seems parsable to detect orphaned pages. It used to have problems with UTF-8 support, but this should be fixed now!
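A rough sketch of how orphan detection could work on top of that output (assumptions: the commands run in the local directory that becomes the site root, the site is a plain tree of .html files, and the exact CSV columns/delimiter may vary between linkchecker versions, so the grep may need adjusting):

```bash
# crawl the published site and keep the raw CSV
linkchecker https://matkoniecz.github.io/dead_links_testing_site/ --verbose -o csv > crawl.csv

# pages that exist locally (paths relative to the site root)
find . -name "*.html" | sed 's|^\./||' | sort > local_pages.txt

# pages the crawler actually reached, with the site prefix stripped
grep -o 'https://matkoniecz\.github\.io/dead_links_testing_site/[^;," ]*\.html' crawl.csv \
  | sed 's|https://matkoniecz\.github\.io/dead_links_testing_site/||' | sort -u > reached_pages.txt

# files on disk that were never reached are candidate orphans
comm -23 local_pages.txt reached_pages.txt
```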
- another option is wget and parsing its log. I mention it for completeness, but it looks like a nasty quagmire to me.
wget --spider -o wget.log -e robots=off --wait 1 -r -p https://matkoniecz.github.io/dead_links_testing_site/
grep 404 wget.log
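That alone only shows the status lines; to also see which URL produced each 404, printing a couple of context lines before each match usually works (a sketch only - the number of context lines may need adjusting, since the log layout depends on the wget version and options used):

```bash
# show each 404 together with the preceding lines that name the requested URL
grep -B 2 ' 404 ' wget.log
```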
- https://github.com/LukasHechenberger/broken-link-checker-local - but it is a dead, buggy project: the last commit was in 2021, and it is known to hang randomly (reported in 2017, remains unfixed as of 2024)
blcl -ro . --filter-level 3
blcl -ro . --filter-level 3 | grep 'BROKEN'
- UTF-8 support has some issues - see an upstream issue - reported in 2021, as of 2024 still has "needs confirmation" label and remains unfixed
- w3c link checker may look promising
- but its installation instructions are broken
A mobile-friendliness test made by Google. Especially important as, hopefully, what it reports is similar to the factors considered by Google when ranking mobile-friendly websites higher.
Following its suggestions (like using a viewport declaration) may save time that would otherwise be wasted on unneeded debugging.
A checker of grammar and language. Not very smart and has plenty of false positives, but it sometimes catches real problems. I consider it worth using to avoid wasting the time and attention of a human proofreader on obvious things. Accepts markdown and raw HTML as input. Requires registration to work properly and strongly pushes a paid version.
It is not an HTML-focused tool (it checks any text) and has plenty of problems, but it still turned out to be more useful than most automatic validators.
I use a simple script to check all the text at once.
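That script is not included here; a minimal sketch of what such a script might look like (an assumption, not the actual script) is to crudely strip tags from every page and concatenate the result into one file that can be pasted into the checker:

```bash
# crude tag stripping with sed - good enough for feeding a language checker,
# not a real HTML parser
find . -name "*.html" -print0 | xargs -0 sed -e 's/<[^>]*>//g' > all_text.txt
```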
For example, remember to update your Leaflet .js and .css files. (Is there a way to automate that?)
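A possible partial answer (a sketch only, assuming Leaflet is referenced via CDN URLs that embed the version number, e.g. `leaflet@1.9.4`): compare the latest version published to npm with the versions your pages actually reference:

```bash
# latest Leaflet version published to npm
npm view leaflet version
# versions currently referenced anywhere in the site
grep -rho 'leaflet@[0-9][0-9.]*' . | sort -u
```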
htmlproofer folder_to_validate --check-html --check-favicon
is the only automatic validator that I found so far that reminds about favicons.
Link check fails when `example` is linked instead of `example.html`, while it works at GitHub Pages.
On the other hand it found some actual dead links...
A scriptability-friendly validator. So far it reported no user-visible problems, but installation (`pip install html5validator`) and running (`html5validator --show-warnings --root folder_to_validate`) are easy, so it may be worth using.
https://github.com/validator/validator via a Java .jar file - relatively easy to install (`npm install --save vnu-jar`, then move the .jar file to a known location) and use; it reported some minor but user-visible problems (pages with text but without any `<h1>` tags) that helped to improve the site.
I use it as follows (commands executed in the main folder of .html and .css files):
find . -name "*.html" -exec java -jar /path_to_vnu_jar/vnu.jar --also-check-css --also-check-svg --verbose {} \;
find . -name "*.css" -exec java -jar /path_to_vnu_jar/vnu.jar --also-check-css --also-check-svg --verbose {} \;
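A possible speed-up (assuming your vnu.jar accepts multiple file arguments, which recent releases do): let find batch files into as few java invocations as possible by terminating -exec with `+` instead of `\;`:

```bash
# one JVM start-up for many files instead of one java process per file
find . \( -name "*.html" -o -name "*.css" \) -exec java -jar /path_to_vnu_jar/vnu.jar --also-check-css --also-check-svg --verbose {} +
```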
It also runs online at https://validator.w3.org/nu/
https://github.com/gjtorikian/html-proofer
I use it like this, from the root folder of a project:
/usr/local/bin/htmlproofer . --check-html --check-favicon --log-level warn
https://github.com/oscardelben/rawler
Appears to be able to check only live websites.
/usr/local/bin/rawler https://mapsaregreat.com | /bin/grep -v "] INFO -- : 200 - "
https://github.com/stylelint/stylelint/blob/master/docs/user-guide/cli.md
Looks potentially useful, but not worth the configuration effort for me at this moment.
https://github.com/w3c/css-validator + http://jigsaw.w3.org/css-validator/
Not investigated for now, but looks like something useful.