A simple Python utility that checks a website for dead links.
PyLich is available on PyPI and can be installed using pip:
pip install pylich
Provide the URL of a sitemap, and pylich crawls the links on the pages it lists and checks their status. pylich can be used as a command-line tool or as a Python package.
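To illustrate what a sitemap-driven crawl involves, here is a minimal sketch of its first two steps using only the standard library. It is not PyLich's actual implementation: the names `sitemap_urls` and `page_links` are hypothetical, and the sketch assumes a standard sitemaps.org `urlset` document and plain `<a href>` links. A real checker would then request each collected link and record its HTTP status.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in the sitemap's <loc> elements."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links(page_url: str, html: str) -> list[str]:
    """Return the absolute http(s) links found in a page's HTML."""
    collector = _LinkCollector()
    collector.feed(html)
    # Resolve relative links against the page URL, then drop
    # non-HTTP schemes such as mailto:.
    absolute = (urljoin(page_url, href) for href in collector.hrefs)
    return [url for url in absolute if url.startswith(("http://", "https://"))]
```

Both functions take text that has already been downloaded, so the parsing logic can be tried without network access.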
pylich https://www.example.com/sitemap.xml
The command will exit with a status code of 1 if any dead links are found and 0 otherwise.
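Because the result is reported through the exit status, the command can gate a script or CI step. The following is a minimal sketch of one way to do that from Python. The helper names `links_ok` and `main` are hypothetical, and the sketch assumes the `pylich` executable is on `PATH`.

```python
import subprocess


def links_ok(command: list[str]) -> bool:
    """Run a link-check command and return True if it exits with status 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


def main() -> int:
    # Assumption: `pylich` was installed with pip and is on PATH.
    ok = links_ok(["pylich", "https://www.example.com/sitemap.xml"])
    print("no dead links" if ok else "dead links found")
    return 0 if ok else 1
```

Calling `sys.exit(main())` from a script would pass the same 0 or 1 status on to the caller.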
Flag | Arguments | Description |
---|---|---|
-v | N/A | Verbose mode. Print progress to the console as well as a summary of the dead links at the end. |
-i | List of integer HTTP response codes | Ignore links with the specified HTTP response codes. |
pylich https://www.example.com/sitemap.xml -v -i 404 500
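As a rough illustration of what ignoring response codes means, the sketch below filters a mapping of URLs to status codes. It is not PyLich's code: the function `dead_links` is hypothetical, and treating any status of 400 or above as dead is an assumption, since the exact rule is not stated here.

```python
def dead_links(statuses: dict[str, int], ignored=()) -> dict[str, int]:
    """Return the links considered dead, skipping ignored status codes.

    Assumption: a link counts as dead when its status is 400 or above.
    """
    skip = set(ignored)
    return {
        url: code
        for url, code in statuses.items()
        if code >= 400 and code not in skip
    }
```

With `ignored=[404, 500]`, links answering 404 or 500 drop out of the result, while other error statuses are still reported.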
PyLich can also be used as a Python package.
from pylich import LinkChecker
checker = LinkChecker(
"https://www.example.com/sitemap.xml",
verbose=True,
ignored_status_codes=[404, 500]
)
urls = checker.get_sitemap_urls()
broken_links = checker.check_links(urls)
checker.print_dead_links()
Pull requests are welcome.
Package and dependency management is done using Poetry. To install the dependencies and the package in development mode, run:
poetry install
To run the tests, run:
pytest
Pre-commit hooks are available to run code formatting and linting. To install the hooks, run:
pre-commit install