This web crawler project aims to create a program capable of mapping websites by finding the hyperlinks on a page that point to other subpages of the current domain. The crawler itself will be written in Go; support tools will most likely be written in Python.
The crawler should be able to:
- request a webpage using a hyperlink,
- parse webpage content,
- locate hyperlinks on the page,
- spread using the found hyperlinks,
- catalog pages visited.
More capabilities will be added later.
For the most part, the crawler will be tested in small, controlled environments: most likely a set of interlinked, text-based pages served from localhost addresses.
Additional analytics tools will be covered by unit tests.