Follow robots.txt yes/no #9

eklem · 2014-11-04T08:35:57Z

-f --followrobotstxt <yes/no>: choose whether the fetcher should play nice and obey robots.txt, or not.

eklem · 2014-11-04T08:50:55Z

I guess there are two things to check for:
1: The user agent, and whether robots.txt names it specifically or uses *.
2: Build an array of the parts of the site not to follow, and check each link the crawler wants to follow against this array.
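The two checks above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the function names disallowed_paths and may_follow are hypothetical, the user-agent match is a simple substring test, and Allow lines and wildcard patterns are deliberately ignored.

```python
from urllib.parse import urlparse


def disallowed_paths(robots_txt, user_agent):
    """Check 1: pick the group naming this user agent, falling back to '*'."""
    groups = {}          # lowercased agent token -> list of Disallow prefixes
    current_agents = []  # agents named by the group currently being read
    in_rules = False     # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
    ua = user_agent.lower()
    for agent, paths in groups.items():
        if agent and agent != "*" and agent in ua:
            return paths
    return groups.get("*", [])


def may_follow(url, blocked):
    """Check 2: compare a link's path against the array of blocked prefixes."""
    path = urlparse(url).path or "/"
    return not any(path.startswith(prefix) for prefix in blocked)
```

A crawler would call disallowed_paths once per host and then may_follow for every candidate link. In Python specifically, the standard library's urllib.robotparser covers the same ground more completely; the hand-rolled version is shown only because it mirrors the two steps listed above.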

eklem · 2014-11-04T08:55:37Z

And default to "yes". The user-agent string connects to this, but it's not necessary for developing this one.
#10
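For illustration, the proposed flag with its "yes" default might be wired up as below. This is a sketch in Python using argparse, which is an assumption made for the example and not the fetcher's actual option parser; build_parser and follows_robots are hypothetical names.

```python
import argparse


def build_parser():
    # Hypothetical CLI: only the robots.txt option from this issue is modelled.
    parser = argparse.ArgumentParser(description="Hypothetical fetcher CLI")
    parser.add_argument(
        "-f", "--followrobotstxt",
        choices=["yes", "no"],
        default="yes",  # play nice unless the user explicitly opts out
        help="whether the fetcher should obey robots.txt (default: yes)",
    )
    return parser


def follows_robots(argv):
    """Return True when the parsed arguments ask the fetcher to obey robots.txt."""
    return build_parser().parse_args(argv).followrobotstxt == "yes"
```

With no arguments follows_robots([]) is True, and passing -f no turns the check off.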

eklem added the enhancement label Nov 4, 2014
