If you were designing a web crawler, how would you avoid getting into infinite loops?

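Hash by URL, after normalizing it (sometimes www.careercup.com?foobar=hello is the same page as www.careercup.com).
Hash by content (doesn't work on dynamic pages, where the same page can return different content on each fetch).
Deprioritize pages that are similar to ones already crawled.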
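A minimal sketch of the first two ideas, assuming a breadth-first crawler. URLs are normalized before they go into a visited set, and each fetched body is fingerprinted so the same content under a different URL is not expanded again. `IGNORED_PARAMS` (query parameters assumed not to change the page), the caller-supplied `fetch` function, the `max_pages` cap and the in-memory `WEB` dict are all hypothetical illustrations, not part of any real crawler's API. Similarity-based deprioritization is not shown; it would replace the FIFO queue with a priority queue keyed on a similarity score.

```python
import hashlib
from collections import deque
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed not to affect page content.
IGNORED_PARAMS = {"foobar"}


def normalize_url(url):
    """Map equivalent URLs to one canonical string."""
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)
    # Drop ignored params and sort the rest so parameter order doesn't matter.
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(params),
        "",  # the fragment never changes what the server returns
    ))


def crawl(seed, fetch, max_pages=1000):
    """BFS crawl; fetch(url) returns (body, links) or None (hypothetical)."""
    seen_urls, seen_content = set(), set()
    order = []
    queue = deque([seed])
    while queue and len(order) < max_pages:  # hard cap as a last-resort guard
        url = normalize_url(queue.popleft())
        if url in seen_urls:
            continue  # URL-level loop: already visited
        seen_urls.add(url)
        page = fetch(url)
        if page is None:
            continue
        body, links = page
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen_content:
            continue  # content-level loop: same bytes under another URL
        seen_content.add(digest)
        order.append(url)
        queue.extend(links)
    return order


# Tiny in-memory "web" with a query-string alias, a cycle and a mirror page.
WEB = {
    "http://www.careercup.com/": ("home", ["www.careercup.com?foobar=hello", "http://www.careercup.com/a"]),
    "http://www.careercup.com/a": ("page a", ["http://www.careercup.com/", "http://www.careercup.com/mirror"]),
    "http://www.careercup.com/mirror": ("page a", ["http://www.careercup.com/a"]),
}

print(crawl("www.careercup.com", WEB.get))
# prints ['http://www.careercup.com/', 'http://www.careercup.com/a']
```

The `?foobar=hello` alias is caught by the URL set and `/mirror` by the content hash, so the crawl terminates despite the cycle. As noted above, the content hash only helps when a page returns identical bytes; a dynamic page with a timestamp or session token defeats it, which is where the `max_pages` cap and similarity-based ordering come in.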
