If you were designing a web crawler, how would you avoid getting into infinite loops?
- hash by URL after normalizing it (sometimes www.careercup.com?foobar=hello serves the same page as www.careercup.com, so canonicalize the URL before hashing)
- hash by page content, to catch the same page reached through different URLs (doesn't work for dynamic pages, whose content changes between fetches)
- deprioritize URLs that look similar to ones already crawled (e.g. same host and path with different query parameters), so likely duplicates are visited last instead of being dropped outright
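A minimal sketch of the first idea, URL canonicalization plus a visited set. The normalization rules here (lowercasing the host, dropping the fragment and query string) are assumptions about crawl policy, not universally correct; real crawlers tune these per site:

```python
from urllib.parse import urlparse, urlunparse

def normalize(url: str) -> str:
    """Canonicalize a URL so trivially different forms hash identically.

    Heuristic assumptions: the host is case-insensitive, the fragment never
    matters, and query parameters don't change the page content. The last
    assumption is often wrong and would need per-site tuning in practice.
    """
    p = urlparse(url)
    return urlunparse((p.scheme.lower(), p.netloc.lower(),
                       p.path.rstrip("/") or "/", "", "", ""))

visited = set()

def should_crawl(url: str) -> bool:
    """Return True only the first time a canonical URL is seen."""
    key = normalize(url)
    if key in visited:
        return False
    visited.add(key)
    return True
```

With this, `should_crawl("http://www.careercup.com?foobar=hello")` and `should_crawl("http://WWW.careercup.com/")` map to the same key, so only the first call returns True and the crawler can't loop between them.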