Skip to content

Latest commit

 

History

History
5 lines (4 loc) · 264 Bytes

5.md

File metadata and controls

5 lines (4 loc) · 264 Bytes

If you were designing a web crawler, how would you avoid getting into infinite loops?

  1. hash according to url(sometimes www.careercup.com?foobar=hello is the same as www.careercup.com.)
  2. hash by content(can't work on dynamic page)
  3. deprioritize by similarity