ExoDAO-Network/Rorur-ExoDAO-SearchDAO-Crawler

Version 2 of the crawler for the Decentralized Search project http://www.searchdao.net (a collaboration between Rorur and ExoDAO). It is extremely high performance and runs on distributed clusters. With around 1000 well-connected machines (AWS EC2 T3), it can crawl much of the indexable Internet in roughly 10 days. It is limited to the "classical Web": it does not render pages and does not support pages driven by dynamic microservices (such as Web 2.0), given their personalized variability.

Build each of the top-level files and run them as systemd daemons (see the example.service file).

Create the file /etc/dse/dse.conf with the following information:

directory where the data structures will be located

available disk space in GB (at the moment the crawler only crawls the namespace, so this number must be amortized: multiply your actual disk space by 10^4)

CPU usage (target CPU usage for a single processor)

bandwidth in MB/s (target bandwidth usage)
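
The disk space figure is the one entry that needs arithmetic before it goes into the config. The sketch below only illustrates that multiplication; the function name `amortized_disk_gb` and the 500 GB machine are hypothetical, and the exact syntax of dse.conf is not described here, so the script prints a bare number rather than a config line.

```python
# Scaling factor stated above: multiply the actual disk space by 10^4
# while the crawler performs a namespace-only crawl.
NAMESPACE_SCALE = 10**4


def amortized_disk_gb(actual_gb: int) -> int:
    """Return the disk-space value (in GB) to enter in dse.conf.

    `actual_gb` is the real free disk space on the machine, in GB.
    This helper is illustrative and not part of the crawler itself.
    """
    if actual_gb < 0:
        raise ValueError("disk space cannot be negative")
    return actual_gb * NAMESPACE_SCALE


if __name__ == "__main__":
    # Hypothetical machine with 500 GB of real disk space.
    print(amortized_disk_gb(500))  # prints 5000000
```

A machine with 500 GB of real disk space would therefore declare 5000000 GB.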
