GitHub - ExoDAO-Network/Rorur-ExoDAO-SearchDAO-Crawler

Version 2 of the crawler for the Decentralized Search project http://www.searchdao.net (a collaboration between Rorur and ExoDAO). It is designed for very high performance and runs on distributed clusters. With around 1000 well-connected machines (AWS EC2 T3), it can crawl much of the indexable Internet in roughly 10 days. It is limited to the "classical Web": it does not render pages and does not support pages driven by dynamic microservices (such as Web 2.0), given their personalized variability.

Build each of the top-level files and run them as systemd daemons (see the example.service file).

Create the file /etc/dse/dse.conf with the following information:

- directory where the data structures will be located
- available disk space in GB (at the moment only a namespace crawl is performed, so this number must be amortized: multiply your actual disk space by 10^4)
- CPU usage (target CPU usage for a single processor)
- bandwidth in MB/s (target bandwidth usage)

Files in the repository:

- src
- README.md
- adder.go
- config.go
- crawler.go
- example.service
- go.mod
- go.sum
- inbox.go
- links.go
- monitor.go
- network.go
- outbox.go
- shredder.go
- stats.go
- uqueue.go
