
Commit
Added more informational content
Since we link it from the paper, we probably should add some more text here anyway.
jogli5er committed Nov 7, 2018
1 parent f27ffcc commit a99454c
Showing 1 changed file (README.md) with 18 additions and 3 deletions.
@@ -1,5 +1,20 @@
[![Build Status](https://travis-ci.org/decrypto-org/spider.svg?branch=master)](https://travis-ci.org/decrypto-org/spider)
<br>
# Darknet Spider

This is still work in progress and most likely contains one or two bugs. If you find one, please report it through the issue tracker. Note that this can also be understood as a framework for crawling, processing, and analysing web data, with built-in support for the Tor network. Depending on the proxy used, it can also be run on other networks.

<br>

The Darknet Spider consists of several modules, represented by the different subfolders of the project. The three most important submodules are described below.

## Crawler
The crawler walks the Tor network, following links recursively. In its current state it collects each link once and supports different prioritisation modes for the crawling process.
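The "collect each link once" behaviour can be sketched as follows. This is a minimal, hypothetical Python example (the project itself may use a different language and a different link-extraction strategy); the regex loosely matches v2 (16-character) and v3 (56-character) .onion hostnames.

```python
import re

# Loose pattern for .onion URLs: base32 hostname of 16 (v2) to 56 (v3) chars,
# followed by an optional path. Illustrative only, not the project's own regex.
ONION_RE = re.compile(r'https?://[a-z2-7]{16,56}\.onion\b[^\s"\'<>]*')

def extract_onion_links(html, seen=None):
    """Return .onion links from raw HTML, skipping any already in `seen`."""
    seen = set() if seen is None else seen
    links = []
    for url in ONION_RE.findall(html):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links

html = ('<a href="http://expyuzz4wqqyqhjn.onion/about">About</a> '
        '<a href="http://expyuzz4wqqyqhjn.onion/about">duplicate</a>')
print(extract_onion_links(html))  # the duplicate link is collected only once
```

Passing the same `seen` set across pages lets the crawler deduplicate globally rather than per page.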
<br>

## Storing the data
The software requires a Postgres DB to be configured to store the collected data for further analysis.
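A minimal sketch of such a link store, using Python's built-in sqlite3 in place of Postgres so the example is self-contained; the table and column names are hypothetical, not taken from the project's actual schema.

```python
import sqlite3

# In-memory database stands in for the configured Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE links (
        id INTEGER PRIMARY KEY,
        url TEXT UNIQUE NOT NULL,
        first_seen TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def store_link(conn, url):
    # UNIQUE + INSERT OR IGNORE enforces "each link once" at the DB level too.
    conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
    conn.commit()

store_link(conn, "http://expyuzz4wqqyqhjn.onion/")
store_link(conn, "http://expyuzz4wqqyqhjn.onion/")  # duplicate, ignored
count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
print(count)  # 1
```

Enforcing uniqueness in the database, not only in the crawler's memory, keeps the data consistent across restarts.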

## Analysing the data
The Darknet Spider contains two additional modules: one for preprocessing the collected data, and another for applying machine learning techniques to the collected and preprocessed material. Within /classifier, one can plug in one's own algorithms to be applied to the data.
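A pluggable classifier could take a shape like the following. This is an illustrative Python sketch, not the project's actual /classifier API; the registry, names, and toy keyword rule are all assumptions.

```python
# Hypothetical plug-in registry: any callable mapping preprocessed text to a
# label can be registered and selected by name.
CLASSIFIERS = {}

def register(name):
    def wrap(fn):
        CLASSIFIERS[name] = fn
        return fn
    return wrap

@register("keyword")
def keyword_classifier(text):
    # Toy rule-based stand-in; a real plug-in might wrap a trained ML model.
    return "marketplace" if "bitcoin" in text.lower() else "other"

def classify(name, text):
    return CLASSIFIERS[name](text)

print(classify("keyword", "Pay with Bitcoin"))  # marketplace
```

Keeping the interface to a single `text -> label` callable makes it easy to swap algorithms without touching the crawler or storage layers.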
