-
Notifications
You must be signed in to change notification settings - Fork 10
/
README.txt
166 lines (122 loc) · 6.36 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
SDBF (Smart DNS Brute-Forcer) package propose 3 tools to generate
and discover domain names or dubdomains of a domain names.
Release:
sdbf-1.0
Date :
15-07-2012
Authors:
Cynthia Wagner, University of Luxembourg, Luxembourg
Samuel Marchal, University of Luxembourg, Luxembourg
Jérôme François, University of Luxembourg, Luxembourg
Radu STATE, University of Luxembourg, Luxembourg
Thomas Engel, University of Luxembourg, Luxembourg
Organism attachement:
SnT- Interdisciplinary Centre for Security, Reliability and Trust
Université du Luxembourg
Campus Kirchberg- Office F210
6, rue Richard Coudenhove- Kalergi
L- 1359 Luxembourg
License:
GNU General Public License 3.0
**First, use markov.pl script to generate 2 files base on a list
of domain names contained in a txt file (1 domain per line) :
- a "distribution" file which contains character frequencies,
length distribution of domains, length distribution of names
for differents domain levels...
- a "transition" file which contains probability of transition
from a n-gram to the following character
usage : ./markov.pl REAL_DOMAIN_LIST N-GRAM_SIZE OUTPUT_DISTRIBUTON_FILE OUTPUT_TRANSITION_FILE
ex : ./markov.pl domains.txt 3 distribution.txt transition.txt
Requirements:
Install pybloom http://pypi.python.org/pypi/pybloom
**Second, use sdbf.py script to generate new domain names based
on the "distribution" and the "transition" files previously generated
(sample "distribution" and "transition" files are given in the package)
Options :
-h, --help show this help message and exit
-d FILE, --distribution=FILE
general distribution file (with length frequencies)
-t FILE, --transition=FILE
character transition matrix
-e EPS, --epsilons=EPS
epsilon values for empty values in transition matrix
-b EPS_START, --epsilons-start=EPS_START
epsilon values for empty values in starting character
distribution
-l EPS_LENGTH, --epsilons-length=EPS_LENGTH
epsilon values for empty values in length distribution
-n NUMBER_GENERATE, --number-to-generate=NUMBER_GENERATE
number of names to generate
-s SUFFIX, --suffix=SUFFIX
suffix value
-p PREFIX, --prefix=PREFIX
prefix value
-w LEVELS, --word-level=LEVELS
word levels to generate
--cw=CWORDS, --custom-words=CWORDS
length (in words) of the custom words (prefix and
suffix)
--mxw=MXW, --max-length-words=MXW
maximal word lengths (may be adjusted regarding the
training)
--miw=MIW, --min-length-words=MIW
minimal word lengths (may be adjusted regarding the
training)
-f FILE, --features=FILE
if specified the program will generate the different
feature for the domains contained in the mentionned
file
-o FILE, --output=FILE
output file with accessible names or feature
ex :
./sdbf.py -d distribution.txt -t transition.txt -n 1000 -o results.txt
# probe of 1000 generated domain names and store positive results in results.txt
./sdbf.py -d distribution.txt -t transition.txt -n 1000 -p www. -w "0 1" -o results.txt
# probe of 1000 generated domain names starting with "www." and of size 3 (level 0 and 1 generated)
(ex : www.amazon.com) and store positive results in results.txt
./sdbf.py -d distribution.txt -t transition.txt -n 5000 -s google.com --cw 2 -w 2 -o results.txt
# probe of 5000 generated domain names ending with "google.com" and of size 3 (level 2 generated)
(ex : mail.google.com) and store positive results in results.txt
!!!!!!!!!!!!!!! To use the third tool of the package you need to download :
* DISCO at http://www.linguatools.de/disco/disco-1.2.jar
and the following disctionnaries (follow the instructions given on the website for
dictionnary installation)
- http://www.linguatools.de/disco/disco-languagedatapackets_en.html#enwiki
- http://www.linguatools.de/disco/disco-languagedatapackets_en.html#frwiki
- http://www.linguatools.de/disco/disco-languagedatapackets_en.html#degen
create a new folder called "Disco" at the same level of the scripts and copy
the jar file and the 3 folder corresponding to each dictionnary
**Third, use semanticexp.py script to discover subdomains of a given domain based on a
list of existing subdomains (of a common domain) contained in a txt file (1 domain per line)
this list can be generated from sdbf.py for instance as with the last example.
three different tools available :
- semantic exploration (-d)
- incremental discovery (-p)
- splitter (-s)
Options:
-h, --help show this help message and exit
-d, --disco use DISCO semantic tool
-s, --splitter use word splitter tool (combined with DISCO)
-p, --increment use incremental discovery tool
-i FILE, --input=FILE
domains previously discovered
-o FILE, --output=FILE
output file with accessible names or feature
-n COUNT, --horizontal=COUNT
horizontal depth : number of domains tested per domain
in the initial dataset
-v VERTICAL, --vertical=VERTICAL
vertical depth : number of iteration over the new
domain lists (only for -d)
-e, --english use the english dictionnary (only for -d)
-g, --german use the german dictionnary (only for -d)
-f, --french use the french dictionnary (only for -d)
ex :
./semanticexp.py -d -egf -i scan_google.com -o semanticexp_google.com -n 100 -v 3
# use DISCO semantic tool with the 3 dictionnaries to discover new subdomains
100 domains are tested for each subdomains in the initial dataset
and a maximum of 3 iterations are made (we apply at most 3 times the technique
on newly discovered domain) results are stored in semanticexp_google.com
./semanticexp.py -sp -i scan_google.com -o semanticexp_google.com -n 50
# use splitter and incremental discovery tool to discover new subdomains
50 domains are tested for each subdomains in the initial dataset