A machine learning approach to fingerprinting web traffic
Software stack from pcap to machine learning:
*1. traffic capture
capture data on the wire, filter on the local IP address, and export/save as a .pcap file
- store pcap in data_collection/streams/pcaps
- use site_get.sh or site_get_multiple.sh for collection
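example: if capturing by hand instead of with site_get.sh, a capture might look like this (a sketch; interface, IP, and filename are placeholders for your setup):
  # capture on eth0, keep only traffic to/from the local address, save as a pcap
  tcpdump -i eth0 -w data_collection/streams/pcaps/example.pcap host 192.168.1.10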
*2. split TCP streams
data_collection/streams/pcap_split.sh <input_pcap> <output_directory>
- input_pcap: from step 1
- output_directory: new directory name, will be created in data_collection/streams
- operation: splits the input pcap into its individual TCP streams, saving each stream as a separate pcap file in the output directory
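example (input and output names are hypothetical):
  # splits example.pcap into per-stream pcaps under data_collection/streams/example_streams
  data_collection/streams/pcap_split.sh data_collection/streams/pcaps/example.pcap example_streams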
*3. sanity checking
- data_collection/pcap_ips.py <stream dir> <number of files>
- stream dir: directory of split stream pcaps from step 2
- number of files: number of files created in step 2, +1
*note: you can run this while step 2 is still running; just limit the number of files to however many have been processed so far
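example (assuming step 2 produced 500 stream files, so pass 501):
  data_collection/pcap_ips.py example_streams 501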
*4. format for ML
- data_collection/formatter.py <stream_dir> <number of files> <whitelist[file/0]> <threshold packet count>
- stream_dir: directory of split pcaps from step 2
- number of files: number of files to process from the directory (exclusive upper bound, so add 1)
- whitelist: file with lines of the form:
<ip_prefix> <classification>
will match all IPs that begin with the given prefix, labeling them with the specified class (see the example after this step)
classification should be one word, with only the first letter capitalized:
ex: Google, Baidu, Telex-google, Umich_EECS
if a class has been used before, make sure to use the exact same spelling
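example whitelist file (prefixes and class names are illustrative):
  74.125. Google
  180.76. Baidu
  141.212. Umich_EECS

example run (filenames and threshold are hypothetical; assumes 500 files from step 2):
  # label streams using whitelist.txt; threshold of 10 assumes streams below the packet count are dropped
  data_collection/formatter.py example_streams 501 whitelist.txt 10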
*5. BLJL
- run classifier: ./bljl <training_file> <test_file>
files are in the format output by formatter.py (step 4)
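example (filenames are hypothetical):
  ./bljl train_formatted.txt test_formatted.txt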