- PLC register readings
- Modbus message captures
- Data processing
- Interactive graphs and statistical analysis
- Invariant inference
- Business process mining
- Operating system: Unix-like environments, including Linux, Mac OS X, and Windows Subsystem for Linux (WSL)
- Python 3.8 and PIP 3
sudo apt update
sudo apt upgrade
sudo apt install python3.8
sudo apt install python3-pip
- Python 3.8 libraries: pandas, matplotlib, numpy, ray, modbus_tk, scipy (json and glob are part of the standard library)
pip3 install -r requirements.txt
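For reference, a minimal requirements.txt matching the list above would contain the following (json and glob ship with Python and need no entry; version pins are omitted here):

```
pandas
matplotlib
numpy
ray
modbus_tk
scipy
```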
- Java JDK version 8 or higher.
sudo apt-get install openjdk-8-jdk
- Gradle Build Tool: installation
- Perl 5
sudo apt install perl
- TShark (Wireshark 3.4.8)
sudo apt install tshark
To install from source:
wget https://www.wireshark.org/download/src/wireshark-3.4.8.tar.xz -O /tmp/wireshark-3.4.8.tar.xz
tar -xvf /tmp/wireshark-3.4.8.tar.xz -C /tmp
cd /tmp/wireshark-3.4.8
sudo apt update && sudo apt dist-upgrade
sudo apt install cmake libglib2.0-dev libgcrypt20-dev flex bison byacc \
  libpcap-dev qtbase5-dev libssh-dev libsystemd-dev qtmultimedia5-dev \
  libqt5svg5-dev qttools5-dev
cmake .
make
sudo make install
- Daikon 5.8.10: installation
- Fluxicon Disco 3.2.4: installation
Disco is not available for Unix-like operating systems; users can rely on Wine or Darling to install and run this software.
Execute the script main.py to generate the data logs of the PLC registers:
python3 main.py simTime samplingTime
- simTime is the simulation time of the CPS model in seconds.
- samplingTime is the sampling interval in seconds.
The Ray framework is used to read data from the PLCs simultaneously and to scale seamlessly to a distributed attack architecture (e.g., a botnet) if needed. The outputs are JSON files with the following naming convention:
{name_of_the_PLC}-{ip_of_the_PLC}-{port_of_the_PLC}@{timestamp}.json
These files are saved in the historian/ folder inside the main directory.
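For illustration, a stripped-down sketch of this acquisition loop is shown below; the PLC list, the register layout, and the polling logic are simplified assumptions, not the exact contents of main.py:

```python
import json
import os
import time
import ray
import modbus_tk.defines as cst
import modbus_tk.modbus_tcp as modbus_tcp

ray.init()
os.makedirs("historian", exist_ok=True)

@ray.remote
def poll_plc(name, ip, port, sim_time, sampling_time):
    """Poll one PLC's holding registers and dump the log as JSON."""
    master = modbus_tcp.TcpMaster(host=ip, port=port)
    log = []
    start = time.time()
    while time.time() - start < sim_time:
        # Read the first 10 holding registers of slave 1 (assumed layout).
        values = master.execute(1, cst.READ_HOLDING_REGISTERS, 0, 10)
        log.append({"time": time.time(), "registers": list(values)})
        time.sleep(sampling_time)
    # Naming convention: {name_of_the_PLC}-{ip}-{port}@{timestamp}.json
    out = f"historian/{name}-{ip}-{port}@{int(start)}.json"
    with open(out, "w") as f:
        json.dump(log, f)
    return out

# One Ray task per PLC reads all devices concurrently.
plcs = [("PLC1", "192.168.0.11", 502), ("PLC2", "192.168.0.12", 502)]
files = ray.get([poll_plc.remote(n, ip, p, 60, 0.5) for n, ip, p in plcs])
print(files)
```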
In parallel with main.py, TShark has to be started.
To start capturing packets, a capture interface has to be specified. TShark treats the first interface as the default and captures from it when none is given. In other words, tshark is equivalent to
tshark -i 1
To list all the interfaces available to TShark and select another one:
tshark -D
Run the capture:
tshark -i 1 -w modbusPackets.pcap-ng
While running, the total number of captured packets is displayed on the console. TShark generates a pcap-ng file that contains all the information about the captured packets. Once the pcap-ng file is created, it can be translated into a CSV file by running:
tshark -r modbusPackets.pcap-ng -T fields -E occurrence=f \
  -e frame.number -e frame.time -e ip.src -e ip.dst -e _ws.col.Protocol -e frame.len \
  -e modbus.func_code -e modbus.bitval -e text -e modbus.regval_uint16 -e mbtcp.trans_id \
  -e _ws.col.Info > modbusPackets.csv
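To sanity-check the export before the data processing step, the CSV can be loaded with pandas; the column names below are illustrative and simply mirror the field order of the command above:

```python
import pandas as pd

cols = ["no", "time", "src", "dst", "protocol", "length",
        "func_code", "bitval", "text", "regval_uint16", "trans_id", "info"]
# tshark's -T fields output is tab-separated by default.
df = pd.read_csv("modbusPackets.csv", sep="\t", names=cols, header=None)
print(df.head())
```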
The goal of the data processing is to convert the files resulting from the information gathering into datasets accepted by the invariant detection and business process mining tools.
Execute the script convertoCSV.py, specifying an integer value for the variable numberofPLCs, which indicates the number of PLCs controlling the CPS model.
Execute mergeDatasets.py to convert the JSON files into CSV datasets.
The columns hold the values of the registers for each PLC, with the following naming convention: {name_of_the_PLC}_{name_of_the_Register}.
The outputs are two CSV files saved in the directories PLC_CSV and process-mining/data.
python3 convertoCSV.py numberofPLCs
python3 mergeDatasets.py
The file saved in process-mining/data is a timestamped dataset; it will be used for the business process mining.
The file saved in PLC_CSV is an enriched dataset with a partial bounded history of registers and additional information such as stable states, slopes of measurements, and relative setpoints. This dataset will be used for the invariant detection.
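Conceptually, the conversion performed by these scripts boils down to the following sketch (the JSON layout, column names, and output path are assumptions based on the conventions above):

```python
import glob
import json
import pandas as pd

frames = []
for path in glob.glob("historian/*.json"):
    # File name convention: {name_of_the_PLC}-{ip}-{port}@{timestamp}.json
    plc_name = path.split("/")[-1].split("-")[0]
    with open(path) as f:
        log = json.load(f)
    df = pd.DataFrame(log)
    # Column convention: {name_of_the_PLC}_{name_of_the_Register}
    regs = pd.DataFrame(df["registers"].tolist()).add_prefix(f"{plc_name}_reg")
    frames.append(pd.concat([df[["time"]], regs], axis=1))

# Stack the per-PLC logs and order them by timestamp.
merged = pd.concat(frames).sort_values("time")
merged.to_csv("process-mining/data/timestamped_dataset.csv", index=False)
```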
Execute the script runChartPlots.py:
python3 runChartPlots.py var1 var2 .... varn
The outputs of this execution are run-sequence plots of the specified variables as a function of the simulation time.
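In essence, the script does something like the following (the dataset path and the time column name are assumptions):

```python
import sys
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("PLC_CSV/dataset.csv")  # assumed location of the enriched dataset
for var in sys.argv[1:]:
    # Run-sequence plot: variable value against simulation time.
    plt.plot(df["time"], df[var], label=var)
plt.xlabel("simulation time (s)")
plt.ylabel("register value")
plt.legend()
plt.show()
```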
Execute the script histPlots_Stats.py:
python3 histPlots_Stats.py var
The outputs of this execution are a histogram and statistical information about the variable var.
This information includes:
- The mean, median, standard deviation, and the maximum and minimum values.
- Two tests of the statistical distribution: a Chi-squared test for uniformity and a Shapiro-Wilk test for normality.
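The same statistics can be reproduced with pandas and scipy; the sketch below assumes the dataset location and uses 10 histogram bins:

```python
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv("PLC_CSV/dataset.csv")  # assumed dataset location
x = df[sys.argv[1]].dropna()

# Basic descriptive statistics.
print("mean:", x.mean(), "median:", x.median(), "std:", x.std())
print("min:", x.min(), "max:", x.max())

# Chi-squared test for uniformity: compare observed bin counts against the
# equal counts expected under a uniform distribution (scipy's default f_exp).
observed, _ = np.histogram(x, bins=10)
chi2, p_uniform = stats.chisquare(observed)
print("chi-squared (uniformity) p-value:", p_uniform)

# Shapiro-Wilk test for normality.
w, p_normal = stats.shapiro(x)
print("Shapiro-Wilk (normality) p-value:", p_normal)

x.plot.hist(bins=10)
plt.show()
```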
The invariant generation is done using Daikon's front-end tool for CSV datasets. To install Daikon, follow the guide.
Execute the bash script runDaikon.sh to generate the invariants.
./runDaikon.sh
This script offers a query system to target specific invariants and to specify conditional invariants.
Users can enter a variable name to display the invariants associated with it.
Users can customize the splitter info file Daikon_Invariants/Inv_conditions.spinfo by specifying the conditions that Daikon should use to create conditional invariants.
Spinfo file example:
PPT_NAME aprogram.point:::POINT
VAR1 > VAR2
VAR1 == VAR3 && VAR1 != VAR4
The results of the invariant analysis will be saved in Daikon_Invariants/daikon_results.txt.
The conditional invariants will be saved in Daikon_Invariants/daikon_results_cond.txt.
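For reference, a simplified Python equivalent of what runDaikon.sh drives is sketched below; the file names are assumptions, and it presumes that daikon.jar is on the CLASSPATH and that Daikon's CSV front end convertcsv.pl (a Perl script shipped with Daikon, hence the Perl 5 requirement) is available:

```python
import subprocess

# Convert the CSV dataset into Daikon's .decls/.dtrace input formats
# using Daikon's CSV front end.
subprocess.run(["perl", "convertcsv.pl", "Daikon_Invariants/dataset.csv"],
               check=True)

# Run Daikon on the converted files; passing the .spinfo file enables
# the conditional invariants.
with open("Daikon_Invariants/daikon_results.txt", "w") as out:
    subprocess.run(
        ["java", "daikon.Daikon",
         "Daikon_Invariants/Inv_conditions.spinfo",
         "Daikon_Invariants/dataset.decls",
         "Daikon_Invariants/dataset.dtrace"],
        stdout=out, check=True,
    )
```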
This step relies on Disco to generate graphs representing the business process.
Disco takes as input a CSV file containing the exchanged messages between the PLCs of the CPS model and the values of the PLCs registers.
To create this CSV file, we use a Java program to convert the pcap files and the CSV dataset generated in the previous steps.
The first step is to compile our Java program. Within the directory process-mining, run the command:
./gradlew build
The second step is to convert the pcap file and the CSV dataset into a format admissible by Disco:
./gradlew runMessages
./gradlew runReadings
The final step is to combine the resulting files into a single file from which the business process graphs are generated:
./gradlew Merge
The output files are saved in the directory process-mining/data.
To generate the business process graphs:
Launch Disco > Open File > Select the file MergeEvents.csv > Define each column role > Click Start Import
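Conceptually, the merge performed by the Merge task amounts to stacking the two event logs and ordering them by timestamp, as in this pandas sketch (the input file names and the timestamp column are assumptions; MergeEvents.csv is the file imported into Disco):

```python
import pandas as pd

# Event logs produced by runMessages (network messages) and runReadings
# (register readings); file names are assumed for illustration.
messages = pd.read_csv("process-mining/data/messages.csv")
readings = pd.read_csv("process-mining/data/readings.csv")

# Stack both logs and order them chronologically so Disco imports
# a single event stream.
merged = pd.concat([messages, readings]).sort_values("timestamp")
merged.to_csv("process-mining/data/MergeEvents.csv", index=False)
```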
- PLC register captures (JSON): extract the JSON files to the directory historian/.
- Timestamped dataset of register values (CSV): place the CSV file in the directory process-mining/data.
- Dataset of register values (CSV): place the CSV file in the directory daikon/Daikon_Invariants.
- Network capture (CSV): place the CSV file in the directory process-mining/data.
- Network capture (PCAPNG): convert the pcap file to CSV using the tshark commands above.