Note: This is an unofficial implementation.
- Extracts packets from gzipped pcap data using tshark
- Decodes the packets using C++
This project implements high-performance parsing of IEX's depth-of-book "DEEP" historical data using C++.
Market data can be obtained from IEX. The data follows the IEX DEEP v1.6 specification:
https://iextrading.com/docs/IEX%20Transport%20Specification.pdf
https://iextrading.com/docs/IEX%20DEEP%20Specification.pdf
This code extracts the raw payloads and decodes the historical data according to the specifications above.
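For orientation, below is a minimal C++ sketch of decoding one DEEP message type, the Price Level Update, with field offsets taken from my reading of the DEEP v1.6 specification linked above. The struct and function names are illustrative and are not part of this repository; verify the offsets against the specification before relying on them.

```cpp
// Illustrative sketch (not this repository's code): field layout of a DEEP
// "Price Level Update" message (type byte '8' = buy side, '5' = sell side),
// per my reading of the IEX DEEP v1.6 specification linked above.
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

struct PriceLevelUpdate {
    char        side;         // '8' buy side, '5' sell side (the message type byte)
    uint8_t     event_flags;  // event processing flags
    int64_t     timestamp_ns; // nanoseconds since the POSIX epoch
    std::string symbol;       // 8-byte, space-padded ASCII on the wire
    uint32_t    size;         // aggregate quoted size at this price level
    double      price;        // wire format: signed 64-bit integer in 1e-4 increments
};

// Decode one 30-byte Price Level Update message body (assumed offsets:
// type 0, flags 1, timestamp 2, symbol 10, size 18, price 22).
inline std::optional<PriceLevelUpdate> decode_price_level_update(const uint8_t* buf, size_t len) {
    constexpr size_t kMsgLen = 30;
    if (len < kMsgLen || (buf[0] != '8' && buf[0] != '5')) return std::nullopt;

    PriceLevelUpdate m;
    m.side        = static_cast<char>(buf[0]);
    m.event_flags = buf[1];

    int64_t  ts; std::memcpy(&ts, buf + 2, 8);   // IEX fields are little-endian,
    uint32_t sz; std::memcpy(&sz, buf + 18, 4);  // which matches x86 hosts
    int64_t  px; std::memcpy(&px, buf + 22, 8);

    m.timestamp_ns = ts;
    m.symbol       = std::string(reinterpret_cast<const char*>(buf + 10), 8);
    m.symbol.erase(m.symbol.find_last_not_of(' ') + 1);  // trim space padding
    m.size         = sz;
    m.price        = static_cast<double>(px) * 1e-4;
    return m;
}
```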
Check if TShark is already installed
tshark --version
Install TShark if required
sudo apt update
sudo apt-get install tshark
Run the following command to get the raw payloads of the pcap packets
tshark -r <gz_file> -T fields -e data > <extracted_file_name>
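The extracted file contains one hex string per packet (the contents of each packet's data field). The following is a rough, illustrative sketch of how such a line could be turned back into bytes before decoding; it is not the parser used by this repository:

```cpp
// Illustrative sketch (not this repository's code): convert one tshark "-e data"
// hex line back into raw bytes. Some tshark versions print "aa:bb:cc", others "aabbcc".
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

inline std::vector<uint8_t> hex_line_to_bytes(std::string line) {
    // Drop any ':' separators so both output styles are handled.
    line.erase(std::remove(line.begin(), line.end(), ':'), line.end());
    std::vector<uint8_t> bytes;
    bytes.reserve(line.size() / 2);
    for (size_t i = 0; i + 1 < line.size(); i += 2) {
        bytes.push_back(static_cast<uint8_t>(std::stoul(line.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}
```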
Check if there is an existing installation of CMake
cmake --version
Install CMake if required
sudo apt update
sudo apt-get install cmake
mkdir build
cmake -S ./ -B build && cmake --build build
cd build && ctest
Decode the payloads into CSV files
./iex_deep_parser <input_file> <output_file_dir>
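For reference, here is a hedged sketch of how a decoder might walk one IEX-TP segment (the payload of a single packet) and iterate over its messages before writing CSV rows. The 40-byte header layout and the 2-byte little-endian length prefix per message follow my reading of the IEX Transport Specification linked above; the function is illustrative, not the repository's implementation.

```cpp
// Illustrative sketch (not this repository's code): walk one IEX-TP segment and
// iterate over its messages. Assumed layout, per my reading of the IEX Transport
// Specification: 40-byte segment header with Payload Length at offset 12 and
// Message Count at offset 14, then repeated <2-byte LE length><message body> blocks.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstring>

inline void walk_segment(const uint8_t* seg, size_t len) {
    constexpr size_t kHeaderLen = 40;
    if (len < kHeaderLen) return;

    uint16_t payload_len;   std::memcpy(&payload_len, seg + 12, 2);
    uint16_t message_count; std::memcpy(&message_count, seg + 14, 2);

    const size_t end = std::min(len, kHeaderLen + static_cast<size_t>(payload_len));
    size_t offset = kHeaderLen;
    for (uint16_t i = 0; i < message_count && offset + 2 <= end; ++i) {
        uint16_t msg_len; std::memcpy(&msg_len, seg + offset, 2);
        offset += 2;
        if (offset + msg_len > end) break;
        // seg[offset] is the message type byte ('8'/'5' price level updates,
        // 'T' trade reports, ...); a real decoder would dispatch here and
        // append a row to the output CSV.
        std::printf("message type %c, length %u\n", seg[offset], static_cast<unsigned>(msg_len));
        offset += msg_len;
    }
}
```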
Check if GNU Parallel is already installed
parallel --version
Install GNU Parallel if required
sudo apt update
sudo apt install parallel
Run the following commands in a terminal to extract payloads from the gzipped pcap files in parallel
# Define number of parallel jobs
export jobs=<Number_of_Parallel_Jobs>
# Define stagger delay in seconds between job launches
# We add a delay to avoid IO overload
export delay=<Delay_in_Starting_Jobs>
# Define input and output directory
# All paths are absolute
export input_dir=<gz_dir>
export output_dir=<output_dir>
# CD into input dir
cd $input_dir
# Parallel code to extract payload into output dir
ls *.gz | parallel "basename {} .pcap.gz" | parallel --retry-failed --shuf --jobs $jobs --delay $delay --eta --progress "tshark -r $input_dir/{}.pcap.gz -T fields -e data > $output_dir/{}"
Run the following commands in a terminal to decode the payloads and save them to CSV in parallel
# Define number of parallel jobs
export jobs=<Number_of_Parallel_Jobs>
# Define stagger delay in seconds between job launches
# We add a delay to avoid IO overload
export delay=<Delay_in_Starting_Jobs>
# Define input and output directory
# All paths are absolute
export input_dir=<payload_dir>
export output_dir=<output_dir>
export iex_deep_parser_path=<location_of_parser_base_folder>
# CD into input dir
cd $input_dir
# Parallel code to decode payloads
ls * | parallel --retry-failed --shuf --jobs $jobs --delay $delay --eta --progress '$iex_deep_parser_path/build/iex_deep_parser $input_dir/{} $output_dir/{}'
@software{chandrasekaran_anirudh_bhardwaj_2021_6764244,
author = {Chandrasekaran Anirudh Bhardwaj},
title = {{Anirudhsekar96/IEX\_DEEP\_HISTORICAL\_DATA\_PARSER:
v0.1}},
month = may,
year = 2021,
publisher = {Zenodo},
version = {v0.1},
doi = {10.5281/zenodo.6764244},
url = {https://doi.org/10.5281/zenodo.6764244}
}