malwareclustering

MalwareClustering with ApiVector

Starting from pure Python, this study works through multiprocessing, NumPy, Cython, and Dask, arriving at dask-cuda with CuPy, a NumPy-compatible matrix library accelerated by CUDA. The study also explored different places to store and retrieve data, such as Neo4j, MongoDB, and PostgreSQL, and different data formats, such as strings, NumPy vectors, and NumPy packbits vectors.

As of today, we get the best results using dask-cuda, CuPy, and Zarr.
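As a starting point for the pure Python stage, the three comparison modes (1 vs 1, 1 vs many, many vs many) can be sketched as below. This is only an illustration, not the repository's code: it assumes that each sample is a binary API vector stored as a Python `int` and that similarity is plain Jaccard similarity over the set bits. The function names and the sample values are hypothetical.

```python
from itertools import combinations


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity of two bit vectors stored as Python ints (assumed metric)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention: two empty vectors are treated as identical
    return bin(a & b).count("1") / union


def one_vs_many(query: int, corpus: list[int]) -> list[float]:
    """Compare one vector against every vector in the corpus."""
    return [jaccard(query, v) for v in corpus]


def many_vs_many(corpus: list[int]) -> dict[tuple[int, int], float]:
    """Compare every unordered pair of vectors in the corpus."""
    return {
        (i, j): jaccard(corpus[i], corpus[j])
        for i, j in combinations(range(len(corpus)), 2)
    }


# Hypothetical 8-bit vectors, one per sample.
samples = [0b11110000, 0b11000000, 0b00001111]

print(jaccard(samples[0], samples[1]))
print(one_vs_many(samples[0], samples))
print(many_vs_many(samples))
```

The many vs many case grows quadratically with the number of samples, which is the part that the later stages (NumPy, Dask, CuPy) are meant to speed up.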

algorithm

A presentation with benchmarks and results is available here: https://ldo-cert.github.io/MISP-Summit-05/#/home

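| language | 1 vs 1 | 1 vs many | many vs many |
| --- | --- | --- | --- |
| python | x | x | x |
| numpy | x | x | x |
| numexpr | x | x | x |
| numba | x | x | x |
| pybind11 | x | x | x |
| cython | x | x | x |
| pythran | x | x | x |
| dask | x | x | x |
| tensorflow | x | x | x |
| dask-cuda with cupy | x | x | x |

| data source | size | times |
| --- | --- | --- |
| Neo4j | x | x |
| MongoDB | x | x |
| PostgreSQL | x | x |
| Zarr | x | x |

| data | data type | size | times |
| --- | --- | --- | --- |
| ApiScout | string | x | x |
| numpy vector | binary | x | x |
| numpy packbits vector | binary | x | x |
| zarr arrays | binary | x | x |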
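To make the "numpy vector" and "numpy packbits vector" formats concrete, here is a minimal sketch of packing binary vectors with `np.packbits` and computing a many vs many similarity matrix with one matrix product. It rests on assumptions not stated in this README: the vector length `N_BITS`, the random test data, the helper name `jaccard_matrix`, and the use of Jaccard similarity as the metric are all hypothetical.

```python
import numpy as np

N_BITS = 1024  # hypothetical vector length, chosen only for illustration

# Hypothetical data: 4 random binary vectors, one row per sample.
rng = np.random.default_rng(0)
vectors = rng.integers(0, 2, size=(4, N_BITS), dtype=np.uint8)

# Packed form: 8 bits per byte, so each row shrinks from 1024 bytes to 128.
packed = np.packbits(vectors, axis=1)


def jaccard_matrix(packed_rows: np.ndarray, n_bits: int) -> np.ndarray:
    """Many vs many Jaccard similarity (assumed metric) over packed bit vectors."""
    bits = np.unpackbits(packed_rows, axis=1, count=n_bits).astype(np.int32)
    inter = bits @ bits.T                 # pairwise counts of shared set bits
    counts = bits.sum(axis=1)             # set bits per vector
    union = counts[:, None] + counts[None, :] - inter
    # Two empty vectors have union 0; treat them as identical by convention.
    return np.where(union > 0, inter / np.maximum(union, 1), 1.0)


sim = jaccard_matrix(packed, N_BITS)
print(packed.shape)
print(np.round(sim, 3))
```

Packing reduces storage by a factor of 8 compared with one byte per bit, at the cost of an unpack step before the matrix product. Because CuPy is described above as NumPy-compatible, a similar approach may carry over to the GPU, but that is not verified here.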

algorithm