Our framework is DyGETViz, which stands for Dynamic Graph Embedding Trajectories Visualization.
[Project Page] [Demo] [Data]
conda create -n dygetviz python=3.9 -y
conda activate dygetviz
pip install --upgrade pip # enable PEP 660 support
pip install -e .
If you want to manually install the dependencies, run:
conda install scikit-learn pandas numpy matplotlib plotly
conda install -c conda-forge dash dash-daq dash-bootstrap-components biopython
pip install umap-learn
Please refer to the homepages of PyTorch, PyTorch Geometric, and PyTorch Geometric Temporal to install these three packages.
git pull
pip install -e .
Please check our demo at our website.
- Download all the data from Google Drive
- Put both data/ and outputs/ under the root directory of this repo.
Step 1: Discrete-Time Dynamic Graph (DTDG) embedding training
- We use the GConvGRU model from PyTorch Geometric Temporal to train embeddings of all datasets
- We extended the dataloader so that we can use a wide variety of data input formats. The original dataloader only used static input at each snapshot.
- Note: This part is not included in the code yet. For now, we directly provide the embeddings.
- Output: DTDG embeddings of shape (T, N, D)
  - T: The number of timestamps / snapshots
  - N: The number of nodes
  - D: Embedding dimension
Step 2: Embedding Trajectories Generation
- Input: DTDG embeddings of shape (T, N, D)
- Output: JSON file that stores the embedding trajectories for Dash
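The reshaping in Step 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `embeddings_to_trajectories`, the JSON layout (node name mapped to a list of per-snapshot vectors), and the use of nested lists in place of arrays are all assumptions. Any dimensionality reduction needed before plotting is omitted.

```python
import json


def embeddings_to_trajectories(embeds, node_names):
    """Turn (T, N, D) snapshot-major embeddings into per-node trajectories.

    embeds: nested list indexed as embeds[t][n][d].
    node_names: list of N names, where node_names[n] labels node n.
    Returns {name: [vector at t=0, vector at t=1, ...]}.
    """
    num_nodes = len(node_names)
    for t, snapshot in enumerate(embeds):
        if len(snapshot) != num_nodes:
            raise ValueError(
                f"snapshot {t} has {len(snapshot)} nodes, expected {num_nodes}"
            )
    return {
        name: [list(snapshot[n]) for snapshot in embeds]
        for n, name in enumerate(node_names)
    }


# Toy input: T=3 snapshots, N=2 nodes, D=2 dimensions.
toy_embeds = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.2], [0.9, 1.1]],
    [[0.3, 0.4], [0.8, 1.2]],
]
trajectories = embeddings_to_trajectories(toy_embeds, ["Anna", "Bob"])

# Serialize to a JSON string; json.dump(trajectories, file) would write to disk.
payload = json.dumps(trajectories)
```

Keying the output by node name means a front end can look up one trajectory at a time without scanning every snapshot.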
Step 3: Visualizing in a Dash app interactively using the JSON file
- Users should be able to incrementally add individual node trajectories, or all nodes under a certain category (e.g., normal users vs. anomalous users), to the visualization
- highlighted_nodes: List of nodes to highlight in the visualization. These must be specified because the Plotly visualization shows the names of only a small number of nodes; otherwise, the generated plot would be too cluttered.
- plot_dtdg.py: Script for generating the visualization
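One way to combine incremental selection with highlighted_nodes is sketched below. It builds a Plotly-style figure as a plain dictionary (a `data` list of scatter traces plus a `layout`), which is one of the forms Plotly accepts. The function name `build_figure` and the trajectory layout (node name mapped to a list of 2-D points) are assumptions for illustration, not the interface of plot_dtdg.py.

```python
def build_figure(trajectories, selected, highlighted_nodes):
    """Build a Plotly-style figure dict with one line trace per selected node.

    trajectories: {node name: [[x, y], ...]}, one 2-D point per snapshot.
    selected: names of the nodes the user has added so far.
    highlighted_nodes: names whose text label should be drawn on the plot.
    """
    highlighted = set(highlighted_nodes)
    traces = []
    for name in selected:
        points = trajectories[name]
        trace = {
            "type": "scatter",
            "name": name,
            "x": [p[0] for p in points],
            "y": [p[1] for p in points],
            "mode": "lines+markers",
        }
        if name in highlighted and points:
            # Label only the last point so the plot stays readable.
            trace["mode"] = "lines+markers+text"
            trace["text"] = [""] * (len(points) - 1) + [name]
            trace["textposition"] = "top center"
        traces.append(trace)
    return {"data": traces, "layout": {"title": {"text": "Embedding trajectories"}}}


toy = {
    "Anna": [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4]],
    "Bob": [[1.0, 1.0], [0.9, 1.1], [0.8, 1.2]],
}
# Adding nodes incrementally amounts to rebuilding with a longer selection.
fig_one = build_figure(toy, ["Anna"], highlighted_nodes=["Anna"])
fig_two = build_figure(toy, ["Anna", "Bob"], highlighted_nodes=["Anna"])
```

Here every selected node gets a trace, but only highlighted nodes carry a text label, which is the trade-off the highlighted_nodes option describes.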
Generate the visualization using the command:
python dygetviz/plot_dtdg.py --dataset_name <DATASET_NAME> --model GConvGRU
Currently, DATASET_NAME can be one of: Ant, Chickenpox, DGraphFin, Reddit
python dygetviz/plot_dtdg.py --dataset_name Chickenpox --model GConvGRU
python dygetviz/plot_dash.py --dataset_name Chickenpox --model GConvGRU
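For readers who want to wrap these commands in their own scripts, the two flags can be mirrored with argparse as below. This is a hypothetical sketch: only the flag names --dataset_name and --model and the four dataset names come from this README. The parser itself, including the GConvGRU default, is an assumption and not the code in plot_dtdg.py.

```python
import argparse

# Dataset names as listed in this README.
DATASETS = ["Ant", "Chickenpox", "DGraphFin", "Reddit"]


def build_parser():
    """Hypothetical parser that mirrors the two documented flags."""
    parser = argparse.ArgumentParser(
        description="Hypothetical wrapper around the DyGETViz scripts"
    )
    parser.add_argument("--dataset_name", required=True, choices=DATASETS)
    parser.add_argument("--model", default="GConvGRU")  # default is an assumption
    return parser


args = build_parser().parse_args(["--dataset_name", "Chickenpox"])

# Argument list suitable for subprocess.run(command, check=True).
command = [
    "python", "dygetviz/plot_dtdg.py",
    "--dataset_name", args.dataset_name,
    "--model", args.model,
]
```

Restricting --dataset_name with `choices` makes a misspelled dataset fail at parse time rather than partway through a run.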
dygetviz supports all temporal networks in the [Stanford Large Network Dataset Collection](https://snap.stanford.edu/data/index.html). Each row is a (source, target, timestamp) tuple representing an edge in a graph snapshot, as in the example below.
edges.tsv
SRC DST TIME
1 2 1082040961
3 4 1082155839
5 2 1082414391
6 7 1082439619
8 7 1082439756
9 10 1082440403
...
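As a rough sketch of how such an edge list could be turned into discrete snapshots, the code below reads (source, target, timestamp) rows and assigns each edge to one of a fixed number of equal-width time bins. The equal-width binning, the function name `edges_to_snapshots`, and whitespace-separated columns with a header row are assumptions for illustration; the project's own preprocessing may differ.

```python
import io


def edges_to_snapshots(lines, num_snapshots):
    """Group (src, dst, time) rows into num_snapshots equal-width time bins."""
    edges = []
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or len(fields) != 3:
            continue  # skip the header row and blank or malformed lines
        src, dst, time = fields
        edges.append((int(src), int(dst), int(time)))

    snapshots = [[] for _ in range(num_snapshots)]
    if not edges:
        return snapshots

    t_min = min(t for _, _, t in edges)
    t_max = max(t for _, _, t in edges)
    # Fall back to a width of 1 when all timestamps are identical.
    width = (t_max - t_min) / num_snapshots or 1
    for src, dst, time in edges:
        # Clamp so that the latest edge falls in the last bin.
        index = min(int((time - t_min) / width), num_snapshots - 1)
        snapshots[index].append((src, dst))
    return snapshots


sample = io.StringIO(
    "SRC DST TIME\n"
    "1 2 1082040961\n"
    "3 4 1082155839\n"
    "5 2 1082414391\n"
    "6 7 1082439619\n"
    "8 7 1082439756\n"
    "9 10 1082440403\n"
)
snapshots = edges_to_snapshots(sample, num_snapshots=2)
```

With two bins, the first two edges of the sample land in the first snapshot and the remaining four in the second.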
An optional nodes.tsv can be provided to indicate the node names. If not provided, the node names will be automatically generated as integers starting from 0.
ID NAME
0 Anna
1 Bob
2 Charlie
3 David
4 Emma
...
You can also specify an additional column to indicate the node label, such as whether the user is a normal user or an anomalous user.
ID NAME LABEL
0 Anna 0
1 Bob 1
2 Charlie 0
3 David 0
4 Emma 1
...
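A minimal sketch of reading such a file is shown below, covering the three cases described above: no file, names only, and names plus labels. The function name `load_nodes` and the whitespace-delimited parsing are assumptions for illustration rather than the project's actual loader.

```python
def load_nodes(lines=None, num_nodes=0):
    """Return (node2idx, labels) from nodes.tsv rows, or defaults if absent.

    lines: iterable of rows "ID NAME [LABEL]" with a header row, or None.
    num_nodes: number of nodes to name 0..num_nodes-1 when lines is None.
    """
    if lines is None:
        # No nodes.tsv: names are the integers 0..num_nodes-1 (kept as strings here).
        return {str(i): i for i in range(num_nodes)}, {}

    node2idx, labels = {}, {}
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or len(fields) < 2:
            continue  # skip the header row and blank or truncated lines
        idx, name = int(fields[0]), fields[1]
        node2idx[name] = idx
        if len(fields) >= 3:
            labels[name] = int(fields[2])  # the LABEL column is optional
    return node2idx, labels


sample = """ID NAME LABEL
0 Anna 0
1 Bob 1
2 Charlie 0
3 David 0
4 Emma 1
"""
node2idx, labels = load_nodes(sample.splitlines())

# Select every node in one category, e.g. to add them to the plot together.
nodes_with_label_1 = [name for name, label in labels.items() if label == 1]
```

Keeping labels in a separate dictionary lets the same loader handle files with and without the LABEL column.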
- DG: Dynamic Graphs, which can be categorized into DTDG and CTDG
- DTDG: Discrete-Time Dynamic Graphs (the type of graphs we are dealing with)
- CTDG: Continuous-Time Dynamic Graphs
- Embedding Trajectories: Please refer to the JODIE paper (KDD 2019) for more details
We provide the following datasets for viewing in our visualization tool:
- Ant: The ant movement dataset from "Tracking individuals shows spatial fidelity is a key regulator of ant social organization" (Science 2013)
- Chickenpox: The chickenpox dataset from the paper "Chickenpox Cases in Hungary: a Benchmark Dataset for Spatiotemporal Signal Processing with Graph Neural Networks"
- HistWords: The historical word co-occurrence dataset from "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change" (GitHub) (Website)
- node2idx: A dictionary that maps node names to node indices (usually starting from 0 to #nodes-1).
- embeds_<DATASET_NAME>.npy: The node embeddings generated by DyGET. The shape of the embeddings is #nodes x #time_steps x #embedding_dim.
- The Reddit dataset is a special case because it is the only dataset that describes a bipartite graph. The first 60 slices of the embeddings correspond to the 60 snapshots, and the last slice is for the background nodes. The shape of the embeddings is ``
We thank members of the CLAWS Lab and SRI International for their feedback and support.