All eight dynamic text-attributed graphs provided by DTGB can be downloaded from here.
Each graph is stored in three files:
- edge_list.csv: stores each edge in the DyTAG as a tuple, i.e., (u, r, i, ts, label), where u is the id of the source entity, i is the id of the target entity, r is the id of the relation between them, ts is the timestamp at which the edge occurs, and label is the label of the edge.
- entity_text.csv: stores the mapping from entity ids (e.g., u and i) to the text descriptions of entities.
- relation_text.csv: stores the mapping from relation ids (e.g., r) to the text descriptions of relations.
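As a minimal sketch of how the three files fit together, the snippet below parses tiny in-memory stand-ins for edge_list.csv, entity_text.csv, and relation_text.csv with Python's standard csv module and renders each edge as text. The header names, the two-column layout of the text files, the integer labels, and the sample rows are all assumptions made for illustration; check the downloaded files for the actual layout.

```python
import csv
import io

# Hypothetical miniature files. Header names and rows are illustrative only.
EDGE_LIST_CSV = "u,r,i,ts,label\n1,1,2,10,0\n2,2,3,20,1\n"
ENTITY_TEXT_CSV = "i,text\n1,Alice\n2,Bob\n3,Carol\n"
RELATION_TEXT_CSV = "i,text\n1,emails\n2,meets\n"


def read_text_map(handle):
    """Map integer ids to text descriptions from a two-column CSV with a header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {int(row[0]): row[1] for row in reader}


def read_edges(handle):
    """Parse edges as (u, r, i, ts, label) tuples, sorted by timestamp."""
    edges = [
        (int(row["u"]), int(row["r"]), int(row["i"]), float(row["ts"]), int(row["label"]))
        for row in csv.DictReader(handle)
    ]
    return sorted(edges, key=lambda edge: edge[3])


def describe(edges, entity_text, relation_text):
    """Render every edge as 'source --relation--> target (ts, label)'."""
    return [
        f"{entity_text[u]} --{relation_text[r]}--> {entity_text[i]} (ts={ts:g}, label={label})"
        for u, r, i, ts, label in edges
    ]


entity_text = read_text_map(io.StringIO(ENTITY_TEXT_CSV))
relation_text = read_text_map(io.StringIO(RELATION_TEXT_CSV))
edges = read_edges(io.StringIO(EDGE_LIST_CSV))
for line in describe(edges, entity_text, relation_text):
    print(line)
```

To try this on a real dataset, replace the io.StringIO objects with open file handles and adjust the column names to match the downloaded files.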
- After downloading the datasets, uncompress them into the DyLink_Datasets folder.
- Run get_pretrained_embeddings.py to obtain the Bert-based node and edge text embeddings. They will be saved as e_feat.npy and r_feat.npy, respectively.
- Run get_LLM_data.ipynb to get the train and test sets for the textual relation generation task. They will be saved as LLM_train.pkl and LLM_test.pkl, respectively.
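Before training, it can help to confirm that the preparation steps above produced everything. The following is a small sketch of such a check. The file names come from the steps above, but the assumption that all of them sit together in one per-dataset folder (here a hypothetical DyLink_Datasets/GDELT) is not confirmed by this README, so adapt the directory to wherever the scripts actually write.

```python
import tempfile
from pathlib import Path

# File names are taken from the steps above; their location on disk is an assumption.
EXPECTED_FILES = (
    "edge_list.csv",
    "entity_text.csv",
    "relation_text.csv",
    "e_feat.npy",
    "r_feat.npy",
    "LLM_train.pkl",
    "LLM_test.pkl",
)


def missing_files(dataset_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from dataset_dir."""
    dataset_dir = Path(dataset_dir)
    return [name for name in expected if not (dataset_dir / name).is_file()]


# Demo on a throwaway directory that contains only the three raw CSV files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "DyLink_Datasets" / "GDELT"  # hypothetical layout
    demo_dir.mkdir(parents=True)
    for name in ("edge_list.csv", "entity_text.csv", "relation_text.csv"):
        (demo_dir / name).touch()
    demo_missing = missing_files(demo_dir)

print(demo_missing)
```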
- Example of training DyGFormer for link prediction on the GDELT dataset without text attributes:
python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no
- Example of training DyGFormer for link prediction on the GDELT dataset with text attributes:
python train_link_prediction.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature Bert
- The AP and AUC-ROC metrics on the test set (in both the transductive and inductive settings) will be automatically saved in saved_resuts/DyGFormer/GDELT/DyGFormer_seed0no.json
- The best checkpoint will be saved in the saved_resuts/DyGFormer/GDELT/ folder, and this checkpoint will be used to reproduce the performance on the node retrieval task.
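To collect the metrics of several runs, it is convenient to build the result paths programmatically. The sketch below generalizes the single example path above into a pattern. That pattern (model name, then seed, then the use_feature value, with an optional file-name prefix) is inferred from that one example, and the assumption that --num_runs 5 uses seeds 0 to 4 is not confirmed here, so verify both against the files the script actually writes.

```python
from pathlib import Path


def result_path(model_name, dataset_name, seed, use_feature, prefix="", root="saved_resuts"):
    """Build the metrics file path, following the pattern inferred from the example."""
    file_name = f"{prefix}{model_name}_seed{seed}{use_feature}.json"
    return Path(root) / model_name / dataset_name / file_name


# Assumed seeds 0..4 for --num_runs 5.
paths = [result_path("DyGFormer", "GDELT", seed, "no") for seed in range(5)]
for path in paths:
    status = "found" if path.is_file() else "not found"
    print(f"{path.as_posix()}: {status}")
```

Each file can then be read with json.load; its internal structure is not described here, so inspect one file before writing any aggregation code.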
After obtaining the best checkpoint on the Future Link Prediction task, the Hits@k metrics of the Destination Node Retrieval task can be reproduced by running:
python evaluate_node_retrieval.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --negative_sample_strategy random --num_runs 5 --gpu 0 --use_feature no
- The negative_sample_strategy hyper-parameter controls the candidate sampling strategy and can be set to random or historical.
- The use_feature hyper-parameter controls whether Bert-based embeddings are used and can be set to no or Bert.
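For readers unfamiliar with Hits@k, the sketch below shows the usual definition: for each query, the true destination node is scored together with the sampled candidates, and the query counts as a hit if the true node ranks within the top k. This illustrates the general metric and is not the code in evaluate_node_retrieval.py; the tie handling (ties resolved in favor of the true node) and the toy scores are assumptions.

```python
def hits_at_k(scores, true_index, k):
    """Return 1.0 if the true candidate ranks within the top k by score, else 0.0."""
    true_score = scores[true_index]
    # Rank is 1 plus the number of candidates scoring strictly higher,
    # so ties are resolved in favor of the true candidate (an assumption).
    rank = 1 + sum(1 for score in scores if score > true_score)
    return 1.0 if rank <= k else 0.0


def mean_hits_at_k(queries, k):
    """Average Hits@k over (scores, true_index) pairs."""
    return sum(hits_at_k(scores, true_index, k) for scores, true_index in queries) / len(queries)


# Toy queries: each holds the candidate scores and the index of the true destination.
queries = [
    ([0.9, 0.1, 0.3], 0),  # true node ranked 1st
    ([0.2, 0.8, 0.5], 0),  # true node ranked 3rd
    ([0.4, 0.6, 0.5], 2),  # true node ranked 2nd
]
for k in (1, 2, 3):
    print(f"Hits@{k} = {mean_hits_at_k(queries, k):.3f}")
```

With random sampling the candidates are typically easier to separate from the true node than with historical sampling, so Hits@k under the two strategies should be compared separately.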
- Example of training DyGFormer for edge classification on the GDELT dataset without text attributes:
python train_edge_classification.py --dataset_name GDELT --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0 --use_feature no
- The Precision, Recall, and F1-score metrics on the test set will be automatically saved in saved_resuts/DyGFormer/GDELT/edge_classification_DyGFormer_seed0no.json
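As a reminder of what these three metrics measure for multi-class edge labels, here is a small self-contained sketch that computes per-class precision, recall, and F1 and then macro-averages them. The averaging scheme used by train_edge_classification.py is not stated in this README, so macro averaging is an assumption chosen for illustration, and the labels are toy data.

```python
def per_class_prf(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen in either list."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[cls] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(scores)
    return tuple(sum(values[i] for values in scores.values()) / n for i in range(3))


y_true = [0, 0, 1, 1, 2]  # toy edge labels
y_pred = [0, 1, 1, 1, 2]  # toy predictions
scores = per_class_prf(y_true, y_pred)
macro = macro_average(scores)
print("macro precision=%.3f recall=%.3f f1=%.3f" % macro)
```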
After obtaining the LLM_train.pkl and LLM_test.pkl files, you can directly reproduce the performance of the original LLMs by running:
python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=raw
- You can change the LLM through the config_path hyper-parameter.
- The generated text will be saved in s_his_o_des_his_result_vicuna7b.pkl
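The structure of the saved result file is not described here, so a quick look at its top-level type and size can be useful before pointing LLM_metric.py at it. The sketch below does this for any pickle file; the demo writes a hypothetical list of strings to a temporary file rather than assuming what the real file contains.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and report its top-level type name and length, if any."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)  # only unpickle files you created or trust
    length = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, length


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_result.pkl"
    with open(demo_path, "wb") as handle:
        pickle.dump(["generated text 1", "generated text 2"], handle)
    summary = summarize_pickle(demo_path)

print(summary)  # ('list', 2)
```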
Then, to get the Bert_score metrics, change the file path in LLM_metric.py and run:
python LLM_metric.py
If you want to fine-tune the LLMs, you should run:
python LLM_train.py LLM_configs/vicuna_7b_qlora_uncensored.yaml
and then reproduce the performance of the fine-tuned LLMs by running:
python LLM_eval.py -config_path=LLM_configs/vicuna_7b_qlora_uncensored.yaml -model=lora
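When switching between configs, the three LLM commands above can be generated from a single config path so that they stay consistent. The helper below is a hypothetical convenience, not part of the repository; it only assembles the argument lists exactly as written above and does not execute anything.

```python
import sys


def llm_commands(config_path, python=sys.executable):
    """Assemble the raw-eval, fine-tune, and LoRA-eval commands for one config file."""
    return {
        "eval_raw": [python, "LLM_eval.py", f"-config_path={config_path}", "-model=raw"],
        "fine_tune": [python, "LLM_train.py", config_path],
        "eval_lora": [python, "LLM_eval.py", f"-config_path={config_path}", "-model=lora"],
    }


commands = llm_commands("LLM_configs/vicuna_7b_qlora_uncensored.yaml", python="python")
for name, argv in commands.items():
    print(f"{name}: {' '.join(argv)}")
```

Each list can be passed to subprocess.run(argv, check=True) from the repository root; remember that LLM_metric.py still needs its file path edited by hand between evaluation runs.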
For any questions or suggestions, you can use the issues section or contact us at (zjss12358@gmail.com).
The code and model implementations build on the DyGLib project. Thanks for their great contributions!
@article{zhang2024dtgb,
title={DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs},
author={Zhang, Jiasheng and Chen, Jialin and Yang, Menglin and Feng, Aosong and Liang, Shuang and Shao, Jie and Ying, Rex},
journal={arXiv preprint arXiv:2406.12072},
year={2024}
}