This is the PyTorch implementation of "Multi-Document Scientific Summarization from a Knowledge Graph-Centric View", accepted at COLING 2022.

Requirements:
- Python == 3.6.3
- PyTorch == 1.5.0
- transformers == 4.10.3
- dgl-cu101 == 0.6.1
- pyrouge == 0.1.3
- rake-nltk
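A quick environment setup could look like the line below; the exact wheel names and CUDA builds are assumptions, so match `torch` and `dgl-cu101` to your local CUDA version:

```bash
# Assumed install line; pinned versions taken from the list above.
pip install torch==1.5.0 transformers==4.10.3 dgl-cu101==0.6.1 pyrouge==0.1.3 rake-nltk
```

Note that `pyrouge` is only a wrapper and additionally requires a local ROUGE-1.5.5 installation to be configured.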
Usage:

1. Create folders `trained_model`, `result`, and `log` under the root directory.
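   From the repository root, for example:

   ```bash
   mkdir trained_model result log
   ```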
2. Download the Multi-XScience dataset from here.
3. Train a DyGIE++ model for extracting entities and relations from here.
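   Once a model is trained, getting predictions out of DyGIE++ typically looks roughly like the command below; the checkpoint and file names are placeholders, and the authoritative usage is in the DyGIE++ repository:

   ```bash
   allennlp predict pretrained/scierc.tar.gz multixscience_docs.jsonl \
       --predictor dygie --include-package dygie --use-dataset-reader \
       --output-file predictions.jsonl --cuda-device 0 --silent
   ```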
4. Dataset Format and Dataset Preprocessing:
   - 4.1 The format of the input files is shown in the folder `example_data`, including `**.label.jsonl`, `**.ent_type_relation.jsonl`, `**.ent_promptsummary.jsonl`, `**.ent_importance_score.jsonl`, `**_summary.ent_importance_score.jsonl`, and `output_**_processed_coref.json`.
   - 4.2 Extract entities and relations for Multi-XScience using DyGIE++; the output file is `output_**_processed_coref.json`. Postprocess `output_**_processed_coref.json` into the format of `**.ent_type_relation.jsonl` (a conversion sketch is given below, after 4.4).
   - 4.3 Run the RAKE algorithm on `output_**_processed_coref.json` to calculate a RAKE score for each entity candidate (a short rake-nltk illustration follows below): `python script/keyphrase_extract.py`
   - 4.4 Create KGText samples using `**_summary.ent_importance_score.jsonl`: `python script/add_prompt_info.py`
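   If you need to write the 4.2 postprocessing yourself, here is a minimal sketch. It assumes DyGIE++'s standard prediction fields (`doc_key`, `sentences`, `predicted_ner`, `predicted_relations`, with inclusive document-level token spans); the output schema is only inferred from the file name, so check `example_data` for the real `**.ent_type_relation.jsonl` fields:

   ```python
   import json

   def convert(dygie_pred_path, out_path):
       """Hypothetical DyGIE++ predictions -> *.ent_type_relation.jsonl converter."""
       with open(dygie_pred_path) as fin, open(out_path, "w") as fout:
           for line in fin:
               doc = json.loads(line)
               # DyGIE++ span indices are document-level token offsets, so
               # flatten the per-sentence token lists first.
               tokens = [tok for sent in doc["sentences"] for tok in sent]
               entities, relations = [], []
               for sent_ner in doc.get("predicted_ner", []):
                   for span in sent_ner:  # [start, end, label, ...]
                       start, end, label = span[0], span[1], span[2]
                       entities.append({"text": " ".join(tokens[start:end + 1]),
                                        "type": label})
               for sent_rel in doc.get("predicted_relations", []):
                   for rel in sent_rel:  # [s1, e1, s2, e2, label, ...]
                       relations.append({"head": " ".join(tokens[rel[0]:rel[1] + 1]),
                                         "tail": " ".join(tokens[rel[2]:rel[3] + 1]),
                                         "relation": rel[4]})
               fout.write(json.dumps({"doc_key": doc.get("doc_key"),
                                      "entities": entities,
                                      "relations": relations}) + "\n")

   if __name__ == "__main__":
       # Placeholder file names matching the ** patterns above.
       convert("output_test_processed_coref.json", "test.ent_type_relation.jsonl")
   ```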
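   And here is a minimal rake-nltk illustration of the scoring that `script/keyphrase_extract.py` presumably performs in 4.3 (the sample text is a placeholder; rake-nltk needs the NLTK `stopwords` and `punkt` data downloaded):

   ```python
   from rake_nltk import Rake  # pip install rake-nltk

   text = ("We propose a knowledge graph-centric approach "
           "to multi-document scientific summarization.")
   rake = Rake()  # uses the NLTK English stopword list by default
   rake.extract_keywords_from_text(text)

   # (score, phrase) pairs, highest first; an entity candidate can then be
   # scored by looking up the phrases it overlaps with.
   for score, phrase in rake.get_ranked_phrases_with_scores():
       print(f"{score:.1f}\t{phrase}")
   ```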
5. Train:

   ```bash
   export PYTHONPATH=.
   CUDA_LAUNCH_BLOCKING=1 python train.py --mode train --cuda --data_dir <path-to-datasets-folder> --batch_size 4 --seed 666 --train_steps 100000 --save_checkpoint_steps 4000 --report_every 1 --visible_gpus 0 --gpu_ranks 0 --world_size 1 --accum_count 2 --dec_dropout 0.1 --enc_dropout 0.1 --model_path <path-to-trained-model-folder> --log_file <path-to-log-file> --inter_layers 6,7 --inter_heads 8 --hier --doc_max_timesteps 50 --prop 3 --min_length1 100 --no_repeat_ngram_size1 3 --sep_optim false --num_workers 5 --lr_dec 0.05 --warmup_steps 8000 --lr 0.005 --enc_layers 6 --dec_layers 6 --use_nucleus_sampling false --label_smoothing 0.1 --entloss_weight 1
   ```
6. Test:

   ```bash
   export PYTHONPATH=.
   python train.py --mode test --cuda --data_dir <path-to-datasets-folder> --batch_size 8 --valid_batch_size 8 --seed 666 --visible_gpus 0 --gpu_ranks 0 --dec_dropout 0.1 --enc_dropout 0.1 --lr 0.2 --label_smoothing 0.0 --log_file <path-to-log-file> --inter_layers 6,7 --inter_heads 8 --doc_max_timesteps 50 --use_bert false --report_rouge --alpha 0.4 --max_length 400 --result_path <path-to-result-folder> --prop 3 --test_all false --sep_optim false --use_nucleus_sampling false --min_length1 100 --min_length2 110 --no_repeat_ngram_size1 3 --no_repeat_ngram_size2 3 --test_from <path-to-saved-model-checkpoint>
   ```
Citation:

```
@inproceedings{wang2022multi,
  title={Multi-Document Scientific Summarization from a Knowledge Graph-Centric View},
  author={Wang, Pancheng and Li, Shasha and Pang, Kunyuan and He, Liangliang and Li, Dong and Tang, Jintao and Wang, Ting},
  booktitle={Proceedings of the 29th International Conference on Computational Linguistics},
  pages={6222--6233},
  year={2022}
}
```