This repo contains code and pre-trained models for our paper
Segatron: Segment-aware Transformer for Language Modeling and Understanding
He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li
AAAI 2021
To use this repo, please install NVIDIA APEX. We recommend using this Docker image or building your own environment with NGC's PyTorch container `nvcr.io/nvidia/pytorch:20.03-py3`.
We have uploaded the following checkpoints to the Hugging Face model hub:
- bert-base-500k
- segabert-base-500k
- segabert-large
- sentence-segabert-large
- segatron-xl-base
- segatron-xl-large
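
These checkpoints can be loaded with the modified `transformers` package shipped in this repo (not the upstream library). Below is a minimal loading sketch; the model ID is a hypothetical placeholder (check the actual IDs on the model hub), and the standard `from_pretrained` API is assumed to work unchanged in our fork.

```python
# Minimal loading sketch. Assumes the modified `transformers` package from
# this repo is installed, and that the checkpoints are published on the
# Hugging Face model hub. The model ID below is a hypothetical placeholder;
# replace it with the actual checkpoint ID.
from transformers import BertModel, BertTokenizer

MODEL_ID = "segabert-large"  # hypothetical placeholder ID

tokenizer = BertTokenizer.from_pretrained(MODEL_ID)
model = BertModel.from_pretrained(MODEL_ID)
print(model.config)
```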
- The source code for pre-training is in the `segabert` folder, which is based on Megatron-LM's repository. Please refer to `segabert/README.md` for details.
- The source code for language modeling is in the `segatron-xl` folder, which is based on Transformer-XL's repository. Please refer to `segatron-xl/README.md` for details.
- The source code for fine-tuning is in the `transformers` folder, which is based on Hugging Face's Transformers repository. Note that Segatron requires a paragraph position index, a sentence position index, and a token position index in its input features (see the sketch after this list). We therefore changed the input feature extraction and model forward functions of Transformers, so our code is not compatible with the upstream Hugging Face Transformers library. Please refer to `transformers/README.md` for details.
- The source code for sentence embedding is in the `sentence-transformers` folder, which is based on the Sentence-Transformers repository. Please refer to `sentence-transformers/README.md` for details.
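
To illustrate what the extra position features look like, here is a minimal sketch of building the three indices from a document that has already been split into paragraphs, sentences, and tokens. It assumes (per our reading of the paper) that the token index resets at each sentence boundary and the sentence index resets at each paragraph boundary; the function name is hypothetical, and the real feature extraction code lives in the `transformers` folder.

```python
# Hypothetical sketch of segment-aware position indices. For each token we emit:
#   paragraph index: which paragraph the token belongs to,
#   sentence index:  position of its sentence, reset at each new paragraph,
#   token index:     position within its sentence, reset at each new sentence.
from typing import List, Tuple


def segment_position_indices(
    paragraphs: List[List[List[str]]],  # paragraphs -> sentences -> tokens
) -> Tuple[List[int], List[int], List[int]]:
    para_ids, sent_ids, tok_ids = [], [], []
    for p, paragraph in enumerate(paragraphs):
        for s, sentence in enumerate(paragraph):
            for t, _token in enumerate(sentence):
                para_ids.append(p)
                sent_ids.append(s)
                tok_ids.append(t)
    return para_ids, sent_ids, tok_ids


# Example: two paragraphs, already split into sentences and tokens.
doc = [
    [["Segatron", "is", "segment-aware", "."], ["It", "helps", "."]],
    [["New", "paragraph", "here", "."]],
]
print(segment_position_indices(doc))
# paragraph ids: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# sentence  ids: [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
# token     ids: [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3]
```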
Please cite the AAAI 2021 paper:
@inproceedings{bai2021segatron,
  title={Segatron: Segment-Aware Transformer for Language Modeling and Understanding},
  author={Bai, He and Shi, Peng and Lin, Jimmy and Xie, Yuqing and Tan, Luchen and Xiong, Kun and Gao, Wen and Li, Ming},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={14},
  pages={12526--12534},
  year={2021}
}