This repository provides the code and data of the work in ACL2021: PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction https://aclanthology.org/2021.acl-long.233.pdf
We have a new work on CSC in ACL 2022: https://github.com/liushulinle/CRASpell
Requirements:
-
python3
-
tensorflow1.14
-
horovod
Instructions:
-
Finetune:
train and evaluation file format: original sentence \t golden sentence
step1: cd finetune_src ; step2: download the pretrained PLOME model and corpus from https://drive.google.com/file/d/1aip_siFdXynxMz6-2iopWvJqr5jtUu3F/view?usp=sharing ; step3: sh start.sh
-
Pre-train
step1: cd pre_train_src ; step2: sh gen_train_tfrecords.sh ; step3: sh start.sh
Our Pretrained Models:
PLOME model: https://drive.google.com/file/d/1aip_siFdXynxMz6-2iopWvJqr5jtUu3F/view?usp=sharing
cBERT model: https://drive.google.com/file/d/1cqSTpn7r9pnDcvMoM3BbX1X67JsPdZ8_/view?usp=sharing
国内下载地址: