If this repo is useful to you, a star would encourage our work.
ERNIE- and RoBERTa-based models for sentence similarity.
For example (Chinese medical question pairs; each row is id, category, query1, query2, and a 0/1 similarity label):
387,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎衣原体与肺炎支原体有什么区别?,0
388,支原体肺炎,支原体肺炎的症状及治疗方法是什么,肺炎支原体培养及药敏的检验单怎么看?,0
389,支原体肺炎,支原体肺炎的症状及治疗方法是什么,小儿支原体与小儿支原体肺炎相同吗?,0
390,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝肺炎支原体感染的症状是什么?,1
391,支原体肺炎,宝宝支原体肺炎感染的症状有哪些?,宝宝支原体肺炎感染有什么症状?,1
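A small sketch of how such a file could be loaded with pandas; the file path and column names below are assumptions inferred from the example rows, not necessarily what train.py actually uses:

```python
import pandas as pd

# column names inferred from the example rows above (hypothetical)
columns = ["id", "category", "query1", "query2", "label"]
df = pd.read_csv("data/train.csv", header=None, names=columns)  # hypothetical path

print(df[["query1", "query2", "label"]].head())
print(df["label"].value_counts())
```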
- ERNIE 1.0
- Nadam optimizer with a learning rate of 2e-5
- OHEM (online hard example mining) cross-entropy loss with label smoothing (see the sketch after this list)
- cosine learning-rate scheduler with warmup
- clean noisy data using an overfitted model
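A minimal sketch of the OHEM cross-entropy with label smoothing listed above, assuming a standard PyTorch classification setup; the function name and the `keep_ratio`/`smoothing` defaults are illustrative choices, not the repo's exact implementation:

```python
import torch
import torch.nn.functional as F

def ohem_label_smoothing_ce(logits, targets, keep_ratio=0.7, smoothing=0.1):
    """Label-smoothed cross entropy that back-propagates only the hardest samples (OHEM)."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        # smoothed target distribution: 1 - eps on the true class, eps/(C-1) elsewhere
        true_dist = torch.full_like(log_probs, smoothing / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)
    per_sample = -(true_dist * log_probs).sum(dim=-1)
    # OHEM: keep only the top-k hardest examples in the batch
    k = max(1, int(keep_ratio * per_sample.size(0)))
    hard, _ = per_sample.topk(k)
    return hard.mean()
```

For the schedule, `transformers.get_cosine_schedule_with_warmup` provides cosine decay with linear warmup; note that `torch.optim.NAdam` only ships with PyTorch 1.10+, so older versions would need a third-party Nadam implementation.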
Other ideas that may help:

- simply change the model
- add 'word2vec'-style features
- split the data into multiple pieces, train N BERT models, and use their outputs as features to train a tree-based model (LightGBM, XGBoost, ...)
- for hard examples, feed the nearest sentence pair (with its label) into BERT as extra reference information
- pseudo labels (see the sketch after this list)
- more open data (e.g. the Ping An CHIP 2019 dataset)
- ...
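As an illustration of the pseudo-label idea, a hedged sketch that keeps only high-confidence predictions from an already-trained model; `model`, `tokenizer`, and the confidence threshold are hypothetical placeholders assuming a transformers-style sequence-classification setup:

```python
import torch
import pandas as pd

@torch.no_grad()
def pseudo_label(model, tokenizer, pairs, threshold=0.95, device="cuda"):
    """Label unlabeled (query1, query2) pairs, keeping only confident predictions."""
    model.eval()
    kept = []
    for q1, q2 in pairs:
        enc = tokenizer(q1, q2, return_tensors="pt",
                        truncation=True, max_length=128).to(device)
        probs = torch.softmax(model(**enc).logits, dim=-1).squeeze(0)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # keep only high-confidence pairs
            kept.append((q1, q2, int(pred.item())))
    return pd.DataFrame(kept, columns=["query1", "query2", "label"])
```

The pseudo-labeled rows could then be appended to the training set for another fine-tuning round.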
- opencv-python
- pytorch >= 1.4
- pandas
- yacs
- sklearn
- download the ERNIE (128 length) model from https://github.com/nghuyong/ERNIE-Pytorch
- but use the config in this repo at pretrained/ernie/ (see the loading sketch below)
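A minimal loading sketch, assuming the converted weights and this repo's config both sit under pretrained/ernie/; the converted ERNIE checkpoints are BERT-compatible, and the repo's actual loading code lives in its training scripts:

```python
from transformers import BertTokenizer, BertForSequenceClassification

# the nghuyong/ERNIE-Pytorch conversion is BERT-compatible, so plain Bert* classes work
tokenizer = BertTokenizer.from_pretrained("pretrained/ernie")
model = BertForSequenceClassification.from_pretrained("pretrained/ernie", num_labels=2)
```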
You may need to change the data paths; have a look at train.py and test.py.
export PYTHONPATH=./
sh train_pipeline.sh