
UE-Rewriter

Code for the paper UE-Rewriter.


Start by preprocessing the data for UE-rewriting with the BERT Masked Language Model. Only data.json.zip OR data.json needs to be present.

python preprocess.py
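
For reference, the masked-language-model prediction that the rewriting relies on can be reproduced with the standard transformers fill-mask pipeline. This is a minimal sketch, not part of this repository:

```python
# Minimal sketch (not part of this repository): the BERT masked-language-model
# prediction used for rewriting, via the standard transformers fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Top predictions for the masked position.
for pred in fill_mask("The weather is [MASK] today.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```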

Rewrite:

python rewriter.py --data_dir part_cleaned_data
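
The output file names further below suggest that rewriting masks words unseen by bert-base-uncased and substitutes the model's predictions. A hedged sketch of that idea follows; the actual logic in rewriter.py may differ:

```python
# Hedged sketch of the rewriting idea; rewriter.py's actual logic may differ.
# Words that bert-base-uncased splits into sub-word pieces are treated as
# "unseen", masked, and replaced with the MLM's top in-vocabulary prediction.
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def rewrite_utterance(utterance: str) -> str:
    words = utterance.split()
    for i, word in enumerate(words):
        if len(tokenizer.tokenize(word)) > 1:  # word not in the BERT vocabulary
            masked = " ".join(words[:i] + [tokenizer.mask_token] + words[i + 1:])
            words[i] = fill_mask(masked)[0]["token_str"]
    return " ".join(words)

print(rewrite_utterance("I adore okonomiyaki on weekends"))
```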

To generate hypotheses on rewritten inputs using various benchmark models:

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --model_name blenderbot_small-90M

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --model_name blenderbot-400M-distill

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --model_name blenderbot-1B-distill --eval_batch_size 32

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-small

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-medium

python generate_batch_wise.py --data_dir unseen_from_bert-base-uncased_predicted_by_bert-base-uncased_rewritten_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-large
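
For orientation, here is a single-utterance sketch of what batch generation does with one of the BlenderBot checkpoints. The model name is assumed to resolve to facebook/blenderbot_small-90M on the Hugging Face Hub; generate_batch_wise.py itself handles batching and writing hypotheses to a txt file:

```python
# Single-utterance sketch of hypothesis generation with a BlenderBot benchmark
# model; the checkpoint is assumed to be facebook/blenderbot_small-90M on the
# Hugging Face Hub. generate_batch_wise.py adds batching and file output.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "facebook/blenderbot_small-90M"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("My hobby is collecting old radios.", return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```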

To generate hypotheses on original inputs using various benchmark models:

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name blenderbot_small-90M

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name blenderbot-400M-distill

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name blenderbot-1B-distill --eval_batch_size 32

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-small

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-medium

python generate_batch_wise.py --data_dir part_cleaned_data.txt --rewritten_ids_dir rewritten_ids.pt --model_name DialoGPT-large
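
The DialoGPT variants are causal language models, so the dialogue context is concatenated with eos_token separators and the reply is decoded as the continuation. A minimal single-turn sketch, assuming the microsoft/DialoGPT-small checkpoint:

```python
# Single-turn sketch for the DialoGPT benchmarks (microsoft/DialoGPT-small is
# assumed). DialoGPT is a causal LM: history turns are joined with eos_token
# and the reply is decoded as the continuation of the prompt.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

history = "Does money buy happiness?" + tokenizer.eos_token
input_ids = tokenizer(history, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=40, pad_token_id=tokenizer.eos_token_id)

# Keep only the generated reply, dropping the prompt tokens.
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```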


Debug run: execute the code 5 times on 10% of the data:

python generate_batch_wise.py --model_name "DialoGPT-small" --debug True

Modify the following command to generate an output .txt for a given input .txt:

python generate_batch_wise.py --model_name "blenderbot_small-90M" --data_dir 'all_data.txt'

Evaluate generations on the original data via BLEU:

python eval.py --debug True

Evaluate generations on the rewritten data via BLEU (the reference file is the rewritten data containing '##'):

python eval.py --hyp_dir 'blenderbot_small-90M_generate_rewrited.txt' --ref_dir <rewritten data file containing '##'> --debug True
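
A minimal sketch of the corpus-level BLEU comparison that eval.py performs, using sacrebleu and placeholder file names; eval.py's exact BLEU implementation may differ:

```python
# Hedged sketch of corpus-level BLEU between a hypothesis file and a reference
# file, using sacrebleu; file names are placeholders and eval.py's own BLEU
# implementation may differ.
import sacrebleu

with open("blenderbot_small-90M_generate_rewrited.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f if line.strip()]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f if line.strip()]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```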

New evaluation script (rename bleuT back to bleu in the file name when using):

python metric_evaluate.py -metric [metric_name] -hyp [output_file] -ref [ground truth]

metric_name: chrF, rouge, meteor
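
The same hypothesis/reference files can be scored with these extra metrics through the Hugging Face evaluate library. This is a hedged sketch with placeholder file names; metric_evaluate.py may use different metric implementations:

```python
# Hedged sketch of scoring with chrF, ROUGE, and METEOR via the Hugging Face
# `evaluate` library; file names are placeholders and metric_evaluate.py may
# use different metric implementations.
import evaluate

with open("output_file.txt", encoding="utf-8") as f:
    predictions = [line.strip() for line in f if line.strip()]
with open("ground_truth.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f if line.strip()]

chrf = evaluate.load("chrf")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

print("chrF:", chrf.compute(predictions=predictions, references=[[r] for r in references]))
print("ROUGE:", rouge.compute(predictions=predictions, references=references))
print("METEOR:", meteor.compute(predictions=predictions, references=references))
```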


Fine Tuning

Original:

python train.py

Rewritten:

python train.py --data_dir_txt ../data/all_data_punc_rewritten.txt --eod_token '# #'
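
As a rough picture of what fine-tuning involves, below is a minimal causal-LM fine-tuning sketch with the transformers Trainer, assuming microsoft/DialoGPT-small as the base model and a text file with one dialogue per line (turns already joined by the end-of-dialogue token). train.py's actual data handling and hyperparameters may differ:

```python
# Minimal causal-LM fine-tuning sketch with the transformers Trainer, assuming
# microsoft/DialoGPT-small as the base model and one dialogue per line in the
# text file. train.py's actual data handling and hyperparameters may differ.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(name)

dataset = load_dataset("text", data_files={"train": "../data/all_data_punc_rewritten.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt", num_train_epochs=1, per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("ckpt")  # saves the fine-tuned weights (pytorch_model.bin in older transformers versions)
```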

Generate using fine-tuned model

Original:

python generate_batch_wise.py --data_dir all_data_punc.txt --model_ckpt pytorch_model.bin

Rewritten (using a different checkpoint, also named pytorch_model.bin):

python generate_batch_wise.py --data_dir all_data_punc_rewritten.txt --model_ckpt pytorch_model.bin
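
Passing --model_ckpt pytorch_model.bin presumably loads the fine-tuned weights into the base model before generation; a hedged sketch of that step, assuming DialoGPT-small as the base architecture:

```python
# Hedged sketch of loading a fine-tuned checkpoint before generation, which is
# presumably what --model_ckpt pytorch_model.bin triggers inside
# generate_batch_wise.py; the base architecture is assumed to be DialoGPT-small.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "microsoft/DialoGPT-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```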
