DailyDialogue dataset #13

Open
Rabona17 opened this issue Jul 29, 2021 · 4 comments

Comments

@Rabona17

Where can I get the preprocessed DailyDialog dataset used by the SpaceFusion pretraining code? Any suggestions on how to preprocess the original DailyDialog data would be appreciated. Thanks!

@ChunyuanLI
Owner

I don't have the SpaceFusion pre-training code. For the DailyDialog dataset, we keep a dialogue history of a fixed sequence length. We tried to follow the setting of the original paper:

https://github.com/golsun/SpaceFusion

@Rabona17
Author

Thanks. So where can I get the DailyDialog dataset you used in run_dialog_spacefusion.sh (../data/datasets/dailydialog_data/train.txt)? Or should I preprocess it myself?

@ChunyuanLI
Owner

I'm afraid you have to pre-process it on your own.

@Rabona17
Author

Sure. Since SpaceFusion doesn't provide any preprocessing code for DailyDialog, what criteria did you use for src and trgt? In other words, what procedure did you use to split the original DailyDialog data into src and trgt? Thanks in advance!
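
The thread does not settle how the split was done, so the following is only a minimal sketch of one common way to turn raw DailyDialog into src/trgt pairs, not the procedure the maintainers used. It assumes the raw dialogues_text.txt layout (one dialogue per line, utterances separated by `__eou__`). It reads "history of a fixed sequence length" as "the last `max_history` utterances", which is a guess; truncating by token count would be an equally plausible reading. The function names, the ` EOS ` turn separator, and the tab-separated output layout are all hypothetical choices for illustration.

```python
from typing import Iterator, List, Tuple

EOU = "__eou__"  # utterance separator in DailyDialog's dialogues_text.txt


def split_utterances(line: str) -> List[str]:
    """Split one raw dialogue line into whitespace-normalized, non-empty utterances."""
    return [" ".join(u.split()) for u in line.split(EOU) if u.strip()]


def make_pairs(
    utterances: List[str],
    max_history: int = 2,   # assumption: history measured in utterances, not tokens
    sep: str = " EOS ",     # assumption: hypothetical turn separator inside src
) -> Iterator[Tuple[str, str]]:
    """Yield (src, trgt) pairs.

    Each utterance after the first becomes a target; its source is the
    (at most) max_history utterances that immediately precede it.
    """
    for i in range(1, len(utterances)):
        history = utterances[max(0, i - max_history):i]
        yield sep.join(history), utterances[i]


def preprocess(in_path: str, out_path: str, max_history: int = 2) -> None:
    """Write one 'src<TAB>trgt' line per pair (hypothetical output layout)."""
    with open(in_path, encoding="utf-8") as fin, \
            open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            for src, tgt in make_pairs(split_utterances(line), max_history):
                fout.write(f"{src}\t{tgt}\n")
```

For example, `preprocess("dialogues_text.txt", "train.txt", max_history=2)` would write one pair per line. Whether the resulting file matches what run_dialog_spacefusion.sh expects has to be checked against that script's data loader.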
