Expectation for data format in seq2seq_attention_copy #98

Open
shwetha-97 opened this issue Dec 3, 2023 · 0 comments

shwetha-97 commented Dec 3, 2023

In the README for the seq2seq_attention_copy method, I could not work out how the data in the folders data/datasets/data and data/datasets/data_radn_split differ.

The README says that the original data has to be put in these folders.

It seems to me that the folders data and data_randn_split hold different data; otherwise the experiments in attn_copying_tune_data_radn_split.yaml and attn_copying_tune_data.yaml would be equivalent. But how do they differ? Is the original data in the Spider dataset split randomly between these two folders? If so, what ratio should the split use: 50:50 or something else?

As I understand from here, should the folders data and data_randn_split each have their own train, dev and test JSON files? What is the reason for having these two folders, or two different kinds of data?
