Where is the "train_tgt_lines.txt"? #19

Open
happycjksh opened this issue Jun 29, 2021 · 4 comments

Comments

@happycjksh

Please help me, thank you

@happycjksh
Author

In make_e2e_labedata.py, you load train_tgt_lines.txt, but I can't find that file anywhere. What should I do?

@p0x0q

p0x0q commented Jul 21, 2022

@swiseman
Hello! I have the same problem.

I want to create my own E2E training data, and in the process I found that I need make_e2e_labedata.py.

So I ran cd data && python make_e2e_labedata.py "train" and found that the file train_tgt_lines.txt was missing.

https://github.com/harvardnlp/neural-template-gen/blob/master/data/make_e2e_labedata.py#L6-L9

I cannot find the following information about train_tgt_lines.txt:

  • What are the contents of the file?
  • Is there a train_tgt_lines.txt that the existing src_train.txt was derived from?

If @swiseman has any info on this, please let me know!

Best regards,

@swiseman
Contributor

These are just the reference generations for the training set; each line of train_tgt_lines.txt has the reference generation for the corresponding line in src_train.txt.
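If you build train_tgt_lines.txt yourself, the line-by-line correspondence described above is the property worth checking first. Here is a minimal sketch, not part of the repository, that pairs the two files and fails if their line counts differ; the function name paired_lines is hypothetical, and the file paths are whatever you pass in.

```python
def paired_lines(src_path, tgt_path):
    """Return (source, reference) pairs, one per line.

    Hypothetical helper: assumes both files are UTF-8 text and that
    line i of the target file is the reference for line i of the source.
    """
    with open(src_path, encoding="utf-8") as src:
        src_lines = src.read().splitlines()
    with open(tgt_path, encoding="utf-8") as tgt:
        tgt_lines = tgt.read().splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            "line count mismatch: %d source vs %d target lines"
            % (len(src_lines), len(tgt_lines))
        )
    return list(zip(src_lines, tgt_lines))
```

For example, paired_lines("src_train.txt", "train_tgt_lines.txt") could be run from the data directory before calling make_e2e_labedata.py.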

@p0x0q

p0x0q commented Jul 21, 2022

@swiseman
Thanks for the reply!
I thought about it for a bit, and now I understand.

For the benefit of others, here is the information for each file:

Example of src_train.txt

__start_name__ The Vaults __end_name__ __start_eatType__ pub __end_eatType__ __start_priceRange__ more than £ 30 __end_priceRange__ __start_customerrating__ 5 out of 5 __end_customerrating__ __start_near__ Café Adriatic __end_near__
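To make the field structure of the line above easier to see, here is a rough sketch that reads one src_train.txt line into a dictionary. It assumes every field is wrapped as __start_X__ ... __end_X__ exactly as in this example; the function name parse_src_line is hypothetical and is not something the repository provides.

```python
import re

# Matches "__start_<field>__ <value> __end_<field>__" (format inferred from the example line).
FIELD_RE = re.compile(r"__start_(\w+?)__\s+(.*?)\s+__end_\1__")

def parse_src_line(line):
    """Return a dict mapping each field name to its value string."""
    return {name: value for name, value in FIELD_RE.findall(line)}

example = (
    "__start_name__ The Vaults __end_name__ "
    "__start_eatType__ pub __end_eatType__ "
    "__start_priceRange__ more than £ 30 __end_priceRange__ "
    "__start_customerrating__ 5 out of 5 __end_customerrating__ "
    "__start_near__ Café Adriatic __end_near__"
)
fields = parse_src_line(example)
print(fields["name"])   # The Vaults
print(fields["near"])   # Café Adriatic
```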

Example of train_tgt_lines.txt

The Vaults pub near Café Adriatic has a 5 star rating . Prices start at £ 30 .

Result of running cd data && python make_e2e_labedata.py "train".

The Vaults pub near Café Adriatic has a 5 star rating . Prices start at £ 30 . <eos>|||0,2,0 2,3,1 4,6,6 11,12,7 17,18,7 18,19,8

The output of make_e2e_labedata.py can then be used as train.txt.
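Judging only from the example output above, each triple after ||| looks like start,end,label over the whitespace-separated tokens, with an exclusive end index (0,2,0 covers "The Vaults", 18,19,8 covers <eos>). That reading is my inference from this one line, not something confirmed in the thread. Under that assumption, a small sketch for inspecting a labeled line could look like this; parse_labeled_line is a hypothetical name.

```python
def parse_labeled_line(line):
    """Split 'tokens|||start,end,label ...' into tokens and labeled spans.

    Assumes end indices are exclusive, as the example output suggests.
    """
    text, _, span_field = line.strip().partition("|||")
    tokens = text.split()
    spans = []
    for triple in span_field.split():
        start, end, label = (int(x) for x in triple.split(","))
        spans.append((label, tokens[start:end]))
    return tokens, spans

example = (
    "The Vaults pub near Café Adriatic has a 5 star rating . "
    "Prices start at £ 30 . <eos>|||0,2,0 2,3,1 4,6,6 11,12,7 17,18,7 18,19,8"
)
tokens, spans = parse_labeled_line(example)
for label, words in spans:
    print(label, " ".join(words))
```

Printing the spans this way makes it easy to spot-check that each labeled segment lines up with a field value from the matching src_train.txt line.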

Thanks for the answer! Very helpful!
