Preprocessing #3
Python has an ["anytree"](https://pypi.org/project/anytree/2.1.4/) package. You can try it.
Thanks for your reply! I am still confused about how to obtain the structural sequence; releasing the preprocessing code or the preprocessed data might be a better way to help people run your model. Meanwhile, I have another question. I trained the Transformer baseline implemented in OpenNMT with the same hyperparameter settings as yours on LDC2015E86. When I compute the BLEU score on the BPE-level predictions I get a result comparable to Table 3 of your paper (25.5), but after I remove the "@@" from the predictions the BLEU drops a lot. So I am wondering: were the BLEU results you reported in Table 3 computed on the BPE-level predictions? Did you remove the "@@" from the final predictions of the model?
After deleting "@@ ", the BLEU score should not decline; it should rise a lot. Are you sure you are doing the BPE process correctly? Note that not only the "@@" but also the space after it must be deleted ("@@ ").
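In code, undoing the BPE segmentation is a single string replacement; a minimal Python sketch (the function name is mine, not from the repo):

```python
def remove_bpe(line: str) -> str:
    # Delete the BPE separator together with its trailing space,
    # then any separator left dangling at the end of the line.
    return line.replace("@@ ", "").replace("@@", "")

print(remove_bpe("struc@@ tural sequ@@ ence"))  # structural sequence
```

This matches the point above: replacing only `"@@"` without the trailing space would leave broken tokens like `"struc tural"` and tank BLEU.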
Thanks for your reply! I followed the author's instruction to delete "@@ " (
What I mean is that both the source and target sides need BPE during training, while at test time the target side does not need BPE.
Thanks for your patient reply! I am still a little confused. Do you apply BPE only on the training set and not on the test set at all? Or do you apply BPE on the source side of the test set but not on its target side?
Yes. At test time, only the source side needs BPE; then compute BLEU after deleting the "@@ ".
Got it :). I will give it a try, thanks!
Sorry for bothering again, what is the
On LDC2015E86: 10000.
So you followed the instructions in "Best Practice Advice for Byte Pair Encoding in NMT", right? If so, the
You only need to use these two commands:

```
subword-nmt learn-bpe -s {num_operations} < {train_file} > {codes_file}
subword-nmt apply-bpe -c {codes_file} < {test_file} > {out_file}
```
On 09/26/2019 22:15, Will wrote:
> train_source+train_target
> So you follow this instruction, right?
> If so, do you still keep --vocabulary-threshold at 50?
@Amazing-J Hi! I have the same question about generating structural sequences. Can you provide more insight on how to use ["anytree"](https://pypi.org/project/anytree/2.1.4/) to produce corpus_sample/all_path_corpus and corpus_sample/five_path_corpus? Any example preprocessing code would be much appreciated!
Hi,
Could you please release the preprocessing code for generating the structural sequence, along with the commands for applying BPE? I.e., how to produce the files in corpus_sample/all_path_corpus and corpus_sample/five_path_corpus.
Thanks.