Reproducing results for IWSLT En-De #6

Open
neerajgangwar opened this issue Jun 19, 2023 · 3 comments

Comments

@neerajgangwar

Hi,

I am trying to reproduce the results on IWSLT En-De. I followed the instructions in the README file but could not achieve the BLEU score reported in the paper. To get the code running, I made some fixes:
Changes: https://github.com/neerajgangwar/tree_transformer/tree/fixes
Diff: #5

It would be great if you could help me reproduce the results mentioned in the paper.

Thank you!

neerajgangwar changed the title from "Reproduce results for IWSLT En-De" to "Reproducing results for IWSLT En-De" on Jun 19, 2023
@nxphi47
Owner

nxphi47 commented Jun 20, 2023

Hi,

Thank you for your interest in the paper. There are a few possible reasons.

  1. Many dependencies have changed significantly since then, including fairseq and the BLEU calculation (which is not SacreBLEU but a BLEU with special tokenization and post-processing developed by Google's t2t at the time, to be consistent with Vaswani et al. 2017). The largest discrepancy likely comes from the different BLEU implementation.
  2. Many dependencies are also gone or deprecated.
  3. Perhaps the experiments were not set up correctly. Pay attention to batch size, gradient accumulation, and the number of GPUs. Generally, we should mimic a 128-GPU setup (8*16) with a batch size of at least 2048 tokens; in general, the batch size should be as large as possible. Training longer and applying checkpoint averaging will also help.
  4. This repo is also no longer maintained (my fault, I'm sorry for that), and I have lost many details.
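As a rough illustration of point 3, the arithmetic below estimates how many tokens each optimizer update sees. It assumes fairseq-style semantics, where `--max-tokens` is a per-GPU upper bound on batch size and `--update-freq` accumulates gradients over that many batches before each step. Reading "8*16" as 8 GPUs times an update frequency of 16 is an assumption, and the helper names are hypothetical.

```python
import math


def effective_tokens_per_update(max_tokens: int, num_gpus: int, update_freq: int) -> int:
    """Upper bound on tokens per optimizer step (batches may be smaller than max_tokens)."""
    return max_tokens * num_gpus * update_freq


def update_freq_to_match(target_gpus: int, num_gpus: int) -> int:
    """Gradient-accumulation steps so that num_gpus mimic target_gpus at the same max_tokens."""
    return math.ceil(target_gpus / num_gpus)


TARGET_GPUS = 128  # the 128-GPU (8*16) setup mentioned above
MAX_TOKENS = 2048  # "at least 2048 tokens" per batch, assumed to be per GPU

if __name__ == "__main__":
    target = effective_tokens_per_update(MAX_TOKENS, TARGET_GPUS, 1)
    for gpus in (1, 4, 8):
        freq = update_freq_to_match(TARGET_GPUS, gpus)
        got = effective_tokens_per_update(MAX_TOKENS, gpus, freq)
        print(f"{gpus} GPU(s): --update-freq {freq} -> <= {got} tokens/update (target {target})")
```

With fewer GPUs, raising `--update-freq` proportionally keeps the effective batch roughly constant, at the cost of proportionally more wall-clock time per update.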

For correct experiments, I expect performance at least 1 BLEU higher than the Transformer baseline when both are evaluated with a consistent BLEU implementation. I suggest you simply reimplement the model part of this codebase in your own training and evaluation pipeline, using the latest SacreBLEU implementation for your work. If you observe loss divergence or significantly lower performance, there is likely a bug in the code or an incorrect setup.
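To see why a consistent BLEU implementation matters, here is a minimal corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty, no smoothing) run with two different tokenizers. This is neither SacreBLEU nor the t2t script; the tokenizers and the German sentence pair are hypothetical stand-ins, chosen only to show that the same hypothesis scores differently depending on tokenization.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str],
                tokenize: Callable[[str], List[str]], max_n: int = 4) -> float:
    """Textbook corpus BLEU (single reference, no smoothing), scaled to 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = tokenize(hyp), tokenize(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter & Counter keeps the minimum count: clipped n-gram matches.
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def whitespace_tokenize(s: str) -> List[str]:
    return s.split()


def punct_tokenize(s: str) -> List[str]:
    # Split punctuation off words, e.g. "klein," -> ["klein", ","].
    return re.findall(r"\w+|[^\w\s]", s)


HYPS = ["Das Haus ist klein, aber sehr schön."]
REFS = ["Das Haus ist klein, aber wirklich schön."]

if __name__ == "__main__":
    print("whitespace:", round(corpus_bleu(HYPS, REFS, whitespace_tokenize), 2))
    print("punct-split:", round(corpus_bleu(HYPS, REFS, punct_tokenize), 2))
```

On this single sentence pair, whitespace tokenization gives about 64.3 and punctuation splitting about 66.1, so scores from different pipelines are not directly comparable.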

Hope this helps. Sorry for the inconvenience.

@neerajgangwar
Author

Hi @nxphi47,

Thank you for your response. I am running the code in this repo and am using the instructions provided in the README file. The only modifications I have made are to fix the issues where the code breaks. But I could not reproduce the results for IWSLT En-De. Let me recheck the settings and see if the above-mentioned points are taken care of. I will get back to you with the results.

Thanks again for your suggestions. Appreciate the quick response!

@neerajgangwar
Author

Hi @nxphi47,

I exported the model provided in this repository and ran it with fairseq v0.12.3. I used the model `dwnstack_merge2seq_node_iwslt_onvalue_base_upmean_mean_mlesubenc_allcross_hier`, as mentioned in the README file. The code is available here.

For training, I used a batch size of 4096 tokens and ran the training for 61000 steps with other parameters as mentioned in the README file and kept the parameters the same between Transformer and Tree Transformer. I also tried five different seeds to ensure the random initialization was not an issue. Evaluating the best checkpoint resulted in a BLEU score of $27.892 \pm 0.060$ with Transformer and $27.502 \pm 0.104$ with Tree Transformer. I also tried averaging the last 10 checkpoints. It resulted in a BLEU score of $28.528 \pm 0.064$ with Transformer and $28.088 \pm 0.123$ with Tree Transformer.
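Averaging the last N checkpoints, as described above, amounts to taking the element-wise mean of each parameter across the saved model states. The sketch below shows only that arithmetic, on plain Python lists standing in for tensors. It is not fairseq's `scripts/average_checkpoints.py`, which operates on real PyTorch checkpoints; the function name and toy values are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def average_checkpoints(states: List[StateDict]) -> StateDict:
    """Element-wise mean of every parameter across checkpoints of the same model."""
    if not states:
        raise ValueError("need at least one checkpoint")
    keys = states[0].keys()
    for s in states[1:]:
        if s.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    averaged: StateDict = {}
    for name in keys:
        params = [s[name] for s in states]
        if any(len(p) != len(params[0]) for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*params) groups the i-th weight of every checkpoint together.
        averaged[name] = [sum(vals) / len(states) for vals in zip(*params)]
    return averaged


if __name__ == "__main__":
    last_three = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
    print(average_checkpoints(last_three))  # {'w': [3.0, 4.0]}
```

Because the mean is taken parameter by parameter, all checkpoints must come from the same architecture; the checks above raise an error otherwise.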

I am using the config mentioned in the README file, and I am not sure if I am missing any other configuration you used for the results in the paper. Any suggestions or input will be appreciated.

Thank you!
