Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRPC hyperparameters question #5

Closed
ethanjperez opened this issue Nov 6, 2018 · 5 comments
Closed

MRPC hyperparameters question #5

ethanjperez opened this issue Nov 6, 2018 · 5 comments

Comments

@ethanjperez
Copy link
Contributor

ethanjperez commented Nov 6, 2018

When describing how you reproduced the MRPC results, you say:
"Our test ran on a few seeds with the original implementation hyper-parameters gave evaluation results between 82 and 87."
and you link to the SQuAD hyperparameters (https://github.com/google-research/bert#squad).

Is the link a mistake? Or did you use the SQuAD hyperparameters for tuning on MRPC? More generally, I'm wondering if there's a reason the MRPC dev set accuracy is slightly lower (in [82, 87] vs. [84, 88] reported by Google)

@thomwolf
Copy link
Member

thomwolf commented Nov 6, 2018

Hi Ethan,
Thanks we used the MRPC hyper-parameters indeed, I corrected the README.
Regarding the dev set accuracy, I am not really surprised there is a slightly lower accuracy with the PyTorch version (even though the variance is high so it's hard to get something significant). That is something that is generally observed (see for example the work of Remi Cadene) and we also experienced that with our TF->PT port of the OpenAI GPT model.
My personal feeling is that there are slight differences in the way the backends of TensorFlow and PyTorch handle the operations and these differences make the pre-trained weights sub-optimal for PyTorch.

@ethanjperez
Copy link
Contributor Author

Great, thanks for clarifying that. Regarding the slightly lower accuracy, that makes sense. Thanks for your help and for releasing this!

@ethanjperez
Copy link
Contributor Author

ethanjperez commented Nov 6, 2018

Maybe it would help to train the Tensorflow pre-trained weights for e.g. one epoch in PyTorch (using the MLM and next-sentence objective)? That may help transfer to other tasks, depending on what the issue is

@thomwolf
Copy link
Member

thomwolf commented Nov 7, 2018

Hi @ethanjperez, actually the weight initialization fix (tf. truncated_normal_initializer(stddev=0.02) was translated in weight.data.normal_(0.02) instead of weight.data.normal_(mean=0.0, std=0.02) fixed in 2a97fe2) has brought us back to the TensorFlow results on MRPC (between 84 and 88%).
I am closing this issue.

@thomwolf thomwolf closed this as completed Nov 7, 2018
@ethanjperez
Copy link
Contributor Author

@thomwolf Great to hear - thanks for working to fix it!

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020
* add default quac runner based on the latest squad runner

* add default quac runner based on the latest squad runner (cont.)

* add default quac runner based on the latest squad runner (cont.)

* add default quac runner based on the latest squad runner (cont.)

* add default quac runner based on the latest squad runner (cont.)

* add default quac runner based on the latest squad runner (cont.)

* update data pipeline for quac runner

* update data pipeline for quac runner (cont.)

* update data pipeline for quac runner (cont.)

* update data pipeline for quac runner (cont.)

* update data pipeline for quac runner (cont.)

* update data pipeline for quac runner (cont.)

* update data pipeline for quac runner (cont.)

* update predict output & evaluation generation for quac runner

* update predict output & evaluation generation for quac runner (cont.)
LysandreJik added a commit that referenced this issue Apr 10, 2020
* Initial commit to get BERT + run_glue.py on TPU

* Add README section for TPU and address comments.

* Cleanup TPU bits from run_glue.py (#3)

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* Cleanup TPU bits from run_glue.py

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* No need to call `xm.mark_step()` explicitly (#4)

Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.

* Resolve R/W conflicts from multiprocessing (#5)

* Add XLNet in list of models for `run_glue_tpu.py` (#6)

* Add RoBERTa to list of models in TPU GLUE (#7)

* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)

* Use barriers to reduce duplicate work/resources (#9)

* Shard eval dataset and aggregate eval metrics (#10)

* Shard eval dataset and aggregate eval metrics

Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.

* Change defaultdict to float

* Reduce the pred, label tensors instead of metrics

As brought up during review some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depends largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.

* Only use tb_writer from master (#11)

* Apply huggingface black code formatting

* Style

* Remove `--do_lower_case` as example uses cased

* Add option to specify tensorboard logdir

This is needed for our testing framework which checks regressions
against key metrics writtern by the summary writer.

* Using configuration for `xla_device`

* Prefix TPU specific comments.

* num_cores clarification and namespace eval metrics

* Cache features file under `args.cache_dir`

Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.

* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
wamartin-aml pushed a commit to wamartin-aml/transformers that referenced this issue Nov 1, 2021
rraminen pushed a commit to rraminen/transformers that referenced this issue Jun 3, 2022
sim-so added a commit to sim-so/transformers that referenced this issue Apr 23, 2023
# This is the 1st commit message:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#2:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#3:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#4:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#5:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#6:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#7:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#8:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#9:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#10:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
# This is the commit message huggingface#11:

Update docs/source/ko/tasks/summarization.mdx
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023
nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 22, 2023
nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 22, 2023
This reverts commit 15b5160.
nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 23, 2023
nikolaJovisic added a commit to nikolaJovisic/transformers that referenced this issue Aug 23, 2023
fix binary classification for tensorflow segformer

fix binary classification for tf segformer huggingface#2

fix huggingface#5

Revert "fix huggingface#5"

This reverts commit 15b5160.

Revert "fix huggingface#4"

This reverts commit 0b534e6.

fix huggingface#5

fix

fix

fix
LysandreJik pushed a commit that referenced this issue Mar 15, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
LysandreJik pushed a commit to LysandreJik/transformers that referenced this issue Apr 10, 2024
LysandreJik pushed a commit that referenced this issue Apr 24, 2024
Add message passing format

Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
itazap pushed a commit that referenced this issue May 14, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024
* Fix desc_act check and refractor marlin check code.

* clean
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants