Develop #7

Merged 3 commits, Nov 7, 2018

Conversation

thomwolf (Member) commented Nov 7, 2018

Fixes the pre-processing bug in run_squad.py.

Various clean-ups:

  • The weight initialization was wrong: tf.truncated_normal_initializer(stddev=0.02) had been translated as weight.data.normal_(0.02) instead of weight.data.normal_(mean=0.0, std=0.02). Because normal_ takes the mean as its first positional argument, the old call drew weights with mean 0.02 and the default std of 1.0, which likely hurt the performance of run_classifier.py as well.
  • The loss was not averaged over the accumulation steps during gradient accumulation, so enabling accumulation would have required changing the hyper-parameters.
  • Evaluation was not run under torch.no_grad(), which was sub-optimal in terms of speed and memory.
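
To illustrate the gradient accumulation point, here is a minimal pure-Python sketch. It is not the code from this PR: the one-parameter model, the data, and the helper names `mse_grad` and `accumulated_grad` are all assumptions made for the example. It shows that summing micro-batch gradients without averaging scales the gradient by the number of accumulation steps, while dividing each contribution by the step count (the analogue of `loss = loss / steps` before the backward pass) recovers the full-batch gradient.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, steps, average):
    """Accumulate gradients over `steps` equal-sized micro-batches."""
    size = len(xs) // steps
    total = 0.0
    for i in range(steps):
        mb_x = xs[i * size:(i + 1) * size]
        mb_y = ys[i * size:(i + 1) * size]
        grad = mse_grad(w, mb_x, mb_y)
        # Averaging mirrors dividing the loss by `steps` before backward.
        total += grad / steps if average else grad
    return total


# Hypothetical toy data: y = 2 * x, with a deliberately wrong weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = mse_grad(w, xs, ys)
summed = accumulated_grad(w, xs, ys, steps=2, average=False)
averaged = accumulated_grad(w, xs, ys, steps=2, average=True)

print(full, summed, averaged)
```

With the averaged variant, the effective step size stays the same whether or not accumulation is used, so the learning rate does not need to be retuned.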

@thomwolf thomwolf merged commit 5c0838d into master Nov 7, 2018
@thomwolf thomwolf deleted the develop branch November 7, 2018 22:51
qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019
@HongyanJiao HongyanJiao mentioned this pull request Sep 19, 2019
stevezheng23 added a commit to stevezheng23/transformers that referenced this pull request Mar 24, 2020
* Remove `special_tokens_mask` from inputs in README

Co-authored-by: Thomas Wolf @thomwolf

* fix repetition penalty

* soft launch distilroberta

* Add Benchmarks to issue templates

* Benchmarks example script

* Benchmark section added to the documentation

* Fix hanging when loading pretrained models

- Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.

* gradient norm clipping should be done right before calling the optimiser

* Fix citation

* gradient norm clipping should be done right before calling the optimiser - fixing run_glue and run_ner as well

* Fix huggingface#1597

* Option to benchmark only one of the two libraries

* Fix architectures count

* [CTRL] warn if generation prompt does not start with a control code

see also salesforce/ctrl#50

* [RELEASE] DistilRoBERTa

* [release] fix table weirdness

* RoBERTa token classification

[WIP] copy paste bert token classification for roberta

* Use roberta model and update doc strings

* Add Roberta to run_ner.py

* Add roberta to doc
LysandreJik added a commit that referenced this pull request Apr 10, 2020
* Initial commit to get BERT + run_glue.py on TPU

* Add README section for TPU and address comments.

* Cleanup TPU bits from run_glue.py (#3)

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* No need to call `xm.mark_step()` explicitly (#4)

With gradient accumulation we accumulate over batches from a
`ParallelLoader` instance, which marks the step itself on each next().

* Resolve R/W conflicts from multiprocessing (#5)

* Add XLNet in list of models for `run_glue_tpu.py` (#6)

* Add RoBERTa to list of models in TPU GLUE (#7)

* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)

* Use barriers to reduce duplicate work/resources (#9)

* Shard eval dataset and aggregate eval metrics (#10)

* Shard eval dataset and aggregate eval metrics

Also, instead of calling `eval_loss.item()` every time, do the summation
with tensors on device.

* Change defaultdict to float

* Reduce the pred, label tensors instead of metrics

As brought up during review, some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depend largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those.

* Only use tb_writer from master (#11)

* Apply huggingface black code formatting

* Style

* Remove `--do_lower_case` as example uses cased

* Add option to specify tensorboard logdir

This is needed for our testing framework, which checks for regressions
against key metrics written by the summary writer.

* Using configuration for `xla_device`

* Prefix TPU specific comments.

* num_cores clarification and namespace eval metrics

* Cache features file under `args.cache_dir`

Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.

* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
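
The commit message above notes that metrics such as f1 cannot be aggregated by averaging across shards. A small pure-Python sketch of why, using made-up binary predictions and a hypothetical `f1` helper (not the metric code used by the TPU script): the mean of per-shard F1 scores differs from the F1 computed on the pooled predictions and labels.

```python
def f1(preds, labels):
    """Binary F1 computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two hypothetical evaluation shards: (predictions, labels).
shard_a = ([1, 1, 1, 1], [1, 1, 1, 1])
shard_b = ([1, 0, 0, 0], [0, 1, 1, 0])

# Averaging the per-shard metric.
mean_of_shards = (f1(*shard_a) + f1(*shard_b)) / 2

# Pooling predictions and labels first, then computing the metric once.
pooled_preds = shard_a[0] + shard_b[0]
pooled_labels = shard_a[1] + shard_b[1]
pooled = f1(pooled_preds, pooled_labels)

print(mean_of_shards, pooled)
```

Gathering the raw predictions and labels before computing the metric avoids this discrepancy, at the cost of moving more data between workers.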
rraminen pushed a commit to rraminen/transformers that referenced this pull request Jun 3, 2022
jlamypoirier pushed a commit to jlamypoirier/transformers that referenced this pull request Apr 4, 2023
sim-so added a commit to sim-so/transformers that referenced this pull request Apr 23, 2023
# This is the 1st commit message:

Update docs/source/ko/tasks/summarization.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this pull request Jun 1, 2023
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Mar 14, 2024
LysandreJik pushed a commit that referenced this pull request Mar 15, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
itazap pushed a commit that referenced this pull request May 14, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this pull request Nov 14, 2024
* Use real QuantLinear layers directly when we can