Compute true loss Flax examples #18458

duongna21 · 2022-08-03T15:37:39Z

What does this PR do?

'True' losses should be computed in Flax examples, as discussed with @sanchit-gandhi.
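As a rough illustration of the idea (going by the later mentions of `psum` and `num_labels` in this thread), a "true" loss sums the per-token losses and the label counts across all devices first and divides only once at the end, rather than averaging per-device means. The sketch below is an assumption-laden toy, not the code from the example scripts: `summed_loss`, `true_loss_step`, the axis name `"batch"` and the tensor shapes are all made up for the illustration.

```python
import jax
import jax.numpy as jnp


def summed_loss(logits, labels, label_mask):
    """Return (sum of masked per-token losses, number of real labels) for one shard."""
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Negative log-likelihood of the target token at each position.
    token_loss = -jnp.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    token_loss = token_loss * label_mask  # padding positions contribute nothing
    return token_loss.sum(), label_mask.sum()


def true_loss_step(logits, labels, label_mask):
    loss_sum, num_labels = summed_loss(logits, labels, label_mask)
    # Sum numerator and denominator across devices, then divide once.
    loss_sum = jax.lax.psum(loss_sum, axis_name="batch")
    num_labels = jax.lax.psum(num_labels, axis_name="batch")
    return loss_sum / num_labels


n_dev = jax.local_device_count()
logits = jax.random.normal(jax.random.PRNGKey(0), (n_dev, 2, 4, 10))  # (devices, batch, seq, vocab)
labels = jnp.zeros((n_dev, 2, 4), dtype=jnp.int32)
label_mask = jnp.ones((n_dev, 2, 4)).at[:, :, -1].set(0.0)  # treat the last position as padding

# Every device ends up holding the same globally normalised loss.
true_loss = jax.pmap(true_loss_step, axis_name="batch")(logits, labels, label_mask)
print(true_loss)
```

Dividing after the cross-device sum means shards with fewer real labels are not over-weighted, which is the difference from taking a mean of per-device means.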

Who can review?

HuggingFaceDocBuilderDev · 2022-08-03T15:51:47Z

The documentation is not available anymore as the PR was closed or merged.

sanchit-gandhi

Hey @duongna21, thanks for the PR - the 'true loss' implementation looks clean! There appear to be a few unintentional changes that have crept into the PR (deletion of lines, changes to drop_last_batch). It might simply be a case of rebasing onto main to fix these:

git fetch upstream
git rebase upstream/main

sanchit-gandhi · 2022-08-08T16:01:47Z

examples/flax/language-modeling/run_bart_dlm_flax.py

-                eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size, drop_last=False)
+                eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size)


Unintentional change?

@sanchit-gandhi I noticed that with drop_last=False the eval loss will become nan at the beginning of training, but eval accuracy is still on track. It appears to occur with both run_bart_dlm_flax and run_summarization so I temporarily turned it off. It would be great if you could take a look and fix it.

Interesting! What immediately jumps out to me is that num_labels is 0, causing the 'true loss' to be nan. You didn't get this behaviour previously with the pmap operation? What eval batch size are you using?
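To make the hypothesis above concrete: if a batch contained no real labels, both the summed loss and `num_labels` would be zero, and the division would silently give nan. The snippet below only demonstrates that arithmetic. The `jnp.maximum` guard is one possible workaround added here for illustration, not something this PR does, and the variable names are hypothetical.

```python
import jax.numpy as jnp

loss_sum = jnp.array(0.0)    # summed loss over a batch with no real labels
num_labels = jnp.array(0.0)  # number of real labels in that batch

naive = loss_sum / num_labels                      # 0 / 0 evaluates to nan, no error raised
guarded = loss_sum / jnp.maximum(num_labels, 1.0)  # hypothetical guard against an empty batch

print(naive, guarded)
```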

This bug actually occurs even without dividing the loss by num_labels. I took a quick look at the eval loss computed on every token at each step, and the last losses were nan.

python run_summarization_flax.py \
    --output_dir ./bart-base-xsum \
    --model_name_or_path facebook/bart-base \
    --tokenizer_name facebook/bart-base \
    --dataset_name="xsum" \
    --do_train --do_eval --do_predict --predict_with_generate \
    --num_train_epochs 6 \
    --learning_rate 5e-5 --warmup_steps 0 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --overwrite_output_dir \
    --max_source_length 512 --max_target_length 64 \
    --push_to_hub

Printed tensor has a shape of (8, 64, 64), provided that I have 8 TPU cores, per_device_eval_batch_size=64 and max_target_length=64.
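For anyone trying to reproduce this, one way to narrow down where the nan values sit in a per-token loss array of that shape is sketched below. The array is synthetic: the nan positions are planted by hand purely so the snippet runs, and they say nothing about where the real nan values appeared. The name `per_token_loss` and the axis reading (devices, per-device batch, target length) are assumptions.

```python
import numpy as np

# Synthetic stand-in for the printed per-token eval loss: (devices, per-device batch, target length).
per_token_loss = np.zeros((8, 64, 64), dtype=np.float32)
per_token_loss[7, 60:, :] = np.nan  # planted nan values, for demonstration only

# Mark every (device, example) row that contains at least one nan token loss.
nan_rows = np.isnan(per_token_loss).any(axis=-1)  # shape (8, 64)
devices, examples = np.nonzero(nan_rows)

print("nan rows per device:", nan_rows.sum(axis=1))
print("affected (device, example) pairs:", list(zip(devices.tolist(), examples.tolist())))
```

Applied to the real array, the per-device counts would show whether the nan values are confined to particular shards or examples.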

Oh no! That's interesting to see. Is this an artefact of using the psum? As in, were the losses nan when we used a pmap previously? If so, we'll need to address this!

The losses were nan right after loss = optax.softmax_cross_entropy(logits, soft_labels). Great if you could have a look at this issue!

Anyway, this problem doesn't seem to be related to this PR. Flax examples are young so we'll improve it step by step :)

Hi @duongna21, just wondering what the status is on the nan issue being discussed here. I'm running into this issue while using the recently added run_bart_dlm_flax.py script but am very new to Flax/Jax so haven't been able to really make sense of it yet. Is run_bart_dlm_flax.py useable for model pre-training in its current state? Just as a side note, I haven't seen the nan issue when running run_t5_mlm_flax.py on my training data. Thanks in advance for any clarification!

@tannonk I'm sorry for the late reply. I believe this error is related to the drop_last=False option. Training will be fine if you set drop_last=True, at the cost of skipping the few examples in the last incomplete batch. It would be nice to see someone work on this weird bug.
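The trade-off described here can be sketched with a small stand-in for the batching helper. `split_into_batches` below is a hypothetical function written for this illustration. It is not the `generate_batch_splits` implementation from the example scripts, whose exact behaviour may differ.

```python
import numpy as np


def split_into_batches(samples_idx, batch_size, drop_last=True):
    """Split sample indices into consecutive batches (illustrative helper only)."""
    batches = [samples_idx[i:i + batch_size] for i in range(0, len(samples_idx), batch_size)]
    # With drop_last=True, discard a trailing batch that is smaller than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches = batches[:-1]
    return batches


idx = np.arange(10)
full = split_into_batches(idx, 4, drop_last=True)     # two batches of 4, samples 8 and 9 skipped
ragged = split_into_batches(idx, 4, drop_last=False)  # two batches of 4 plus one batch of 2

print([len(b) for b in full], [len(b) for b in ragged])
```

With drop_last=False every sample is evaluated, but the final batch is smaller than the others, so it needs padding or special handling before it can be split evenly across devices.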

sanchit-gandhi · 2022-08-08T16:02:02Z

examples/flax/language-modeling/run_bart_dlm_flax.py

-        eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size, drop_last=False)
+        eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size)


Unintentional change?

Same here! Is drop_last causing issues with the eval loss?

@sanchit-gandhi Yes!

sanchit-gandhi

Hey @duongna21! Sorry for the late review here. In general this looks good to me! Two things from my review:

1. Could you rebase onto main? This should update all the jax.tree_map calls to jax.tree_util.tree_map.
2. Is using drop_last=False giving nan for the eval loss? If so, we can dig into this on a separate issue. Ideally we should keep the changes for this PR related to the 'true' Flax loss. We can open a separate issue/PR for any other side issues!
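For context on the first point, `jax.tree_map` was deprecated in favour of `jax.tree_util.tree_map`, which applies a function to every leaf of a nested container. A minimal sketch of the newer spelling follows. The `metrics` dictionary and its values are made up for the example.

```python
import jax
import jax.numpy as jnp

# Hypothetical per-device metrics, one entry per device.
metrics = {"loss": jnp.array([1.0, 3.0]), "num_labels": jnp.array([2.0, 2.0])}

# Apply jnp.sum to every leaf of the dictionary.
summed = jax.tree_util.tree_map(jnp.sum, metrics)

print(summed)
```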

sanchit-gandhi · 2022-10-10T13:42:07Z

examples/flax/language-modeling/run_bart_dlm_flax.py

-                eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size, drop_last=False)
+                eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size)


It's drop_last=False which is causing the eval loss to be nan?

@sanchit-gandhi Thanks for the review. Rebased.

sanchit-gandhi · 2022-10-10T13:42:59Z

examples/flax/language-modeling/run_bart_dlm_flax.py

-        eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size, drop_last=False)
+        eval_batch_idx = generate_batch_splits(eval_samples_idx, eval_batch_size)


Same here! Is drop_last causing issues with the eval loss?

patrickvonplaten · 2022-10-11T18:29:14Z

@duongna21 super sorry it seems like the git commit history got messed up :-/ Any chance you could re-submit your PR?

duongna21 changed the title ~~Compute true loss flax examples~~ Compute true loss Flax examples Aug 3, 2022

sanchit-gandhi reviewed Aug 8, 2022

ydshieh and others added 27 commits September 14, 2022 16:13

Fix DocumentQuestionAnsweringPipelineTests (huggingface#19023)

6a9726e

* Fix DocumentQuestionAnsweringPipelineTests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Mark right save_load test as slow (huggingface#19031)

4eb36f2

Fix GPT-NeoX doc examples (huggingface#19033)

693ba2c

TF: tf.debugging assertions without tf.running_eagerly() protection (h…

31be02f

…uggingface#19030)

Add safeguards for CUDA kernel load in Deformable DETR (huggingface#1…

0e24548

…9037)

fix GPT2 token's special_tokens_mask when used with `add_bos_token=…

0efbb6e

…True` (huggingface#19036)

Automate check for new pipelines and metadata update (huggingface#19029)

3774010

* Automate check for new pipelines and metadata update * Add Datasets to quality extra

Dev version

16913b3

Fix a broken link for deepspeed ZeRO inference in the docs (huggingfa…

abca174

…ce#19001) * Fix a broken link for deepspeed ZeRO inference * fix link Co-authored-by: Stas Bekman <stas@stason.org>

[doc] debug: fix import (huggingface#19042)

8edf196

correct the import statement

Fix test_save_load for TFViTMAEModelTest (huggingface#19040)

0a42b61

* Fix test_save_load for TFViTMAEModelTest Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Pin minimum PyTorch version for BLOOM ONNX export (huggingface#19046)

9b80a0b

Move cache: expand error message (huggingface#19051)

2700ba6

🚨🚨🚨 Optimize Top P Sampler and fix edge case (huggingface#18984)

578e18e

* init PR * optimize top p and add edge case * styling * style * revert tf and flax test * add edge case test for FLAX and TF * update doc with smallest set sampling for top p * make style

Fixing OPT fast tokenizer option. (huggingface#18753)

68bb33d

* Fixing OPT fast tokenizer option. * Remove dependency on `pt`. * Move it to GPT2 tokenization tests. * Added a few tests.

Fix custom tokenizers test (huggingface#19052)

f7ce4f1

* Fix CI for custom tokenizers * Add nightly tests * Run CI, run! * Fix paths * Typos * Fix test

Run torchdynamo tests (huggingface#19056)

16242e1

* Enable torchdynamo tests * make style Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

fix arg name in BLOOM testing and remove unused arg document (hugging…

f3d3863

…face#18843)

fix use_cache (huggingface#19060)

c8e40d6

- set `use_cache` to `True` for consistency with other `transformers` models

FX support for ConvNext, Wav2Vec2 and ResNet (huggingface#19053)

c603c80

* Support for ConvNext * Support for Wav2Vec2 * Support for Resnet * Fix small issue in test_modeling_convnext

[doc] Fix link in PreTrainedModel documentation (huggingface#19065)

532ca05

Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with F…

d63bdf7

…P16 input (huggingface#18746) * Adding cast to fp32 in convnext layernorm to prevent rounding errors in the case of fp16 input * Trigger CI

rchan26 and others added 3 commits October 10, 2022 09:25

Remove dependency of Roberta in Blenderbot (huggingface#19411)

4824741

* Remove dependency of Roberta in Blenderbot * Move Copied from statements to each method of the Roberta classes * Remove copied from line for mask_token.setter * update output from example in docs

fix: renamed variable name (huggingface#18850)

ba71bf4

The sequence_masked variable is actually the part of the sequence that is kept unmasked for the encoder. This commit renames the variable.

sanchit-gandhi reviewed Oct 10, 2022

amyeroberts and others added 23 commits October 10, 2022 14:48

Fix the error message in run_t5_mlm_flax.py (huggingface#19282)

e150c4e

Add Italian translation for add_new_model.mdx (huggingface#18713)

b0b962c

* fix conflicts * start translating * proof check * add toc * fix errors and typos

Generate: corrected exponential_decay_length_penalty type hint (huggi…

d866b48

…ngface#19376)

Fix misspelled word in docstring (huggingface#19415)

9df953a

Fixed a non-working hyperlink in the README.md file (huggingface#19434)

25cfd91

* Fixed a non-working hyperlink in the README.md file The hyperlink to the community notebooks was outdated. * Fixing missing double slash in hyperlink

fix (huggingface#19469)

a7bc422

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

wrap forward passes with torch.no_grad() (huggingface#19439)

692c5be

wrap forward passes with torch.no_grad() (huggingface#19438)

870a954

wrap forward passes with torch.no_grad() (huggingface#19416)

d739a70

wrap forward passes with torch.no_grad() (huggingface#19414)

c6a928c

wrap forward passes with torch.no_grad() (huggingface#19413)

5f5e264

wrap forward passes with torch.no_grad() (huggingface#19412)

df2f281

Dev version

1010097

Compute true loss

d7d71c8

final

b5ccda0

fixup

135cb98

final

d94d04f

final

d0ccf00

Update examples/flax/language-modeling/run_bart_dlm_flax.py

418f6c4

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

jax.tree_map => jax.tree_util.tree_map

b90b5ae

final

6c0ae1c

duongna21 mentioned this pull request Oct 12, 2022

[Re-submit] Compute true loss Flax examples #19504

Merged

duongna21 closed this Oct 12, 2022
