Error when using Adafactor without learn rate #11612
Thank you @sgugger for the feedback. I installed the latest transformers version from source and set the recommended parameters from the patch: `optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=True, warmup_init=True, lr=None)`.

However, the error message remains the same. Can you give me a hint on how to address this issue? For reference, this is the code that I am using:

```python
from transformers import (
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)
from transformers.optimization import Adafactor, get_constant_schedule_with_warmup

# model_checkpoint, label_2_id, run_name, batch_size, tokenizer and the
# tokenized datasets are defined earlier in the notebook.
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=len(label_2_id))

args = TrainingArguments(
    output_dir=f"models/{run_name}/checkpoints",
    run_name=run_name,
    evaluation_strategy="epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=1,
    num_train_epochs=2,
    report_to=["tensorboard"],
    logging_dir="runs/" + run_name,
    logging_first_step=True,
    logging_steps=100,
    save_steps=10000,
    save_total_limit=10,
    seed=16,
    fp16=True,
)

optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=True, warmup_init=True, lr=None)
lrs = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)
data_collator = DataCollatorForTokenClassification(tokenizer)

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_dataset_train,
    eval_dataset=tokenized_dataset_val,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics_sklearn,
    optimizers=(optimizer, lrs),
)
```
@oliverguhr, please always post a full traceback for errors; otherwise it's impossible to know where the error came from. Please refer to https://github.com/huggingface/transformers/blob/master/ISSUES.md#the-github-issues, item (3). The actual recommendation is:
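A sketch of that recommendation, assuming the settings from the Adafactor docstring (an external, scheduled learning rate with Adafactor's relative-step logic disabled):

```python
from transformers.optimization import Adafactor

# External lr driven by a regular scheduler; Adafactor's internal
# step-size machinery (relative_step / warmup_init) is switched off.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```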
The alternative is one I saved because others said it worked well for them (sketched below). Once you post the full traceback, we can see why it fails. Thank you! P.S. A Colab notebook reproducing the problem is even better.
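Again a sketch, assuming the commonly shared alternative configuration (no external lr; Adafactor derives the step size itself):

```python
from transformers.optimization import Adafactor

# No external lr: relative_step=True makes Adafactor compute its own
# step size, with its internal warmup (warmup_init=True).
optimizer = Adafactor(
    model.parameters(),
    lr=None,
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)
```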
Thanks for looking at this, @stas00. Here is a traceback, and this is a Colab notebook to reproduce the issue. Hint: the traceback differs depending on whether `lr` is set to a value or left as `None`.
Thank you for creating the reproducible Colab notebook, @oliverguhr - that's very helpful. So when you use `relative_step=True`, Adafactor computes the learning rate internally and doesn't need an external one. But I see that the bare-bones HF Trainer doesn't support training without a scheduler. So we aren't quite supporting this option, and perhaps we should. Regardless of the outcome, we should document the conclusion of this thread in the Adafactor docstring. So here are a few ideas meanwhile:
Let me know if this unblocks you a bit.
Clearly this is a quick hack, but it seems to work: it returns the learning rate Adafactor computed internally (a rough reconstruction is sketched below). As you can see, I had to poke at the optimizer's internals to make it work. If this is desired, then we could add a proper public accessor. If you like the 2nd solution, feel free to clean it up and make a PR, perhaps getting rid of the private-attribute poking.
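A rough reconstruction of the idea: a do-nothing scheduler that merely reports the lr Adafactor derived internally. The class name `DummyLR` and the reliance on Adafactor's private `_get_lr(param_group, param_state)` helper are assumptions rather than the exact code from the thread:

```python
from torch.optim.lr_scheduler import LambdaLR


class DummyLR(LambdaLR):
    """No-op scheduler: never changes the lr, only reports the value
    Adafactor computed internally so the Trainer has something to log."""

    def __init__(self, optimizer, initial_lr=0.0):
        # LambdaLR expects each param group to carry an initial_lr;
        # Adafactor's groups have lr=None, so install a placeholder first.
        for group in optimizer.param_groups:
            group["initial_lr"] = initial_lr
        super().__init__(optimizer, lr_lambda=lambda _: initial_lr)

    def get_lr(self):
        # Ask Adafactor for the step size it derived for each param group,
        # via its private _get_lr helper (assumed here, hence "hack").
        # Groups whose params have no gradients yet are skipped; before any
        # optimizer step we fall back to the placeholder base lrs.
        opt = self.optimizer
        lrs = [
            opt._get_lr(group, opt.state[group["params"][0]])
            for group in opt.param_groups
            if group["params"][0].grad is not None
        ]
        return lrs if lrs else self.base_lrs
```

It would then stand in for the real schedule: `trainer = Trainer(..., optimizers=(optimizer, DummyLR(optimizer)))`.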
So @sgugger suggests the 3rd option. For that we will have to track down all the cases where the scheduler is used and condition them on the scheduler actually being set. Not sure about back-compat though, since we auto-create a scheduler if one isn't passed: transformers/src/transformers/trainer.py, lines 817 to 829 at 33fd83b.
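For context, a paraphrase of what those referenced lines do (not the exact source at 33fd83b): a missing scheduler is treated as "create the default one", which is why `None` currently cannot mean "no scheduler":

```python
# Paraphrase of the Trainer's scheduler auto-creation, using
# transformers.optimization.get_scheduler; not the exact source.
if self.lr_scheduler is None:
    self.lr_scheduler = get_scheduler(
        self.args.lr_scheduler_type,
        optimizer=self.optimizer,
        num_warmup_steps=self.args.warmup_steps,
        num_training_steps=num_training_steps,
    )
```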
Another proposition from @sgugger is:
@stas00 Sorry for the late reply, and thanks for your feedback. The DummyLR worked for me, but this parameter combination did not improve my results; maybe these parameter settings are an edge case. Regarding the 3rd option: would it be possible to check whether the scheduler passed to the Trainer is None and skip it in that case?
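A minimal sketch of what such a check might look like inside the training loop (hypothetical, not actual Trainer code):

```python
# Hypothetical guard: only advance the scheduler if one was really
# provided, leaving Adafactor's internal step-size logic alone otherwise.
if self.lr_scheduler is not None:
    self.lr_scheduler.step()
```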
@sgugger, what is your take? I'm happy either way, but let's resolve it one way or another.
Mmm, the …
@oliverguhr, we went with the `AdafactorSchedule` approach.
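For reference, a sketch of the resulting usage, based on the Adafactor documentation (details indicative rather than exact):

```python
from transformers.optimization import Adafactor, AdafactorSchedule

# Adafactor manages its own step size; AdafactorSchedule is a shim
# scheduler that lets the Trainer log the lr the optimizer computed.
optimizer = Adafactor(
    model.parameters(),
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
    lr=None,
)
lr_scheduler = AdafactorSchedule(optimizer)

# Reusing model/args from the snippet at the top of the thread:
trainer = Trainer(model, args, optimizers=(optimizer, lr_scheduler))
```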
Hi,
I get these strange errors when I use Adafactor. This code will result in this (expected) error:
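Presumably something like the following, which Adafactor rejects because a manual lr cannot be combined with relative steps (a sketch):

```python
from transformers.optimization import Adafactor

# Raises ValueError: Adafactor refuses to combine a manual lr with
# relative_step=True.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=True,
    warmup_init=True,
)
```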
However, if I do not set a manual learning rate, I get a different error. Btw: this code is recommended in the documentation, yet it returns this error:
Environment info
transformers version: 4.5.1
Who can help: Trainer: @sgugger