Add an examples folder for code downstream tasks #18679
Conversation
The documentation is not available anymore as the PR was closed or merged.
Looks good, I mostly left comments for simplifying the training code a bit.
optimizer = AdamW(get_grouped_params(model, args), lr=args.learning_rate)
lr_scheduler = get_scheduler(
    name=args.lr_scheduler_type,
    optimizer=optimizer,
    num_training_steps=args.num_epochs,
    num_warmup_steps=args.num_warmup_steps,
)
I think you can define all that in the training arguments, no? No need to pass an optimizer/lr_scheduler explicitly?
I think by default the linear scheduler is used and I needed the cosine scheduler here, but you're right, we don't need to specify the optimizer.
I think you can also specify cosine: https://github.com/huggingface/transformers/blob/v4.21.1/src/transformers/trainer_utils.py#L356
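For reference, a minimal sketch of driving both choices through `TrainingArguments` alone (all values below are placeholders, not the ones used in the CodeParrot scripts; `model`, `train_ds`, and `eval_ds` are assumed to come from the training script):

```python
from transformers import Trainer, TrainingArguments

# Minimal sketch: no optimizer or scheduler object is passed; the Trainer
# builds AdamW and a cosine schedule from the arguments below.
training_args = TrainingArguments(
    output_dir="./checkpoints",      # placeholder
    learning_rate=5e-4,              # placeholder
    lr_scheduler_type="cosine",      # selects the cosine schedule
    warmup_steps=100,                # placeholder
    num_train_epochs=3,              # placeholder
    weight_decay=0.1,                # placeholder
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
```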
def get_grouped_params(model, args, no_decay=["bias", "ln_1.weight", "ln_2.weight", "ln_f.weight"]):
    params_with_wd, params_without_wd = [], []
    for n, p in model.named_parameters():
        if any(nd in n for nd in no_decay):
            params_without_wd.append(p)
        else:
            params_with_wd.append(p)
    return [
        {"params": params_with_wd, "weight_decay": args.weight_decay},
        {"params": params_without_wd, "weight_decay": 0.0},
    ]
I think if you don't pass the optimizer explicitly, the Trainer will take care of that for you, no?
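For context, a sketch of relying on the defaults instead, assuming the Trainer's standard optimizer creation (which, as far as I know, already skips weight decay for bias and layer-norm parameters; values are placeholders):

```python
from transformers import Trainer, TrainingArguments

# Sketch with placeholder values: since no optimizer is passed, the Trainer
# creates AdamW itself and applies `weight_decay` only to parameters that
# are not biases or layer-norm weights, making the manual grouping optional.
training_args = TrainingArguments(output_dir="./checkpoints", weight_decay=0.1)
trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
```

If custom grouping were still required, the `optimizers=(optimizer, lr_scheduler)` argument of `Trainer` would be the usual escape hatch.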
class CustomCallback(TrainerCallback):
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            return control_copy
Why is this needed? Isn't this the same as `evaluation_strategy="epoch"` in the training arguments? Also, why do you evaluate on the train set?
I added this because I wanted to monitor the gap in accuracy between the training set and the evaluation set.
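For reference, a minimal sketch of how such a callback is usually attached; because it needs a reference to the trainer itself, it is added after construction rather than via the `callbacks=` constructor argument (the model, arguments, and dataset objects are assumed to come from the script above):

```python
from transformers import Trainer

# Sketch: CustomCallback is the class from the diff above.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.add_callback(CustomCallback(trainer))
# With evaluation_strategy="epoch", each epoch should now log both
# `train_*` and `eval_*` metrics.
trainer.train()
```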
@@ -12,7 +12,7 @@ This is an open-source effort to train and evaluate code generation models. Code
- continuously push checkpoints to the hub with `huggingface_hub`
- stream the dataset with `datasets` during training to avoid disk bottlenecks
- apply the `code_eval` metric in `datasets` to evaluate on [OpenAI's _HumanEval_ benchmark](https://huggingface.co/datasets/openai_humaneval)
- showcase examples for downstream tasks with code models in [examples](https://github.com/huggingface/transformers/tree/main/examples/research_projects/codeparrot/examples) folder
Maybe we can say what examples we show there. Should we also add the code for the text2py and py2text here?
Ok, but the text2py and py2text examples use a similar script to the one for pretraining codeparrot, just with a different dataset and model checkpoint. Maybe I can just mention that in the README?
Sounds good. I think documenting the settings would also be useful.
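As a rough illustration of that point (all model and dataset identifiers below are placeholders, not necessarily the ones used in the project), the downstream runs would only change the inputs of the pretraining setup:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifiers; substitute the real text<->code dataset and
# the CodeParrot checkpoint used for fine-tuning.
model_ckpt = "codeparrot/codeparrot-small"       # assumed starting checkpoint
dataset_name = "my-org/text-to-python-pairs"     # hypothetical dataset id

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForCausalLM.from_pretrained(model_ckpt)
train_ds = load_dataset(dataset_name, split="train", streaming=True)
# From here on, the same Trainer setup as the pretraining script applies unchanged.
```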
Commits 12762f5 to 17e508d:
* add examples subfolder
* mention examples in codeparrot readme
* use Trainer optimizer and scheduler type and add output_dir as argument
* add example of text-to-python and python-to-text models
* mention the downstream examples in the readme
* fix typo
What does this PR do?
This PR adds a folder in the CodeParrot directory to store examples for downstream tasks on code models.