Mlflow integration callback #8016

noise-field · 2020-10-24T12:18:20Z

What does this PR do?

This PR adds Trainer integration with MLflow.

It is implemented in roughly the same way as the other integration callbacks (CometML, wandb) and is added to the list of Trainer callbacks automatically when mlflow is installed. All the MLflow parameters are configured with environment variables, as described in the library documentation. This PR adds one additional environment variable, HF_MLFLOW_LOG_ARTIFACTS, which controls whether MLflow's artifact logging facility is used to save the artifacts generated after training (this doesn't make much sense if MLflow is used locally).
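As a rough illustration of the configuration described above, the sketch below sets the relevant environment variables before a Trainer would be created. MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_NAME are standard MLflow variables; the URI, the experiment name, and the helper `artifact_logging_enabled` are hypothetical, and the accepted values ("TRUE" or "1") are an assumption rather than something stated in this description.

```python
import os


def artifact_logging_enabled(environ=os.environ):
    """Return True when HF_MLFLOW_LOG_ARTIFACTS asks for artifact logging.

    Assumption: "TRUE" (any case) or "1" enables it; anything else,
    including an unset variable, disables it.
    """
    value = environ.get("HF_MLFLOW_LOG_ARTIFACTS", "FALSE")
    return value.upper() in {"TRUE", "1"}


# Hypothetical values, set before the Trainer is constructed.
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
os.environ["MLFLOW_EXPERIMENT_NAME"] = "my-experiment"
os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "TRUE"
```

With a remote tracking server, enabling the flag would upload the training outputs as run artifacts; with a purely local setup the files are already on disk, which is why the flag is optional.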

Fixes #7698

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sgugger

Add integration code for MLflow in integrations.py along with the code that checks that MLflow is installed.
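A minimal sketch of such an availability check, assuming a lookup with `importlib.util.find_spec`; the code in integrations.py may do this differently (for example with a try/except around the import). The helper name `is_package_available` and the placeholder callback list are hypothetical.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


# Register the integration only when mlflow is installed.
# The string stands in for the real callback class.
default_callbacks = []
if is_package_available("mlflow"):
    default_callbacks.append("MLflowCallback")
```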

Add import of MLflowCallback in trainer.py

Allow the callback to handle model argument and store model config items as hyperparameters.
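One way to picture this step is to merge the model config into the dictionary of training arguments before logging. The sketch below assumes both objects expose a `to_dict()` method and that training arguments win on a key clash; the function name and the stand-in classes are hypothetical.

```python
def collect_hyperparameters(args, model=None):
    """Combine training arguments with the model config, if there is one."""
    combined = dict(args.to_dict())
    config = getattr(model, "config", None)
    if config is not None:
        # Assumption: on a key clash the training argument takes precedence.
        combined = {**config.to_dict(), **combined}
    return combined


class _DictLike:
    """Stand-in for objects that expose to_dict()."""

    def __init__(self, **values):
        self._values = values

    def to_dict(self):
        return dict(self._values)


class _Model:
    """Stand-in for a model that carries a config attribute."""

    def __init__(self, config):
        self.config = config


example_args = _DictLike(learning_rate=5e-5, seed=42)
example_model = _Model(_DictLike(hidden_size=768, seed=0))
example_params = collect_hyperparameters(example_args, example_model)
```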

MLflow cannot log more than a hundred parameters at once. Code added to split the parameters into batches of 100 items and log the batches one by one.
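The batching can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the helper names are hypothetical, and the logging function is passed in as a callable (`mlflow.log_params` would be the natural choice) so the sketch runs without mlflow installed.

```python
MAX_PARAMS_PER_BATCH = 100  # limit described in the commit message


def iter_param_batches(params, batch_size=MAX_PARAMS_PER_BATCH):
    """Yield successive dicts holding at most batch_size parameters each."""
    items = list(params.items())
    for start in range(0, len(items), batch_size):
        yield dict(items[start:start + batch_size])


def log_params_in_batches(params, log_params):
    """Call log_params once per batch, e.g. with mlflow.log_params."""
    for batch in iter_param_batches(params):
        log_params(batch)
```

For 250 parameters this makes three calls, carrying 100, 100, and 50 items.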

The "fluent" API used in the MLflow integration allows only one run to be active at any given moment. If a Trainer is disposed of before its training has finished and a new Trainer is created, the previous run is still active, so MLflow will refuse to log the results of the new Trainer.
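A hedged sketch of one way to clear a leftover run before a new one starts. It relies only on `mlflow.active_run()` and `mlflow.end_run(status=...)` from the fluent API; the helper name, the "KILLED" status, and passing the module as an argument are assumptions made for illustration, and the description does not say how the PR itself resolves the issue.

```python
def end_dangling_run(mlflow_module, status="KILLED"):
    """End the currently active fluent run, if there is one.

    mlflow_module is expected to be the imported mlflow module; it is a
    parameter here so the helper can be exercised without mlflow installed.
    Returns True when a run was ended, False when nothing was active.
    """
    if mlflow_module.active_run() is not None:
        mlflow_module.end_run(status=status)
        return True
    return False
```

Calling something like this when a callback is torn down, or just before a new run is started, would leave the fluent API free for the next Trainer.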

sgugger

This is very clean, thanks a lot for this PR!

LysandreJik

LGTM!

* Add MLflow integration class

Add integration code for MLflow in integrations.py along with the code that checks that MLflow is installed.

* Add MLflowCallback import

Add import of MLflowCallback in trainer.py

* Handle model argument

Allow the callback to handle model argument and store model config items as hyperparameters.

* Log parameters to MLflow in batches

MLflow cannot log more than a hundred parameters at once. Code added to split the parameters into batches of 100 items and log the batches one by one.

* Fix style

* Add docs on MLflow callback

* Fix issue with unfinished runs

The "fluent" API used in the MLflow integration allows only one run to be active at any given moment. If a Trainer is disposed of before its training has finished and a new Trainer is created, the previous run is still active, so MLflow will refuse to log the results of the new Trainer.

noise-field added 15 commits October 24, 2020 01:17

Add MLflow integration class

e173f45

Add integration code for MLflow in integrations.py along with the code that checks that MLflow is installed.

Add MLflowCallback import

37d1146

Add import of MLflowCallback in trainer.py

Handle model argument

4c13166

Allow the callback to handle model argument and store model config items as hyperparameters.

Log parameters to MLflow in batches

758dcd5

MLflow cannot log more than a hundred parameters at once. Code added to split the parameters into batches of 100 items and log the batches one by one.

Fix style

00ae9c9

Add docs on MLflow callback

d29c8e1

Add MLflow integration class

7502296

Add integration code for MLflow in integrations.py along with the code that checks that MLflow is installed.

Add MLflowCallback import

c14401b

Add import of MLflowCallback in trainer.py

Handle model argument

a3899a1

Allow the callback to handle model argument and store model config items as hyperparameters.

Log parameters to MLflow in batches

ae5b9c8

MLflow cannot log more than a hundred parameters at once. Code added to split the parameters into batches of 100 items and log the batches one by one.

Fix style

07f9080

Add docs on MLflow callback

68ce75f

Merge branch 'mlflow-integration-callback' of https://github.com/nois…

dfbc994

…e-field/transformers into mlflow-integration-callback

sgugger approved these changes Oct 26, 2020

sgugger requested a review from LysandreJik October 26, 2020 12:15

LysandreJik approved these changes Oct 26, 2020

LysandreJik merged commit c48b16b into huggingface:master Oct 26, 2020

noise-field deleted the mlflow-integration-callback branch October 26, 2020 21:00

HenryMaguire mentioned this pull request Nov 13, 2020

MLflowCallback to log run_name argument #8519

Closed

fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020

Revert "Mlflow integration callback (huggingface#8016)"

d6a10dd

This reverts commit 5b13249.

dmilcevski mentioned this pull request Mar 24, 2021

MlFlow log artefacts #10881

Closed

4 tasks

pathikg mentioned this pull request Jun 15, 2023

TypeError: cannot pickle 'module' object #24308

Closed

4 tasks
