feat: add documentation for mlflow autologging on website #1508

Merged: 4 commits, May 13, 2022
Changes from 2 commits
77 changes: 77 additions & 0 deletions website/docs/mlflow/autologging.md
@@ -0,0 +1,77 @@
---
title: SynapseML Autologging
description: SynapseML autologging
---

## Automatic Logging

[MLflow automatic logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging) lets you log metrics, parameters, and models without explicit log statements.
SynapseML supports autologging for all of its existing models.

To enable autologging for SynapseML:
1. Download this customized [log_model_allowlist file](https://mmlspark.blob.core.windows.net/publicwasb/log_model_allowlist.txt) and place it somewhere your code can access it.
2. Set the Spark configuration `spark.mlflow.pysparkml.autolog.logModelAllowlistFile` to the path of your `log_model_allowlist.txt` file.
3. Call `mlflow.pyspark.ml.autolog()` before your training code to enable autologging for all supported models.

Note:
1. If you want autologging to cover PySpark models that aren't in the log_model_allowlist file, you can add them to the file yourself.
2. If you've enabled autologging, don't wrap your training code in an explicit `with mlflow.start_run()` block; mixing the two leads to unexpected behavior.
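The log_model_allowlist file mentioned above is plain text, with one fully qualified model class name per line. The sketch below shows the format and how a membership check against it behaves; the entries are illustrative examples, not the full contents of the hosted file.

```python
# Illustrative log_model_allowlist entries: one fully qualified model class
# name per line. (Examples only -- not the complete hosted file.)
sample_allowlist = """
pyspark.ml.classification.LogisticRegressionModel
pyspark.ml.regression.LinearRegressionModel
synapse.ml.lightgbm.LightGBMClassificationModel
"""

# Autologging only logs model artifacts whose class appears in the allowlist.
allowed = {line.strip() for line in sample_allowlist.splitlines() if line.strip()}

def should_log_model(model_class: str) -> bool:
    """Return True if a model of this class would be logged as an artifact."""
    return model_class in allowed

print(should_log_model("synapse.ml.lightgbm.LightGBMClassificationModel"))  # True
print(should_log_model("my.custom.UnlistedModel"))  # False
```

Adding a line with your model's fully qualified class name is all it takes to opt that model into artifact logging.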


## Example configuration process in Databricks

1. Install MLflow via `%pip install mlflow`
2. Upload your customized `log_model_allowlist.txt` file to DBFS by clicking the File > Upload Data button in the Databricks UI.

3. Set the Spark configuration. Note that if the cluster is already running, `spark.conf.set` may not take effect; in that case, add the setting under the cluster's Spark configuration instead:
```python
spark.conf.set("spark.mlflow.pysparkml.autolog.logModelAllowlistFile", "/dbfs/FileStore/PATH_TO_YOUR_log_model_allowlist.txt")
```
4. Run the code below before your training code; you can also customize the corresponding [parameters](https://www.mlflow.org/docs/latest/python_api/mlflow.pyspark.ml.html#mlflow.pyspark.ml.autolog).
```python
mlflow.pyspark.ml.autolog()
```
5. Train your models, then find your experiment results in the `Experiments` tab.
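For reference, here is how steps 3 and 4 might look together in a notebook cell. This is a configuration sketch, not runnable outside a live Spark session: `spark` is the ambient Databricks session, the DBFS path is a hypothetical upload location, and the keyword arguments are taken from the MLflow `autolog` docs linked above, so verify them against your MLflow version.

```python
import mlflow

# Step 3 equivalent: point autologging at the customized allowlist.
# On an already-running cluster this may need to go in the cluster's
# Spark configuration instead of spark.conf.set.
spark.conf.set(
    "spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
    "/dbfs/FileStore/log_model_allowlist.txt",  # hypothetical upload path
)

# Step 4 with a few customized parameters (names per the MLflow docs):
mlflow.pyspark.ml.autolog(
    log_models=True,                 # log fitted allowlisted models as artifacts
    log_post_training_metrics=True,  # attach post-training eval metrics to the run
    silent=False,                    # surface autologging warnings and info
)
```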
## Example for ConditionalKNNModel
```python
from pyspark.ml.linalg import Vectors
from synapse.ml.nn import *

# Autologging (enabled earlier via mlflow.pyspark.ml.autolog()) will capture
# the fit() call below as an MLflow run.
df = spark.createDataFrame([
    (Vectors.dense(2.0,2.0,2.0), "foo", 1),
    (Vectors.dense(2.0,2.0,4.0), "foo", 3),
    (Vectors.dense(2.0,2.0,6.0), "foo", 4),
    (Vectors.dense(2.0,2.0,8.0), "foo", 3),
    (Vectors.dense(2.0,2.0,10.0), "foo", 1),
    (Vectors.dense(2.0,2.0,12.0), "foo", 2),
    (Vectors.dense(2.0,2.0,14.0), "foo", 0),
    (Vectors.dense(2.0,2.0,16.0), "foo", 1),
    (Vectors.dense(2.0,2.0,18.0), "foo", 3),
    (Vectors.dense(2.0,2.0,20.0), "foo", 0),
    (Vectors.dense(2.0,4.0,2.0), "foo", 2),
    (Vectors.dense(2.0,4.0,4.0), "foo", 4),
    (Vectors.dense(2.0,4.0,6.0), "foo", 2),
    (Vectors.dense(2.0,4.0,8.0), "foo", 2),
    (Vectors.dense(2.0,4.0,10.0), "foo", 4),
    (Vectors.dense(2.0,4.0,12.0), "foo", 3),
    (Vectors.dense(2.0,4.0,14.0), "foo", 2),
    (Vectors.dense(2.0,4.0,16.0), "foo", 1),
    (Vectors.dense(2.0,4.0,18.0), "foo", 4),
    (Vectors.dense(2.0,4.0,20.0), "foo", 4)
], ["features","values","labels"])

cnn = (ConditionalKNN().setOutputCol("prediction"))
cnnm = cnn.fit(df)

test_df = spark.createDataFrame([
    (Vectors.dense(2.0,2.0,2.0), "foo", 1, [0, 1]),
    (Vectors.dense(2.0,2.0,4.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0,2.0,6.0), "foo", 2, [0, 1]),
    (Vectors.dense(2.0,2.0,8.0), "foo", 4, [0, 1]),
    (Vectors.dense(2.0,2.0,10.0), "foo", 4, [0, 1])
], ["features","values","labels","conditioner"])

display(cnnm.transform(test_df))
```

This should log a single run containing the ConditionalKNNModel artifact and its parameters.
8 changes: 4 additions & 4 deletions website/docs/mlflow/examples.md
@@ -15,10 +15,10 @@ Install SynapseML based on the [installation guidance](../getting_started/instal

## API Reference

-[mlflow.spark.save_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.save_model)
-[mlflow.spark.log_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.log_model)
-[mlflow.spark.load_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.load_model)
-[mlflow.log_metric](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metric)
+* [mlflow.spark.save_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.save_model)
+* [mlflow.spark.log_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.log_model)
+* [mlflow.spark.load_model](https://www.mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.load_model)
+* [mlflow.log_metric](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metric)

## LightGBMClassificationModel

1 change: 1 addition & 0 deletions website/sidebars.js
@@ -122,6 +122,7 @@ module.exports = {
items: [
'mlflow/introduction',
'mlflow/examples',
'mlflow/autologging'
],
},
{