[KED-2917] Set up experiment tracking tutorial #1144

Merged
studioswong merged 22 commits into main from feature/setup-experiment-tracking-tutorial
Jan 17, 2022

Conversation

studioswong
Contributor

Description

Resolves KED-2917.

With the recent implementation of the experiment tracking features in both Kedro and Kedro-Viz, this tutorial provides a step-by-step process, using the Kedro-spaceflight starter project, for setting up tracking datasets and visualising the run data in Kedro-Viz.

Development notes

I have added a new section under 03_tutorial, along with the required images and gifs. I have also added a cross reference to this tutorial under the existing 'experiment_tracking' section on the docs.

I have also added a description under a new section in the release notes (release version pending).

Massive thanks to @AntonyMilneQB, @rashidakanchwala and @datajoely for your help with going through the tracked dataset setup with me 😉

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@@ -1,3 +1,8 @@
# Release 0.17.7
Contributor Author

I noticed that the release notes have no section for the upcoming release yet, so I have temporarily named it 0.17.7. Feel free to suggest a different version.

@studioswong studioswong changed the title set up experiment tracking tutorial [KED-2917] Set up experiment tracking tutorial Jan 8, 2022
Member

@merelcht merelcht left a comment

This is a great start @studioswong ! 👏 I've added some small comments, but will do another review later. I've noticed that the tone of voice of this section is a bit different from other parts e.g. "Let's implement/do ... " so we might want to make it a bit more consistent with the rest of the spaceflights tutorial.

docs/source/03_tutorial/07_set_up_experiment_tracking.md Outdated Show resolved Hide resolved
Member

@merelcht merelcht left a comment

I did a second review and added a couple more comments, mostly around making the language consistent to use "you" instead of "we".

@studioswong studioswong requested a review from merelcht January 12, 2022 14:16
Contributor

@yetudada yetudada left a comment

Most of my commentary is around formatting and a few on content. In general this piece is very structured, and it's really easy to read.

I have one more comment: perhaps we can move the images further up the page to showcase what users get. But thank you so much for this great work @studioswong! 🚀


## Project setup

We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Now set up a new Kedro project using the Kedro-spaceflight starter by running
Contributor

Suggested change
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Now set up a new Kedro project using the Kedro-spaceflight starter by running
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new project using spaceflights starter by running:


![](../meta/images/experiment-tracking_demo.gif)

You can also access a more detailed demo [here](https://kedro-viz-live-demo.hfa4c8ufrmn4u.eu-west-2.cs.amazonlightsail.com/).
Contributor

This is a great note! This should perhaps be moved into the introduction?

Contributor Author

Thanks! I've moved this to the first section, right underneath the reference to the other section on experiment tracking setup (line 9).

Contributor

@rashidakanchwala rashidakanchwala left a comment

LGTM. Great tutorial.

@studioswong studioswong requested a review from yetudada January 12, 2022 16:15
@studioswong
Contributor Author

Thanks Yetu! I have added a new gif that showcases the experiment tracking features on Kedro-Viz under the intro section.

@rashidakanchwala rashidakanchwala self-requested a review January 12, 2022 17:22
Member

@merelcht merelcht left a comment

Some more small comments from my side. I think this is really close to being done! 👏 I'll have another look when all comments/questions are resolved and will be happy to approve! 😄


There are 2 types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data.

Let's set up the following 2 datasets to log our r2 scores and parameters for each run by adding the following in `catalog.yml` under `/conf/base`:
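
As a rough illustration of the split described above, the sketch below picks a tracking dataset type for a payload and builds the matching catalog entry as a plain Python dict. The helpers `choose_tracking_type` and `catalog_entry` are hypothetical and not part of Kedro, the `r2_score` value and column names are made-up sample data, and the entry layout (`type` plus a `filepath` under `data/09_tracking`) is an assumption based on the dataset and folder names mentioned in this review.

```python
import json
from numbers import Real


def choose_tracking_type(payload: dict) -> str:
    """Pick a tracking dataset type for a flat dictionary (hypothetical helper)."""
    # bool is a subclass of int, so exclude it explicitly from "numeric".
    all_numeric = all(
        isinstance(value, Real) and not isinstance(value, bool)
        for value in payload.values()
    )
    if all_numeric:
        return "tracking.MetricsDataSet"
    # Raises TypeError if the payload cannot be serialised as JSON.
    json.dumps(payload)
    return "tracking.JSONDataSet"


def catalog_entry(name: str, payload: dict) -> dict:
    """Build a dict mirroring a catalog.yml entry (assumed layout)."""
    return {
        name: {
            "type": choose_tracking_type(payload),
            "filepath": f"data/09_tracking/{name}.json",
        }
    }


# Illustrative payloads only; real values come from the pipeline run.
metrics_entry = catalog_entry("metrics", {"r2_score": 0.46})
columns_entry = catalog_entry(
    "companies_column",
    {"columns": ["id", "company_rating"], "data_type": "companies"},
)
```

Numeric-only payloads map to `tracking.MetricsDataSet`; anything else that survives `json.dumps` maps to `tracking.JSONDataSet`.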
Member

The order doesn't really matter, so I'd leave it like this.

```bash
kedro viz
```

When you open the Kedro-Viz webapp, you will see an `experiment tracking` icon on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:
Member

Should we add a picture of the icon? I think Rashida did something like that in the plotly docs.

Contributor Author

@studioswong studioswong Jan 14, 2022

sure thing - I've added the icon

Member

I think you might need a black version of the icon 😅
Screenshot 2022-01-14 at 11 21 08

Contributor Author

@studioswong studioswong Jan 14, 2022

Ooh yeah, good spot - I actually use dark mode on GitHub and completely forgot about that!

This reminds me that perhaps I should replace it with a PNG with a background colour instead - I'll update that.

P.S. I had a quick glance, and the different GitHub modes might also affect the icon in the plotly doc under dark mode 👇 - @rashidakanchwala, you might want to have a look too.
image

Contributor Author

@studioswong studioswong Jan 14, 2022

After discussion with Rashida, I have updated the icon in the plotly docs as well in my latest commit 😉 let me know if you spot anything else with this icon

Member

Awesome, yes this looks much better!

@studioswong studioswong requested a review from merelcht January 13, 2022 17:49
Contributor

@yetudada yetudada left a comment

All of the final comments on my side are purely about formatting. I'm approving this PR; it would be great to see @MerelTheisenQB's and @rashidakanchwala's changes merged in too, and then I think we're good to go. 🚀 Thank you for this great work!


You can also access a more detailed demo [here](https://kedro-viz-live-demo.hfa4c8ufrmn4u.eu-west-2.cs.amazonlightsail.com/).

## Project setup
Contributor

Suggested change
## Project setup
## Set up a project

Contributor

Just so it matches the other headers you have created. They have all followed the format of "Set up ..."


## Project setup

We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using spaceflights starter by running:
Contributor

Suggested change
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using spaceflights starter by running:
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using the spaceflights starter by running:


Feel free to name your project as you like, but this guide will assume the project is named **Kedro Experiment Tracking Tutorial**, and that your project is in a sub-folder in your working directory that was created by `kedro new`, named `kedro-experiment-tracking-tutorial`. Keep the default names for the `repo_name` and `python_package` when prompted by pressing the enter key.

## Set up session store
Contributor

Suggested change
## Set up session store
## Set up the session store

Contributor

I realised there is a definite article missing here and have also updated @rashidakanchwala's above comment so that the hyperlink will still work.

```python
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}
```

This will specify the creation of the `SQLiteStore` under the `/data` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin. (Please ensure that your installed version of Kedro-Viz is at least version 4.1.1 onwards). This step is crucial for enabling experiment tracking features on Kedro-Viz as it is the database used to serve all run data to the Kedro-Viz front end.
Contributor

Suggested change
This will specify the creation of the `SQLiteStore` under the `/data` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin. (Please ensure that your installed version of Kedro-Viz is at least version 4.1.1 onwards). This step is crucial for enabling experiment tracking features on Kedro-Viz as it is the database used to serve all run data to the Kedro-Viz front end.
This will specify the creation of the `SQLiteStore` under the `data/` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin.
Please ensure that your installed version of Kedro-Viz is at least version 4.1.1 onwards. This step is crucial for enabling experiment tracking features on Kedro-Viz as it is the database used to serve all run data to the Kedro-Viz front-end. Once this step is complete you can either proceed to [set up the tracking datasets](#set-up-tracking-datasets) or [set up your nodes and pipelines to log metrics](#set-up-your-nodes-and-pipelines-to-log-metrics); these two activities are interchangeable.

Contributor

I really enjoyed this explanation! It's great! Minor changes to it to add in spacing for ease of reading and then I added in additional notes to explain to users that they could choose to set up their datasets or their nodes and pipelines; the order doesn't matter.
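
For readers following this thread, here is a minimal sketch of how the `SESSION_STORE_ARGS` path expression resolves. It assumes `settings.py` lives at `src/<python_package>/settings.py`; the absolute path and the underscore package name below are hypothetical examples, and `PurePosixPath` is used only so the result does not depend on the machine running the snippet.

```python
from pathlib import PurePosixPath

# Hypothetical location of settings.py inside a project created by `kedro new`.
settings_file = PurePosixPath(
    "/home/user/kedro-experiment-tracking-tutorial"
    "/src/kedro_experiment_tracking_tutorial/settings.py"
)

# parents[0] is the package folder, parents[1] is src/,
# and parents[2] is the project root.
project_root = settings_file.parents[2]
session_store_path = str(project_root / "data")

# Same shape as the SESSION_STORE_ARGS value quoted in the diff.
session_store_args = {"path": session_store_path}
```

Under those assumptions the store path lands in the project's `data/` folder, which matches the explanation in the reviewed paragraph.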


There are 2 types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). The `tracking.MetricsDataSet` should be used for tracking numerical metrics, and the `tracking.JSONDataSet` can be used for tracking any other JSON-compatible data.

Let's set up the following 2 datasets to log our r2 scores and parameters for each run by adding the following in `catalog.yml` under `/conf/base`:
Contributor

@rashidakanchwala might have a point in terms of workflow but we can indicate that with a note. I've added a note in the previous section.

```
)
```

You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset you should log the column that contains the list of companies as outlined in `companies.csv` under `/data/01_raw`. Modify the `preprocess_companies` function under the `data_processing` pipeline to return the data under a key value pair, as shown below:
Contributor

Suggested change
You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset you should log the column that contains the list of companies as outlined in `companies.csv` under `/data/01_raw`. Modify the `preprocess_companies` function under the `data_processing` pipeline to return the data under a key value pair, as shown below:
You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset you should log the column that contains the list of companies as outlined in `companies.csv` under `data/01_raw`. Modify the `preprocess_companies` node under the `data_processing` pipeline (`src/kedro-experiment-tracking-tutorial/pipelines/data_processing/nodes.py`) to return the data under a key-value pair, as shown below:

```python
return companies, {"columns": companies.columns.tolist(), "data_type": "companies"}
```
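
The return statement above pairs the processed data with a JSON-compatible dictionary. A self-contained sketch of that pattern follows. It uses a plain list of dicts as a stand-in for the pandas DataFrame in the tutorial, and the sample row and the `iata_approved` conversion are illustrative assumptions rather than the tutorial's actual preprocessing.

```python
import json


def preprocess_companies(companies: list) -> tuple:
    """Return the processed rows plus a JSON-compatible summary to track.

    Plain lists of dicts stand in for the pandas DataFrame used in the tutorial.
    """
    # Stand-in for the real preprocessing: turn a "t"/"f" flag into a boolean.
    processed = [
        {**row, "iata_approved": row["iata_approved"] == "t"} for row in companies
    ]
    columns = list(processed[0].keys()) if processed else []
    tracked = {"columns": columns, "data_type": "companies"}
    return processed, tracked


rows = [{"id": 1, "company_rating": "90%", "iata_approved": "t"}]
processed, tracked = preprocess_companies(rows)

# Succeeds because every value in the tracked dictionary is JSON-compatible.
serialised = json.dumps(tracked)
```

The second element of the returned tuple is what a JSON tracking dataset would store; the first element continues through the pipeline unchanged.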

Again, you will need to ensure that the dataset is also specified as an output on `pipeline.py` under the `data_processing` pipeline, as follows:
Contributor

Suggested change
Again, you will need to ensure that the dataset is also specified as an output on `pipeline.py` under the `data_processing` pipeline, as follows:
Again, you will need to ensure that the dataset is also specified as an output on `pipeline.py` under the `data_processing` pipeline (`src/kedro-experiment-tracking-tutorial/pipelines/data_processing/pipeline.py`), as follows:
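
To show why the tracked dataset must also be declared as an output, the sketch below pairs each value a function returns with a declared output name. `run_node` is a hypothetical helper written for this illustration and is not Kedro's `node` API; the output name `preprocessed_companies` is assumed from the spaceflights starter.

```python
def run_node(func, inputs: dict, input_names: list, output_names: list) -> dict:
    """Call func and pair each returned value with an output name (hypothetical)."""
    result = func(*(inputs[name] for name in input_names))
    # A single output name means the whole return value is one dataset.
    if len(output_names) == 1:
        return {output_names[0]: result}
    if len(result) != len(output_names):
        raise ValueError("number of returned values must match the declared outputs")
    return dict(zip(output_names, result))


def preprocess(companies):
    # Stand-in node function returning (data, tracked summary).
    return companies, {"columns": sorted(companies[0]), "data_type": "companies"}


outputs = run_node(
    preprocess,
    inputs={"companies": [{"id": 1, "company_rating": "90%"}]},
    input_names=["companies"],
    output_names=["preprocessed_companies", "companies_column"],
)
```

If `companies_column` were missing from `output_names`, the helper would raise because the function returns two values but only one output is declared.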

```bash
kedro run
```

After the run completes, under `data/09_tracking`, you will now see 2 folders, `companies_column.json` and `metrics.json`. On performing a pipeline run after setting up the tracking datasets, Kedro will generate a folder with the dataset name for each tracked dataset. Each folder of the tracked dataset will contain folders named by the timestamp of each pipeline run to store the saved metrics of the dataset, with each future pipeline run generating a new timestamp folder with the json file of the saved metrics under the folder of its subsequent tracked dataset.
Contributor

@yetudada yetudada Jan 13, 2022

Suggested change
After the run completes, under `data/09_tracking`, you will now see 2 folders, `companies_column.json` and `metrics.json`. On performing a pipeline run after setting up the tracking datasets, Kedro will generate a folder with the dataset name for each tracked dataset. Each folder of the tracked dataset will contain folders named by the timestamp of each pipeline run to store the saved metrics of the dataset, with each future pipeline run generating a new timestamp folder with the json file of the saved metrics under the folder of its subsequent tracked dataset.
After the run completes, under `data/09_tracking`, you will now see two folders, `companies_column.json` and `metrics.json`. On performing a pipeline run after setting up the tracking datasets, Kedro will generate a folder with the dataset name for each tracked dataset. Each folder of the tracked dataset will contain folders named by the timestamp of each pipeline run to store the saved metrics of the dataset, with each future pipeline run generating a new timestamp folder with the JSON file of the saved metrics under the folder of its subsequent tracked dataset.
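
The folder layout described in this paragraph can be sketched with the standard library. The helpers `save_tracked` and `list_runs` are hypothetical, the timestamps and `r2_score` values are made up, and the exact timestamp format Kedro generates is not assumed here; only the `<dataset>.json/<timestamp>/<dataset>.json` nesting is taken from the text.

```python
import json
import tempfile
from pathlib import Path


def save_tracked(root: Path, dataset: str, timestamp: str, data: dict) -> Path:
    """Write data to <root>/<dataset>.json/<timestamp>/<dataset>.json."""
    target = root / f"{dataset}.json" / timestamp / f"{dataset}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(data))
    return target


def list_runs(root: Path, dataset: str) -> list:
    """Return the timestamp folder names for a tracked dataset, oldest first."""
    return sorted(p.name for p in (root / f"{dataset}.json").iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    tracking = Path(tmp) / "data" / "09_tracking"
    # Hypothetical timestamps and values standing in for two pipeline runs.
    save_tracked(tracking, "metrics", "2022-01-12T10.00.00.000Z", {"r2_score": 0.46})
    save_tracked(tracking, "metrics", "2022-01-13T09.30.00.000Z", {"r2_score": 0.47})
    runs = list_runs(tracking, "metrics")
    latest = json.loads(
        (tracking / "metrics.json" / runs[-1] / "metrics.json").read_text()
    )
```

Each run adds one timestamp folder under the dataset's folder, so comparing runs amounts to reading the same file name from different timestamp folders.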


![](../meta/images/experiment-tracking_demo.gif)

The Kedro-Viz team will be adding new features in the coming weeks, such as allowing the editing of your run title, adding notes, bookmarking and searching your runs. Keep an eye out on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for the upcoming releases.
Contributor

Suggested change
The Kedro-Viz team will be adding new features in the coming weeks, such as allowing the editing of your run title, adding notes, bookmarking and searching your runs. Keep an eye out on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for the upcoming releases.
Keep an eye out on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for the upcoming releases on this experiment tracking functionality.

Contributor

I removed the specific features because this would become out of date.

```bash
kedro viz
```

When you open the Kedro-Viz webapp, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:
Contributor

Suggested change
When you open the Kedro-Viz webapp, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:
When you open the Kedro-Viz web app, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:

Contributor

@rashidakanchwala rashidakanchwala left a comment

lgtm!

Member

@merelcht merelcht left a comment

Amazing work @studioswong 👏 ⭐

@studioswong studioswong merged commit 9fc4ad2 into main Jan 17, 2022
datajoely pushed a commit that referenced this pull request Jan 18, 2022
* set up experiment tracking tutorial

* added release note

* first set of comments

* second set of comments

* third round of comments

* addition of demo gif in intro section

* minor wording changes

* further minor changes

* update wording changes and added experiment icon

* minor wording change

* minor formatting change

* further wording changes and adding new doc to index.rst

* update experiment tracking icon

* updated release notes, edited plotly icon

* rename icon

* lint changes

* futher minor wording changes

* further minor change

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
@merelcht merelcht deleted the feature/setup-experiment-tracking-tutorial branch February 3, 2022 11:08
lvijnck pushed a commit to lvijnck/kedro that referenced this pull request Apr 7, 2022
Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>