[KED-2917] Set up experiment tracking tutorial #1144
@@ -1,3 +1,8 @@
# Release 0.17.7
I noticed that the release notes have no section for the upcoming release yet, so I have temporarily named it 0.17.7. Feel free to suggest a different version.
This is a great start @studioswong! 👏 I've added some small comments, but will do another review later. I've noticed that the tone of voice in this section is a bit different from other parts (e.g. "Let's implement/do ..."), so we might want to make it more consistent with the rest of the spaceflights tutorial.
I did a second review and added a couple more comments, mostly around making the language consistent to use "you" instead of "we".
Most of my commentary is around formatting and a few on content. In general this piece is very structured, and it's really easy to read.
I have one more suggestion: perhaps we can move the images further up the page to showcase what users get. But thank you so much for this great work @studioswong! 🚀
## Project setup
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Now set up a new Kedro project using the Kedro-spaceflight starter by running
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Now set up a new Kedro project using the Kedro-spaceflight starter by running
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new project using spaceflights starter by running:
![](../meta/images/experiment-tracking_demo.gif)
You can also access a more detailed demo [here](https://kedro-viz-live-demo.hfa4c8ufrmn4u.eu-west-2.cs.amazonlightsail.com/).
This is a great note! This should perhaps be moved into the introduction?
Thanks! I've moved this to the first section, right underneath the reference to the other experiment tracking setup section (line 9).
LGTM. Great tutorial.
Thanks Yetu! I have added a new GIF to the intro section that showcases the experiment tracking features on Kedro-Viz.
Some more small comments from my side. I think this is really close to being done! 👏 I'll have another look when all comments/questions are resolved and will be happy to approve! 😄
There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). Use `tracking.MetricsDataSet` to track numerical metrics, and `tracking.JSONDataSet` to track any other JSON-compatible data.
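To make the distinction concrete, here is a minimal, stdlib-only sketch of the two kinds of payload. It is not Kedro's implementation: the helpers `check_metrics` and `check_json_data` are hypothetical, and the sketch assumes that a metrics payload must have values convertible to `float`, while a JSON payload may hold anything `json.dumps` can serialise. The metric values and column names are illustrative only.

```python
import json
from typing import Any, Dict


def check_metrics(data: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical metrics check: every value must be convertible to float."""
    try:
        return {key: float(value) for key, value in data.items()}
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Metrics must be numeric: {exc}") from exc


def check_json_data(data: Any) -> str:
    """Hypothetical JSON check: anything JSON-serialisable is accepted."""
    return json.dumps(data)


# Illustrative values, not results from a real run.
metrics = check_metrics({"r2_score": 0.45, "epochs": 3})
json_text = check_json_data({"columns": ["id", "company_rating"], "data_type": "companies"})
print(metrics)
```

A list of column names would be rejected by the metrics check but accepted by the JSON check, which mirrors why the tutorial tracks `companies_column` with the JSON dataset.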
Set up the following two datasets to log your r2 scores and parameters for each run by adding them to `catalog.yml` under `conf/base`:
The order doesn't really matter, so I'd leave it like this.
```bash
kedro viz
```
When you open the Kedro-Viz web app, you will see an `experiment tracking` icon on your left. Clicking the icon brings you to the experiment tracking page (you can also access it via `http://127.0.0.1:4141/runsList`), where you will see the experiment data generated from your previous runs, as shown below:
Should we add a picture of the icon? I think Rashida did something like that in the plotly docs.
Sure thing, I've added the icon.
Ooh, yes, good spot! I use dark mode on GitHub and had completely forgotten about that.
This reminds me that I should probably replace it with a PNG that has a background color instead. I'll update that.
P.S. I had a quick glance, and the different GitHub modes might also affect the icon in the Plotly doc under dark mode 👇. @rashidakanchwala, you might want to have a look too.
After discussing with Rashida, I have also updated the icon in the Plotly docs in my latest commit 😉 Let me know if you spot anything else with this icon.
Awesome, yes this looks much better!
All of the final comments on my side are purely format. I'm approving this PR and it would be great to see @MerelTheisenQB and @rashidakanchwala's changes merged in too and then I think we're good to go. 🚀 Thank you for this great work!
You can also access a more detailed demo [here](https://kedro-viz-live-demo.hfa4c8ufrmn4u.eu-west-2.cs.amazonlightsail.com/).
## Project setup
## Project setup
## Set up a project
This is just so it matches the other headers you have created, which all follow the format "Set up ...".
## Project setup
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using spaceflights starter by running:
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using spaceflights starter by running:
We assume that you have already [installed Kedro](../02_get_started/02_install.md) and [Kedro-Viz](../03_tutorial/06_visualise_pipeline.md). Set up a new Kedro project using the spaceflights starter by running:
Feel free to name your project as you like, but this guide assumes the project is named **Kedro Experiment Tracking Tutorial** and that it lives in a sub-folder of your working directory, named `kedro-experiment-tracking-tutorial`, created by `kedro new`. When prompted, keep the default names for `repo_name` and `python_package` by pressing the Enter key.
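If you want to predict the defaults before running `kedro new`, the sketch below shows one plausible derivation. It assumes the starter template lowercases the project name and replaces spaces with hyphens for `repo_name` and with underscores for `python_package`; the prompts shown by `kedro new` are the authority. Both function names are hypothetical.

```python
def default_repo_name(project_name: str) -> str:
    """Hypothetical: lowercase, with spaces and underscores turned into hyphens."""
    return project_name.strip().replace(" ", "-").replace("_", "-").lower()


def default_python_package(project_name: str) -> str:
    """Hypothetical: lowercase, with spaces and hyphens turned into underscores."""
    return project_name.strip().replace(" ", "_").replace("-", "_").lower()


name = "Kedro Experiment Tracking Tutorial"
print(default_repo_name(name))       # kedro-experiment-tracking-tutorial
print(default_python_package(name))  # kedro_experiment_tracking_tutorial
```

Under this assumption, the repository folder keeps hyphens while the Python package uses underscores, because hyphens are not allowed in an importable package name.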
## Set up session store
## Set up session store
## Set up the session store
I realised there is a definite article missing here and have also updated @rashidakanchwala's above comment so that the hyperlink will still work.
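```python
from pathlib import Path

# Store session data in the project's data/ folder (two levels up from src/<package>/)
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}
```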
This will specify the creation of the `SQLiteStore` under the `data/` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin. (Please ensure that your installed version of Kedro-Viz is 4.1.1 or later.) This step is crucial for enabling experiment tracking features on Kedro-Viz, as the store is the database used to serve all run data to the Kedro-Viz front end.
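The `SESSION_STORE_ARGS` path is resolved relative to `settings.py` itself. Assuming the default layout `src/<python_package>/settings.py`, `parents[0]` is the package folder, `parents[1]` is `src/`, and `parents[2]` is the project root, so the store lands in the project's `data/` folder. The quick check below uses `PurePosixPath` so it never touches the filesystem; the absolute path is hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of settings.py in the default project layout.
settings_file = PurePosixPath(
    "/home/user/kedro-experiment-tracking-tutorial/src/"
    "kedro_experiment_tracking_tutorial/settings.py"
)

# Print each ancestor level that Path(__file__).parents would expose.
for level in range(3):
    print(level, settings_file.parents[level])

# Same expression as in settings.py, applied to the hypothetical path.
store_path = str(settings_file.parents[2] / "data")
print(store_path)  # /home/user/kedro-experiment-tracking-tutorial/data
```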
This will specify the creation of the `SQLiteStore` under the `/data` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin. (Please ensure that your installed version of Kedro-Viz is at least version 4.1.1 onwards). This step is crucial for enabling experiment tracking features on Kedro-Viz as it is the database used to serve all run data to the Kedro-Viz front end.
This will specify the creation of the `SQLiteStore` under the `data/` subfolder, using the `SQLiteStore` setup from your installed Kedro-Viz plugin.
Please ensure that your installed version of Kedro-Viz is 4.1.1 or later. This step is crucial for enabling experiment tracking features on Kedro-Viz as it is the database used to serve all run data to the Kedro-Viz front-end. Once this step is complete you can either proceed to [set up the tracking datasets](#set-up-tracking-datasets) or [set up your nodes and pipelines to log metrics](#set-up-your-nodes-and-pipelines-to-log-metrics); these two activities are interchangeable.
I really enjoyed this explanation! It's great! Minor changes to it to add in spacing for ease of reading and then I added in additional notes to explain to users that they could choose to set up their datasets or their nodes and pipelines; the order doesn't matter.
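To confirm the Kedro-Viz version requirement from Python, a sketch like the following compares the installed version with 4.1.1. It uses the standard-library `importlib.metadata` (Python 3.8+) and assumes the distribution is named `kedro-viz` and uses a plain numeric `X.Y.Z` release string; `pip show kedro-viz` gives the same information. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MIN_KEDRO_VIZ = (4, 1, 1)


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '4.1.1' into (4, 1, 1); assumes a plain numeric release string."""
    return tuple(int(part) for part in text.split(".")[:3])


def kedro_viz_ok(installed: Optional[str] = None) -> bool:
    """Return True if the given (or installed) Kedro-Viz version is 4.1.1 or later."""
    if installed is None:
        try:
            installed = version("kedro-viz")
        except PackageNotFoundError:
            return False  # Kedro-Viz is not installed at all
    return parse_release(installed) >= MIN_KEDRO_VIZ


print(kedro_viz_ok())
```

Tuple comparison does the version ordering here, so `"4.1"` parses to `(4, 1)` and correctly compares as older than `(4, 1, 1)`.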
There are two types of tracking datasets: [`tracking.MetricsDataSet`](/kedro.extras.datasets.tracking.MetricsDataSet) and [`tracking.JSONDataSet`](/kedro.extras.datasets.tracking.JSONDataSet). Use `tracking.MetricsDataSet` to track numerical metrics, and `tracking.JSONDataSet` to track any other JSON-compatible data.
Set up the following two datasets to log your r2 scores and parameters for each run by adding them to `catalog.yml` under `conf/base`:
@rashidakanchwala might have a point in terms of workflow, but we can indicate that with a note. I've added a note in the previous section.
You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset, you log the list of columns in `companies.csv` under `data/01_raw`. Modify the `preprocess_companies` function under the `data_processing` pipeline so that it also returns the column list as a key-value pair, as shown below:
You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset you should log the column that contains the list of companies as outlined in `companies.csv` under `/data/01_raw`. Modify the `preprocess_companies` function under the `data_processing` pipeline to return the data under a key value pair, as shown below:
You have to repeat the same steps for setting up the `companies_column` dataset. For this dataset you should log the column that contains the list of companies as outlined in `companies.csv` under `data/01_raw`. Modify the `preprocess_companies` node under the `data_processing` pipeline (`src/kedro_experiment_tracking_tutorial/pipelines/data_processing/nodes.py`) to return the data under a key-value pair, as shown below:
```python
return companies, {"columns": companies.columns.tolist(), "data_type": "companies"}
```
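The shape of that return value can be sketched without Kedro or pandas. Below, a plain dict of column lists stands in for the `companies` DataFrame (so `list(companies)` plays the role of `companies.columns.tolist()`), and the toy column names and values are made up. The point is only that the node now returns a tuple whose second element is JSON-compatible.

```python
import json
from typing import Any, Dict, List, Tuple

# Stand-in for a pandas DataFrame: column name -> list of values (hypothetical).
Table = Dict[str, List[Any]]


def preprocess_companies(companies: Table) -> Tuple[Table, Dict[str, Any]]:
    """Sketch of a two-output node: the data plus a tracking payload."""
    # Real preprocessing (type conversions etc.) is omitted in this sketch.
    tracking = {"columns": list(companies), "data_type": "companies"}
    return companies, tracking


toy = {"id": [1, 2], "company_rating": ["90%", "80%"]}
table, tracked = preprocess_companies(toy)
print(json.dumps(tracked))
# {"columns": ["id", "company_rating"], "data_type": "companies"}
```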
Again, you will need to ensure that the dataset is also specified as an output in `pipeline.py` under the `data_processing` pipeline, as follows:
Again, you will need to ensure that the dataset is also specified as an output on `pipeline.py` under the `data_processing` pipeline, as follows:
Again, you will need to ensure that the dataset is also specified as an output on `pipeline.py` under the `data_processing` pipeline (`src/kedro_experiment_tracking_tutorial/pipelines/data_processing/pipeline.py`), as follows:
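Declaring the extra output matters because a node's returned tuple is matched to its `outputs` list by position. The sketch below mimics that matching in plain Python; `map_outputs` is a hypothetical helper, not Kedro's API, and the output names `preprocessed_companies` and `companies_column` are assumed to match your `pipeline.py` and `catalog.yml`.

```python
from typing import Any, Dict, List, Union


def map_outputs(outputs: Union[str, List[str]], result: Any) -> Dict[str, Any]:
    """Match a node's return value to its declared output names, by position."""
    if isinstance(outputs, str):
        # A single named output receives the whole return value.
        return {outputs: result}
    if not isinstance(result, (list, tuple)) or len(result) != len(outputs):
        raise ValueError(
            f"Node declares {len(outputs)} outputs but returned {result!r}"
        )
    return dict(zip(outputs, result))


result = ({"id": [1, 2]}, {"columns": ["id"], "data_type": "companies"})
saved = map_outputs(["preprocessed_companies", "companies_column"], result)
print(sorted(saved))  # ['companies_column', 'preprocessed_companies']
```

If the outputs list has fewer names than the returned tuple has elements, the sketch raises `ValueError` instead of silently dropping the tracking payload.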
```bash
kedro run
```
After the run completes, you will see two folders under `data/09_tracking`: `companies_column.json` and `metrics.json`. When you run a pipeline after setting up the tracking datasets, Kedro creates a folder named after each tracked dataset. Inside it, each pipeline run gets its own subfolder, named with the run's timestamp, which contains a JSON file of the saved data. Every subsequent run adds a new timestamp folder under each tracked dataset's folder.
After the run completes, under `data/09_tracking`, you will now see 2 folders, `companies_column.json` and `metrics.json`. On performing a pipeline run after setting up the tracking datasets, Kedro will generate a folder with the dataset name for each tracked dataset. Each folder of the tracked dataset will contain folders named by the timestamp of each pipeline run to store the saved metrics of the dataset, with each future pipeline run generating a new timestamp folder with the json file of the saved metrics under the folder of its subsequent tracked dataset.
After the run completes, under `data/09_tracking`, you will now see two folders, `companies_column.json` and `metrics.json`. On performing a pipeline run after setting up the tracking datasets, Kedro will generate a folder with the dataset name for each tracked dataset. Each folder of the tracked dataset will contain folders named by the timestamp of each pipeline run to store the saved metrics of the dataset, with each future pipeline run generating a new timestamp folder with the JSON file of the saved metrics under the folder of its subsequent tracked dataset.
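The resulting layout follows the versioned-dataset pattern `<filepath>/<version>/<filename>`. The sketch below rebuilds such a path for a single run. It assumes the version string is a UTC timestamp of the form `YYYY-MM-DDTHH.MM.SS.sssZ` (millisecond precision); the run time used is made up and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"  # assumed timestamp format


def version_string(run_time: datetime) -> str:
    """Format a UTC run time, trimming microseconds down to milliseconds."""
    stamp = run_time.astimezone(timezone.utc).strftime(VERSION_FORMAT)
    return stamp[:-4] + stamp[-1:]  # drop the last three digits, keep the "Z"


def tracked_file(dataset_path: str, run_time: datetime) -> PurePosixPath:
    """Path of one run's JSON file: <filepath>/<version>/<filename>."""
    path = PurePosixPath(dataset_path)
    return path / version_string(run_time) / path.name


run = datetime(2021, 11, 15, 10, 45, 6, 123456, tzinfo=timezone.utc)
print(tracked_file("data/09_tracking/metrics.json", run))
# data/09_tracking/metrics.json/2021-11-15T10.45.06.123Z/metrics.json
```

This is why `metrics.json` appears as a folder rather than a file: the name given in the catalog becomes the parent directory, and each run's file sits inside a timestamped subfolder.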
![](../meta/images/experiment-tracking_demo.gif)
The Kedro-Viz team will be adding new features in the coming weeks, such as editing your run title, adding notes, and bookmarking and searching your runs. Keep an eye on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for upcoming releases.
The Kedro-Viz team will be adding new features in the coming weeks, such as allowing the editing of your run title, adding notes, bookmarking and searching your runs. Keep an eye out on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for the upcoming releases.
Keep an eye on the [Kedro-Viz release page](https://github.com/quantumblacklabs/kedro-viz/releases) for upcoming releases of this experiment tracking functionality.
I removed the specific features because this would become out of date.
```bash
kedro viz
```
When you open the Kedro-Viz web app, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon brings you to the experiment tracking page (you can also access it via `http://127.0.0.1:4141/runsList`), where you will see the experiment data generated from your previous runs, as shown below:
When you open the Kedro-Viz webapp, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:
When you open the Kedro-Viz web app, you will see an experiment tracking icon ![](../meta/images/icon-experiments.svg) on your left. Clicking the icon will bring you to the experiment tracking page (you can also access the page via `http://127.0.0.1:4141/runsList`), where you will now see the set of experiment data generated from your previous runs, as shown below:
lgtm!
Amazing work @studioswong 👏 ⭐
* set up experiment tracking tutorial
* added release note
* first set of comments
* second set of comments
* third round of comments
* addition of demo gif in intro section
* minor wording changes
* further minor changes
* update wording changes and added experiment icon
* minor wording change
* minor formatting change
* further wording changes and adding new doc to index.rst
* update experiment tracking icon
* updated release notes, edited plotly icon
* rename icon
* lint changes
* further minor wording changes
* further minor change

Signed-off-by: datajoely <joel.schwarzmann@quantumblack.com>
* set up experiment tracking tutorial
* added release note
* first set of comments
* second set of comments
* third round of comments
* addition of demo gif in intro section
* minor wording changes
* further minor changes
* update wording changes and added experiment icon
* minor wording change
* minor formatting change
* further wording changes and adding new doc to index.rst
* update experiment tracking icon
* updated release notes, edited plotly icon
* rename icon
* lint changes
* further minor wording changes
* further minor change

Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
Description
Resolves KED-2917.
With the recent implementation of experiment tracking features in both Kedro and Kedro-Viz, this tutorial provides a step-by-step guide, using the Kedro spaceflights starter project, to setting up tracking datasets and visualising the run data on Kedro-Viz.
Development notes
I have added a new section under `03_tutorial`, along with the required images and GIFs. I have also added a cross-reference to this tutorial under the existing 'experiment_tracking' section of the docs. I have added a description under a new section in the release notes as well (release version pending).
Massive thanks to @AntonyMilneQB, @rashidakanchwala and @datajoely for helping me go through the tracked dataset setup 😉
Checklist
`RELEASE.md` file