
Build out a query to return data for exp. tracking metrics plots #1133

Closed
1 task done
tynandebold opened this issue Oct 11, 2022 · 3 comments · Fixed by #1150
Labels: Issue: Feature Request · Python (Pull requests that update Python code)

Comments

@tynandebold
Member

tynandebold commented Oct 11, 2022

Description

With the development work for experiment tracking metrics plots that will start on Oct. 24th, we'll need to prepare a query that the FE can use to render the metrics plots.

Context

Additionally, we should think about limiting the amount of data we return. E.g. if the user has 100 runs, we probably shouldn't return all of those runs due to performance concerns.

Shape of Plotly data

Design

Here's the Figma file.

Possible Implementation

Create a brand new query in the BE that would return all runs and metric information for those runs.

Prerequisite

Determine the shape of the data the FE wants to receive.
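To make the prerequisite concrete, here is a hypothetical sketch of one Plotly-friendly shape the FE might receive: one trace per metric, with run ids on the x-axis. All names here (`build_metrics_plot_data`, the `runs`/`metrics` keys) are placeholders for illustration, not an agreed API.

```python
# Hypothetical pivot from per-run metrics to per-metric Plotly traces.
# The key names are illustrative, not the shape the FE has actually agreed on.

def build_metrics_plot_data(runs):
    """Pivot per-run metrics into one trace per metric.

    `runs` is assumed to be a list of dicts like:
        {"id": "run-1", "metrics": {"accuracy": 0.9, "loss": 0.1}}
    """
    traces = {}
    for run in runs:
        for name, value in run["metrics"].items():
            trace = traces.setdefault(name, {"name": name, "x": [], "y": []})
            trace["x"].append(run["id"])   # run id on the x-axis
            trace["y"].append(value)       # metric value on the y-axis
    return list(traces.values())


example_runs = [
    {"id": "run-1", "metrics": {"accuracy": 0.90, "loss": 0.30}},
    {"id": "run-2", "metrics": {"accuracy": 0.92, "loss": 0.25}},
]
plot_data = build_metrics_plot_data(example_runs)
```

Settling on something like this up front would let the BE query and the FE rendering work proceed in parallel.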

Checklist

  • Include labels so that we can categorise your feature request
@tynandebold tynandebold added the Issue: Feature Request and Python labels Oct 11, 2022
@tynandebold tynandebold moved this to Todo in Kedro-Viz Oct 11, 2022
@antonymilne
Contributor

antonymilne commented Oct 12, 2022

I'm not really here but just a quick comment since I spotted the issue... Performance is absolutely a concern here, and another reason I think we need to answer the questions in #1217. See kedro-org/kedro#1070 (comment) for more on this.

I don't want to put a spanner in the works or be a doom-monger, so I'll say that for now I think we should be OK to create the plot using e.g. the last 25 runs without solving all the above. We will then be able to ship the feature and it should work fine for now, but ultimately, as experiment tracking gets more heavily used, I think there are going to be performance issues that people start complaining about (especially because there's currently no way to delete runs).

Back-of-the-envelope calculation here to illustrate... Let's say you have 100 runs, each of which has 5 metrics datasets, stored on an s3 server that's 10,000km away from you. These don't seem like unreasonable numbers to me (e.g. unfortunately it's not that weird to have s3 located far away from the server). Currently we would need to perform 100 * 5 = 500 independent calls to MetricsDataSet.load. Even at the speed of light, each request/response round trip from kedro-viz to s3 would take 20,000km / c ≈ 66ms. For 500 requests that's 500 * 66ms = 33 seconds. That's very slow, and it doesn't even take into account all the extra time for processing each request on s3 etc. Even if we limit to the last 25 runs it's still ~8 seconds, which feels very slow.

This is exactly why PAI used to take a huge amount of time (like hours) to start up. The situation there was worse because every time there was a new run you had to restart the app for it to show up - we don't have that problem here on kedro-viz (but it is worth considering exactly when this query gets triggered). The solution in the PAI case was some sort of server-side caching system.

From memory I believe @idanov's solution here might be that tracking.MetricsDataSet works cumulatively, i.e. the metrics json file would record not just metrics from one particular run but all runs up until that point. This way you would just load up one metrics dataset that contained information for the last 100 runs rather than 100 separate files. I'm not at all sure this is a good solution though.

@antonymilne
Contributor

Oh, another slight catch here... If you add a new metrics dataset, that will not be fetched by the query without restarting kedro-viz unless --autoreload is used. Fortunately new runs should be automatically fetched already though. From kedro-org/kedro#964:

The way that tracking data is populated and retrieved is worth documenting here:

  • initially, when the app first loads, populate_data runs and TrackingDatasetRepository is populated by data_access_manager.add_catalog
  • this adds tracking datasets for the whole pipeline (i.e. the registered pipelines dropdown menu and --pipeline argument affect only the flowchart view)
  • If you add a new tracked dataset and do a kedro run then the new dataset won't show up in experiment tracking unless --autoreload is set, because populate_data won't be called again
  • the data for a run_id is only loaded (through dataset._load) when the GraphQL query run_tracking_data is run with that run_id. Subsequent run_tracking_data queries will not perform the load from disk again but instead read out of memory
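The load-once behaviour in the last bullet can be sketched as a simple per-run_id cache. The class and attribute names below are illustrative, not Kedro-Viz's actual implementation; the point is only that the first query for a run_id hits disk and subsequent queries read from memory.

```python
# Minimal sketch of the load-once behaviour described above: the first
# run_tracking_data query for a run_id loads from disk, later queries
# for the same run_id are served from memory.

class TrackingDatasetCache:
    def __init__(self, load_fn):
        self._load_fn = load_fn   # e.g. something wrapping dataset._load
        self._cache = {}
        self.disk_loads = 0       # for illustration: counts real loads

    def get(self, run_id):
        if run_id not in self._cache:
            self._cache[run_id] = self._load_fn(run_id)
            self.disk_loads += 1
        return self._cache[run_id]
```

Note this also shows the catch: anything not covered by a cache key (like a newly added dataset) is invisible until the populating step runs again.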

@limdauto's solution here would be to swap from the in-memory to a database repository layer, as per #872. This way we would have a sqlite database keeping track of metrics, thereby avoiding the above performance issues and also providing a foundation for implementing search.
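A rough sketch of that database-repository idea, using Python's built-in sqlite3. The schema and function names are hypothetical, not what #872 specifies; the point is that fetching the last N runs becomes one query instead of N × M file reads.

```python
import sqlite3

# Hypothetical sqlite-backed metrics store illustrating the #872 idea.

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_metrics ("
        "run_id TEXT, name TEXT, value REAL, PRIMARY KEY (run_id, name))"
    )


def record_metric(conn, run_id, name, value):
    conn.execute(
        "INSERT OR REPLACE INTO run_metrics VALUES (?, ?, ?)",
        (run_id, name, value),
    )


def last_n_runs(conn, n):
    """Fetch all metrics for the n most recent run_ids.

    Kedro run ids are timestamps, so lexicographic DESC ordering
    approximates recency.
    """
    return conn.execute(
        "SELECT run_id, name, value FROM run_metrics "
        "WHERE run_id IN (SELECT DISTINCT run_id FROM run_metrics "
        "ORDER BY run_id DESC LIMIT ?)",
        (n,),
    ).fetchall()
```

Something like `last_n_runs(conn, 25)` would also give a natural place to implement the "limit the amount of data we return" requirement from the issue description.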

Overall I think it's fine to go ahead with some sort of MVP here without reworking the backend, but be aware that we may very well need to do some significant reworking in future to make it more performant.

@tynandebold
Member Author

Thanks, Antony! All great points.

@NeroOkwa NeroOkwa moved this from Todo to In Progress in Kedro-Viz Oct 19, 2022
@yetudada yetudada moved this from In Progress to Todo in Kedro-Viz Nov 1, 2022
@jmholzer jmholzer moved this from Todo to In Progress in Kedro-Viz Nov 3, 2022
@jmholzer jmholzer moved this from In Progress to In Review in Kedro-Viz Nov 14, 2022
@jmholzer jmholzer moved this from In Review to Done in Kedro-Viz Nov 15, 2022