
Build out a query to return data for exp. tracking metrics plots #1133

Closed
1 task done
tynandebold opened this issue Oct 11, 2022 · 3 comments · Fixed by #1150
Labels: Issue: Feature Request · Python (Pull requests that update Python code)

Comments

@tynandebold
Member

tynandebold commented Oct 11, 2022

Description

With the development work for experiment tracking metrics plots that will start on Oct. 24th, we'll need to prepare a query that the FE can use to render the metrics plots.

Context

Additionally, we should think about limiting the amount of data we return. E.g. if the user has 100 runs, we probably shouldn't return all of those runs due to performance concerns.

Shape of Plotly data

Design

Here's the Figma file.

Possible Implementation

Create a brand new query in the BE that would return all runs and metric information for those runs.

Prerequisite

Determine the shape of the data the FE wants to receive.
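To make the prerequisite concrete, here is a hypothetical sketch of one Plotly-friendly shape the FE might receive: one trace per metric, with run ids on the x-axis. All names here (`build_metrics_plot_data`, the `runs`/`metrics` keys) are placeholders for illustration, not an agreed API.

```python
# Hypothetical pivot from per-run metrics to per-metric Plotly traces.
# The key names are illustrative, not the shape the FE has actually agreed on.

def build_metrics_plot_data(runs):
    """Pivot per-run metrics into one trace per metric.

    `runs` is assumed to be a list of dicts like:
        {"id": "run-1", "metrics": {"accuracy": 0.9, "loss": 0.1}}
    """
    traces = {}
    for run in runs:
        for name, value in run["metrics"].items():
            trace = traces.setdefault(name, {"name": name, "x": [], "y": []})
            trace["x"].append(run["id"])   # run id on the x-axis
            trace["y"].append(value)       # metric value on the y-axis
    return list(traces.values())


example_runs = [
    {"id": "run-1", "metrics": {"accuracy": 0.90, "loss": 0.30}},
    {"id": "run-2", "metrics": {"accuracy": 0.92, "loss": 0.25}},
]
plot_data = build_metrics_plot_data(example_runs)
```

Settling on something like this up front would let the BE query and the FE rendering work proceed in parallel.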

Checklist

  • Include labels so that we can categorise your feature request
@tynandebold tynandebold added the Issue: Feature Request and Python labels Oct 11, 2022
@tynandebold tynandebold moved this to Todo in Kedro-Viz Oct 11, 2022
@antonymilne
Contributor

antonymilne commented Oct 12, 2022

I'm not really here but just a quick comment since I spotted the issue... Performance is absolutely a concern here, and another reason I think we need to answer the questions in #1217. See kedro-org/kedro#1070 (comment) for more on this.

I don't want to put a spanner in the works or be a doom-monger, so I'll say that for now I think we should be OK to create the plot using e.g. the last 25 runs without solving all the above. We will then be able to ship the feature and it should work fine for now, but ultimately, as experiment tracking gets more heavily used, I think there are going to be performance issues that people start complaining about (especially because there's currently no way to delete runs).

Back-of-the-envelope calculation here to illustrate... Let's say you have 100 runs, each of which has 5 metrics datasets, stored on an s3 server that's 10,000km away from you. These don't seem like unreasonable numbers to me (e.g. unfortunately it's not that weird to have s3 located far away from the server). Currently we would need to perform 100 * 5 = 500 independent calls to MetricsDataSet.load. Even at the speed of light, each request/response round trip from kedro-viz to s3 would take 20,000km / c ≈ 66ms. For 500 requests that's 500 * 66ms = 33 seconds. That's very slow, and it doesn't even take into account all the extra time for processing each request on s3 etc. Even if we limit to the last 25 runs it's still ~8 seconds, which feels very slow.

This is exactly why PAI used to take a huge amount of time (like hours) to start up. The situation there was worse because every time there was a new run you had to restart the app for it to show up - we don't have that problem here on kedro-viz (but it is worth considering exactly when this query gets triggered). The solution in the PAI case was some sort of server-side caching system.

From memory I believe @idanov's solution here might be that tracking.MetricsDataSet works cumulatively, i.e. the metrics json file would record not just metrics from one particular run but all runs up until that point. This way you would just load up one metrics dataset that contained information for the last 100 runs rather than 100 separate files. I'm not at all sure this is a good solution though.

@antonymilne
Contributor

Oh, another slight catch here... If you add a new metrics dataset, that will not be fetched by the query without restarting kedro-viz unless --autoreload is used. Fortunately new runs should be automatically fetched already though. From kedro-org/kedro#964:

The way that tracking data is populated and retrieved is worth documenting here:

  • initially, when the app first loads, populate_data runs and TrackingDatasetRepository is populated by data_access_manager.add_catalog
  • this adds tracking datasets for the whole pipeline (i.e. the registered pipelines dropdown menu and --pipeline argument affect only the flowchart view)
  • If you add a new tracked dataset and do a kedro run then the new dataset won't show up in experiment tracking unless --autoreload is set, because populate_data won't be called again
  • the data for a run_id is only loaded (through dataset._load) when the GraphQL query run_tracking_data is run with that run_id. Subsequent run_tracking_data queries will not perform the load from disk again but instead read out of memory
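The load-once behaviour in the last bullet can be sketched as a simple per-run_id cache. The class and attribute names below are illustrative, not Kedro-Viz's actual implementation; the point is only that the first query for a run_id hits disk and subsequent queries read from memory.

```python
# Minimal sketch of the load-once behaviour described above: the first
# run_tracking_data query for a run_id loads from disk, later queries
# for the same run_id are served from memory.

class TrackingDatasetCache:
    def __init__(self, load_fn):
        self._load_fn = load_fn   # e.g. something wrapping dataset._load
        self._cache = {}
        self.disk_loads = 0       # for illustration: counts real loads

    def get(self, run_id):
        if run_id not in self._cache:
            self._cache[run_id] = self._load_fn(run_id)
            self.disk_loads += 1
        return self._cache[run_id]
```

Note this also shows the catch: anything not covered by a cache key (like a newly added dataset) is invisible until the populating step runs again.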

@limdauto's solution here would be to swap from the in-memory to a database repository layer, as per #872. This way we would have a sqlite database keeping track of metrics, thereby avoiding the above performance issues and also providing a foundation for implementing search.
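A rough sketch of that database-repository idea, using Python's built-in sqlite3. The schema and function names are hypothetical, not what #872 specifies; the point is that fetching the last N runs becomes one query instead of N × M file reads.

```python
import sqlite3

# Hypothetical sqlite-backed metrics store illustrating the #872 idea.

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_metrics ("
        "run_id TEXT, name TEXT, value REAL, PRIMARY KEY (run_id, name))"
    )


def record_metric(conn, run_id, name, value):
    conn.execute(
        "INSERT OR REPLACE INTO run_metrics VALUES (?, ?, ?)",
        (run_id, name, value),
    )


def last_n_runs(conn, n):
    """Fetch all metrics for the n most recent run_ids.

    Kedro run ids are timestamps, so lexicographic DESC ordering
    approximates recency.
    """
    return conn.execute(
        "SELECT run_id, name, value FROM run_metrics "
        "WHERE run_id IN (SELECT DISTINCT run_id FROM run_metrics "
        "ORDER BY run_id DESC LIMIT ?)",
        (n,),
    ).fetchall()
```

Something like `last_n_runs(conn, 25)` would also give a natural place to implement the "limit the amount of data we return" requirement from the issue description.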

Overall I think it's fine to go ahead with some sort of MVP here without reworking the backend, but be aware that we may very well need to do some significant reworking in future to make it more performant.

@tynandebold
Member Author

Thanks, Antony! All great points.

@NeroOkwa NeroOkwa moved this from Todo to In Progress in Kedro-Viz Oct 19, 2022
@yetudada yetudada moved this from In Progress to Todo in Kedro-Viz Nov 1, 2022
@jmholzer jmholzer moved this from Todo to In Progress in Kedro-Viz Nov 3, 2022
@jmholzer jmholzer moved this from In Progress to In Review in Kedro-Viz Nov 14, 2022
@jmholzer jmholzer moved this from In Review to Done in Kedro-Viz Nov 15, 2022