Build out a query to return data for exp. tracking metrics plots #1133
Comments
I'm not really here, but just a quick comment since I spotted the issue... Performance is absolutely a concern here, and another reason I think we need to answer the questions in #1217. See kedro-org/kedro#1070 (comment) for more on this.

I don't want to put a spanner in the works or be a doom-monger, so I'll say that for now I think we should be OK to create the plot using e.g. the last 25 runs without solving all of the above. We will then be able to ship the feature and it should work OK for now, but ultimately, as experiment tracking gets more heavily used, I think there are going to be performance issues that people start complaining about (especially because there's currently no way to delete runs).

Back-of-the-envelope calculation to illustrate... Let's say you have 100 runs, each of which has 5 metrics datasets, stored on an s3 server that's 10,000km away from you. These don't seem like unreasonable numbers to me (e.g. unfortunately it's not that weird to have s3 located far away from the server). Currently we would need to perform 100 * 5 = 500 independent calls to

This is exactly why PAI used to take a huge amount of time (like hours) to start up. The situation there was worse because every time there was a new run you had to restart the app for it to show up - we don't have that problem here on kedro-viz (but it is worth considering exactly when this query gets triggered). The solution in the PAI case was some sort of server-side caching system. From memory I believe @idanov's solution here might be that
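To make the arithmetic above concrete, here is a minimal Python sketch. The run and dataset counts are the ones from the comment; the per-call latency is an assumed, purely illustrative figure (nothing in this thread states or measures it), and `total_calls` and `estimated_seconds` are hypothetical helper names.

```python
# Back-of-the-envelope estimate for sequential remote reads.
# Run/dataset counts come from the comment above; the latency below is an
# ASSUMPTION chosen for illustration, not a measured value.


def total_calls(n_runs: int, datasets_per_run: int) -> int:
    """Number of independent reads when every metrics dataset is fetched separately."""
    return n_runs * datasets_per_run


def estimated_seconds(n_calls: int, seconds_per_call: float) -> float:
    """Wall-clock estimate if the calls are issued one after another."""
    return n_calls * seconds_per_call


ASSUMED_SECONDS_PER_CALL = 0.2  # hypothetical round trip to a distant bucket

if __name__ == "__main__":
    for n_runs in (100, 25):  # all runs vs. the "last 25 runs" cap
        calls = total_calls(n_runs, datasets_per_run=5)
        seconds = estimated_seconds(calls, ASSUMED_SECONDS_PER_CALL)
        print(f"{n_runs} runs -> {calls} calls, ~{seconds:.0f}s at the assumed latency")
```

The point is only that the cost grows linearly with the number of runs, so capping the number of runs caps the cost by the same factor.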
Oh, another slight catch here... If you add a new metrics dataset, that will not be fetched by the query without restarting kedro-viz unless
@limdauto's solution here would be to swap from an in-memory to a database repository layer, as per #872. This way we would have a sqlite database that keeps track of metrics, thereby avoiding the above performance issues and also providing a way to implement search. Overall I think it's fine to go ahead with some sort of MVP here without reworking the backend, but be aware that we may very well need to do some significant reworking in future to make it more performant.
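As a rough illustration of the sqlite idea, and not the design proposed in #872, the sketch below uses Python's standard `sqlite3` module. The table layout and the function names (`create_store`, `record_metrics`, `metrics_for_latest_runs`) are hypothetical, and it assumes run ids sort chronologically as strings.

```python
import sqlite3
from typing import Dict, List, Tuple


def create_store(conn: sqlite3.Connection) -> None:
    """Create a (hypothetical) table with one row per run, dataset and metric."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS metrics (
            run_id  TEXT NOT NULL,
            dataset TEXT NOT NULL,
            name    TEXT NOT NULL,
            value   REAL NOT NULL,
            PRIMARY KEY (run_id, dataset, name)
        )
        """
    )


def record_metrics(
    conn: sqlite3.Connection, run_id: str, dataset: str, metrics: Dict[str, float]
) -> None:
    """Store the metrics of one dataset for one run, overwriting earlier values."""
    conn.executemany(
        "INSERT OR REPLACE INTO metrics VALUES (?, ?, ?, ?)",
        [(run_id, dataset, name, value) for name, value in metrics.items()],
    )
    conn.commit()


def metrics_for_latest_runs(
    conn: sqlite3.Connection, limit: int = 25
) -> List[Tuple[str, str, str, float]]:
    """Fetch all metrics of the most recent `limit` runs with a single query."""
    return conn.execute(
        """
        SELECT run_id, dataset, name, value
        FROM metrics
        WHERE run_id IN (
            SELECT DISTINCT run_id FROM metrics ORDER BY run_id DESC LIMIT ?
        )
        ORDER BY run_id, dataset, name
        """,
        (limit,),
    ).fetchall()
```

With something along these lines, rendering the plot would cost one local query rather than one remote read per run and dataset, at the price of keeping the table in sync whenever a run is recorded.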
Thanks, Antony! All great points.
Description
With development work on the experiment tracking metrics plots starting on Oct. 24th, we'll need to prepare a query that the FE can use to render the metrics plots.
Context
Additionally, we should think about limiting the amount of data we return. E.g. if the user has 100 runs, we probably shouldn't return all of those runs due to performance concerns.
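One simple way to limit the data is to return only the most recent N runs. The sketch below is a minimal illustration under two assumptions that are not confirmed in this issue: that each run has an id which sorts chronologically as a string, and that 25 is a reasonable default cap. `Run` and `latest_runs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a tracked run; only the id matters here."""

    run_id: str  # ASSUMPTION: timestamp-like string that sorts chronologically


def latest_runs(runs: Iterable[Run], limit: int = 25) -> List[Run]:
    """Return at most `limit` runs, newest first."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return sorted(runs, key=lambda run: run.run_id, reverse=True)[:limit]
```

For example, `latest_runs(all_runs, limit=25)` would return the 25 newest runs, or all of them if fewer than 25 exist.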
Shape of Plotly data
Design
Here's the Figma file.
Possible Implementation
Create a brand new query in the BE that would return all runs and metric information for those runs.
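The shape of the response is still to be determined (see the prerequisite below), so the following is only one possible shape, sketched in plain Python rather than in the actual BE query layer. It assumes the metrics have already been loaded into a mapping of run id to metric values; `build_metrics_payload` and the `"runs"` / `"metrics"` keys are hypothetical.

```python
from typing import Dict, List, Optional


def build_metrics_payload(
    runs: Dict[str, Dict[str, float]]
) -> Dict[str, object]:
    """Pivot per-run metrics into one series per metric, aligned with the run list.

    `runs` maps a run id to that run's {metric name: value}. A metric that is
    missing for a given run is reported as None so every series has equal length.
    """
    run_ids: List[str] = sorted(runs)
    metric_names = sorted({name for metrics in runs.values() for name in metrics})
    series: Dict[str, List[Optional[float]]] = {
        name: [runs[run_id].get(name) for run_id in run_ids]
        for name in metric_names
    }
    return {"runs": run_ids, "metrics": series}
```

Keeping one aligned list per metric means the FE can plot a metric over time directly, and a `None` entry marks a run that did not record that metric.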
Prerequisite
Determine the shape of the data the FE wants to receive.
Checklist