Exposing load/save version in hooks or including this information in logs? #1580
Comments
FYI I think it's already possible to achieve this since That's obviously not a good solution though, and it would be nice to have a better way of doing it. Currently we expose |
Is this about the load version when load_version = None, where it would just look up the latest version? |
Yeah, exactly. From memory that's what Edit: I'm wrong. Looks like |
That's what I found: that information isn't exposed to the hook at all; all we get is None. So I think it is almost impossible to implement a solution currently. |
Sorry, I originally wrote a more detailed issue, but the GitHub project converted my issue to a blank page, so I missed this when I recreated the issue 😅 I think this information would be quite valuable for experiment tracking; data versioning is one of the keys to reproducible experiments. If this info is stored in the session store, we can eventually reproduce an experiment with the session_id and extract exactly the datasets that it used. |
Notes from Technical Design session: It was agreed that the code for loading the latest data/fetching the load version needs further refactoring. Inside After the refactoring is done, we should look into whether we should expose the load/save version information and how best to do that. |
This wasn't completed in #1911, closed by mistake |
(Created by Nok, converted from Discord Discussion)
Description
A user wants to have the dataset load/save version logged, potentially like this
This is not currently possible because Kedro does not track this information. It should belong to either
How is the current load version determined when version=None?
Currently, this load_version information is buried deep in the framework, and it is determined only when a dataset is loaded at runtime. The details of how the "latest" version is determined are in a method called resolve_load_version, which in turn calls _fetch_latest_load_version.
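The lookup described above can be illustrated with a minimal, standalone sketch. This is not Kedro's implementation: the two functions below are hypothetical stand-ins that only borrow the names from the description, and the sketch assumes that each version lives in its own subdirectory named with a timestamp string that sorts lexicographically.

```python
from __future__ import annotations

from pathlib import Path


def fetch_latest_load_version(dataset_dir: Path) -> str:
    """Return the name of the most recent version directory (hypothetical helper)."""
    versions = sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())
    if not versions:
        raise FileNotFoundError(f"No versions found under {dataset_dir}")
    # Assumption: timestamp-style names sort lexicographically,
    # so the last entry is the latest version.
    return versions[-1]


def resolve_load_version(dataset_dir: Path, load_version: str | None) -> str:
    """Use the pinned version if one was given, otherwise look up the latest."""
    if load_version is not None:
        return load_version
    return fetch_latest_load_version(dataset_dir)
```

The sketch shows why code that only sees the configured value gets None: the concrete version string exists only after the lookup runs at load time.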
Further Studies
For some reason, resolve_load_version is being called twice. The call at L594 seems to be a leftover from a historical refactoring; this needs further confirmation.
kedro/kedro/io/core.py, Lines 558 to 564 in b2e59fa
kedro/kedro/io/core.py, Line 594 in b2e59fa
The refactor PR is here: f03226e