-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding --save-versions to the cli #3038
Comments
Hi, cna you explain why you need to override the save version? The save version is maintained by the framework to ensure it will fetch the latest timestamp, it's not design to be overrided thus it's not available in the CLI, on the other hand it is possible to specify which version to load. |
Hi, Still, I think it can be useful with any datasets where the versioning logic is handled outside of kedro. For example, I am currently working with azure machine learning and here, you can save multiple versions of datasets (called data asset in aml). When you save a new version, azure automatically tracks the timestamp of creation. In these conditions it actually does not make sense to have the version name be a timestamp. I wanted to create a custom dataset that inherit from AbstractVersionedDataset and be able to specify the save version name. |
Make sense, you can extend the CLI for your specific use case. |
Thank you I will do just that. |
Description
I noticed that kedro cli allows to specify
load-versions
at runtime by runningkedro run --load-versions mydataset:myversion
.However I did not find a way to specify the save version.
In
kedro/framework/session/session.py
, in therun
method ofKedroSession
, we even have the following lines:Is there a reason for setting the save_version to the session_id? More generally, is there a design choice for forcing the version to be a timestamp as opposed to anything else?
If not, is it possible to implement the
--save-version
in the cli? Then it could be used like this:kedro run --save-version mysaveversion
This allows to keep the current design choice of being able to specify multiple load versions but a unique save version (as we can see in
_get_catalog
method ofKedroContext
Possible Implementation
We could add
--save-version
as a@click.option
, add it as an argument to therun
method ofKedroSession
Then in the
run
method ofKedroSession
:Extra considerations
Note that I understand that forcing the save_version to be the current timestamp is probably useful to help determine which version should be loaded when we don't specify the load_version (the most recent timestamp). However, I find it strange that this behavior can be overriden when using hooks but not the cli, which is the reason for my question.
The text was updated successfully, but these errors were encountered: