Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding --save-versions to the cli #3038

Closed
Gabriel2409 opened this issue Sep 17, 2023 · 4 comments
Closed

Adding --save-versions to the cli #3038

Gabriel2409 opened this issue Sep 17, 2023 · 4 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@Gabriel2409
Copy link

Description

I noticed that kedro cli allows to specify load-versions at runtime by running kedro run --load-versions mydataset:myversion.
However I did not find a way to specify the save version.

In kedro/framework/session/session.py, in the run method of KedroSession, we even have the following lines:

session_id = self.store["session_id"]
save_version = session_id

Is there a reason for setting the save_version to the session_id? More generally, is there a design choice for forcing the version to be a timestamp as opposed to anything else?

If not, is it possible to implement the --save-version in the cli? Then it could be used like this: kedro run --save-version mysaveversion

This allows to keep the current design choice of being able to specify multiple load versions but a unique save version (as we can see in _get_catalog method of KedroContext

Possible Implementation

We could add --save-version as a @click.option, add it as an argument to the run method of KedroSession

@click.option(
    "--save-version",
    type=str,
    default=""
)

Then in the run method of KedroSession:

session_id = self.store["session_id"]
if not save_version:
	save_version = session_id

Extra considerations

Note that I understand that forcing the save_version to be the current timestamp is probably useful to help determine which version should be loaded when we don't specify the load_version (the most recent timestamp). However, I find it strange that this behavior can be overriden when using hooks but not the cli, which is the reason for my question.

@Gabriel2409 Gabriel2409 added the Issue: Feature Request New feature or improvement to existing feature label Sep 17, 2023
@noklam
Copy link
Contributor

noklam commented Sep 17, 2023

Hi, cna you explain why you need to override the save version? The save version is maintained by the framework to ensure it will fetch the latest timestamp, it's not design to be overrided thus it's not available in the CLI, on the other hand it is possible to specify which version to load.

@Gabriel2409
Copy link
Author

Hi,
I figured it was a design choice.

Still, I think it can be useful with any datasets where the versioning logic is handled outside of kedro.

For example, I am currently working with azure machine learning and here, you can save multiple versions of datasets (called data asset in aml). When you save a new version, azure automatically tracks the timestamp of creation. In these conditions it actually does not make sense to have the version name be a timestamp.

I wanted to create a custom dataset that inherit from AbstractVersionedDataset and be able to specify the save version name.

@noklam
Copy link
Contributor

noklam commented Sep 17, 2023

Make sense, you can extend the CLI for your specific use case.

https://docs.kedro.org/en/stable/extend_kedro/common_use_cases.html#use-case-3-how-to-add-or-modify-cli-commands

@Gabriel2409
Copy link
Author

Thank you I will do just that.
I think we can close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

2 participants