ability to kedro run --version 'label' #4028

Open
pascalwhoop opened this issue Jul 24, 2024 · 6 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@pascalwhoop
Contributor

Description

According to this doc, we currently have no control over the name of the version that Kedro assigns to a run:
https://docs.kedro.org/en/stable/data/data_catalog.html

However, it would be nice to control the name of the folder that kedro run creates, e.g.:

kedro run --env prod --version v1.0.1

Can we get this?
--load-versions seems to be meant for reading a previous version, but you still have no control over what the current version's name is going to be.

@pascalwhoop pascalwhoop added the Issue: Feature Request New feature or improvement to existing feature label Jul 24, 2024
@datajoely
Contributor

So this falls under a wider topic we have about custom session_ids. It was also identified in our #3094 MLOps research.

We have a weird constraint that session_id/versioning needs to be sortable, but I still think a composite key of label_{timestamp} would be helpful here.

@pascalwhoop
Contributor Author

Giving runs a human-readable label would be great. That way we can do things like "look at the pubmedbert-v3 run" instead of having to juggle date strings.

I'd put the date first to make sorting easier, though.

@noklam
Contributor

noklam commented Aug 27, 2024

As stated, we need to keep a date prefix so that the order is maintained; adding a suffix should be fine. I am in favour of this, as I needed it many years ago. Can we at least link this to the versioning research?

#3997 @iamelijahko
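To make the naming idea discussed so far concrete, here is a minimal sketch of a composite version string with a date prefix and a human-readable label suffix. It is not a Kedro API: `make_version` and `split_version` are hypothetical helpers, and the timestamp layout is only assumed to resemble Kedro's default version folders. Because the timestamp part has a fixed width and comes first, plain string sorting keeps runs in chronological order.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumption: a fixed-width UTC layout resembling Kedro's default version
# folder names. It is defined here, not imported from Kedro.
_TS_FORMAT = "%Y-%m-%dT%H.%M.%S"

# Labels may not contain "_" so the first "_" always separates the parts.
_LABEL_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def make_version(label: str, now: Optional[datetime] = None) -> str:
    """Return '<timestamp>_<label>' (hypothetical helper, not a Kedro API)."""
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"label is not safe for a folder name: {label!r}")
    # Normalise to UTC so that versions from different machines compare.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    millis = now.microsecond // 1000
    return f"{now.strftime(_TS_FORMAT)}.{millis:03d}Z_{label}"


def split_version(version: str) -> Tuple[str, str]:
    """Split a composite version back into (timestamp, label)."""
    timestamp, _, label = version.partition("_")
    return timestamp, label
```

With this layout, a run can be found by its label (for example by filtering on the second element of `split_version`), while `sorted()` over the folder names still returns the runs oldest first.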

@pascalwhoop
Contributor Author

pascalwhoop commented Aug 28, 2024 via email

@astrojuanlu
Member

@pascalwhoop It's unlikely that we'll add this feature for now (especially since there's a reasonable workaround), so @ankatiyar will check to what extent something like this is possible with DVC, and we'll turn this into a Discussion.

@ankatiyar
Contributor

ankatiyar commented Feb 5, 2025

I looked into it with DVC, and you can indeed tag versions (but with Git). The docs for the "Kedro + DVC" integration are under construction 🚧 and should be out soon; in the meantime, you can refer to this comment or this coffee chat.

The versioning in DVC is tied to Git commits, but you can tag the commits with a label, e.g.:

git add data.dvc model.weights.h5.dvc metrics.csv .gitignore
git commit -m "First model, trained with 1000 images"
git tag -a "v1.0" -m "model v1.0, 1000 images"

And then train a second version and commit this too

git add data.dvc model.weights.h5.dvc metrics.csv
git commit -m "Second model, trained with 2000 images"
git tag -a "v2.0" -m "model v2.0, 2000 images"

And go back to a previous version like this

git checkout v1.0
dvc checkout

ETA: The example above is from this DVC tutorial: http://dvc.org/doc/use-cases/versioning-data-and-models/tutorial#tutorial-data-and-model-versioning
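If you wanted to script the tag-and-restore steps above, for instance at the end of a pipeline run, a small wrapper could look like the following sketch. The function names (`tag_commands`, `restore_commands`, `run_all`) are hypothetical and not part of Kedro or DVC; the sketch only assembles the same git and dvc commands shown in the comment, and defaults to a dry run that prints them instead of executing anything.

```python
import subprocess
from typing import List, Optional, Sequence


def tag_commands(
    files: Sequence[str],
    commit_message: str,
    tag: str,
    tag_message: Optional[str] = None,
) -> List[List[str]]:
    """Build the git commands that commit DVC-tracked files and label them."""
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", commit_message],
        # Annotated tag: this is the human-readable label for the version.
        ["git", "tag", "-a", tag, "-m", tag_message or commit_message],
    ]


def restore_commands(tag: str) -> List[List[str]]:
    """Build the commands that go back to a previously tagged version."""
    return [["git", "checkout", tag], ["dvc", "checkout"]]


def run_all(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first failing command.
            subprocess.run(list(argv), check=True)
```

For example, `run_all(tag_commands(["data.dvc", "metrics.csv"], "First model", "v1.0"))` prints the three git commands without touching the repository; pass `dry_run=False` to actually run them.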
