Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kedro new --starter default to use main/master #1696

Open
2 tasks
noklam opened this issue Jul 8, 2022 · 3 comments
Open
2 tasks

kedro new --starter default to use main/master #1696

noklam opened this issue Jul 8, 2022 · 3 comments

Comments

@noklam
Copy link
Contributor

noklam commented Jul 8, 2022

I've actually always thought (and I think @Galileo-Galilei agrees) that we should change this behaviour since in reality no one maintains a starters repo which they keep up to date with every single version of the kedro template.

  • by default don't try and checkout the kedro version unless it's an official starter (cookiecutter will by default look for the right main/master branch if you don't specify checkout)
  • and/or enable people to specify checkout in their KedroStarterSpec

But that's a discussion for another time.

Originally posted by @AntonyMilneQB in #1695 (comment)

Background

Currently kedro new assumes the starter uses the same version as kedro, i.e. With kedro==0.18.0 it will pull the 0.18.0 kedro-starters template. This makes sense when we only have kedro-starter. We introduced the starters template plugin in 0.18.2 and this assumption no longer holds.

Description

  • Default checkout version should be None, cookiecutters should take cares of pulling the latest version. (Do check if this is the case)
  • kedro official starters should be a special case in this, so we need to keep this backward compatibility where kedro new --starter=pandas-iris still pull the starter version that match kedro version

Example
Kedro version = 0.18.2

  • Official starter: kedro new --starter=pandas-iris - It should pull starters with 0.18.2 tag
  • Community starter: kedro new --starter=plugin_starter - It should pull the main branch when checkout argument isn't provided
  • kedro new --starter=xxxxx --checkout branch_name - It should still respect the checkout argument and the behavior should not be changed.
  • Reach out to @Galileo-Galilei and @WaylonWalker to check if the above proposal would be a breaking change for them. If not, we can introduce this change as non-breaking.
  • Do a poll to check with the wider community
@antonymilne
Copy link
Contributor

Looking back through the discussion in #1265, I can confirm that @Galileo-Galilei independently had the same feeling as me on this:

It makes sense for kedro itself to have a default to the starter matching the kedro version, but i don't think it really does for user defined starters.

He also thought checkout should not become part of KedroStarterSpec, which is fine by me. So the behaviour would be exactly as you describe above 👍

Default checkout version should be None, cookiecutters should take cares of pulling the latest version. (Do check if this is the case)

I can confirm from looking through cookiecutter that this is correct. If checkout is None then it clones the repo and doesn't checkout any particular branch, which means it will automatically be on HEAD. So we don't need to do anything special here - it will work by itself.

The only catch with changing this is that it's a breaking change, so we will need to put in some warning that the behaviour will change when the user calls a starter that's not a built-in one.

@merelcht merelcht added this to the Starter improvements milestone Feb 6, 2023
@astrojuanlu
Copy link
Member

It's unclear to me what problem does this solve, would you folks please clarify? Also, wondering if any of the new project tools affect this.

@antonymilne
Copy link
Contributor

From memory, the main problem goes like this:

  • starter maintainer creates their own starter with kedro 0.18.2 and puts it in a git repo somewhere
  • user pip install kedro==0.18.2 and does kedro new -s ... ➡️ error message saying git can't checkout the starter with tag/branch 0.18.2
  • this is annoying but ok: starter maintainer goes and adds a tag/branch 0.18.2 to their git repo and all works ok
  • user upgrades with pip install kedro==0.18.3
  • user does kedro new -s ... ➡️ error message saying git can't checkout the starter with tag/branch 0.18.3
  • this is getting more annoying now because something that previously worked fine has now broken and the starter maintainer needs to go and add another tag/branch to their git repo
  • annoyance grows with every subsequent upgrade throughout 0.18.x, which is specifically designed to be backwards compatible with any previous 0.18.x project template and yet necessitates manual intervention from starter maintainer in order to have existing workflows not break

The current solution (other than starter maintainer making a new tag every single kedro release) would be for the user to explicitly specify --checkout main or similar. The problem is that this should really be the default behaviour rather than needing to be explicitly specified.

The only possible problem that changing this causes is that it would mean the default --checkout value would be different in official kedro starters (where it would use kedro version) compared to all others (where it would take None and so default to e.g. main branch as above). This is ok by me, although maybe it's worth considering if the same change should actually be made for official kedro starters also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

4 participants