Kedro new project creation ‐ how it works
The `kedro new` command allows users to create a new project. This project can be customised to suit the user's needs; they can provide their specifications through several different paths:
Argument | Through interactive flow | Through CLI flag | Through config file |
---|---|---|---|
Project name | Yes | Yes; if not provided, the interactive flow is triggered | Yes; if not provided, an error is thrown |
Tools | Yes | Yes; if not provided, the interactive flow is triggered | Yes; if not provided, the default value of none is used |
Example pipeline | Yes | Yes; if not provided, the interactive flow is triggered | Yes; if not provided, the default value of no is used |
Starter | No | Yes; cannot be used with tools or example | No |
Checkout | No | Yes; cannot be used without a starter; the project version is used if not provided | |
Directory | No | Yes; cannot be used without a starter or with a Kedro starter alias | |
Config | No | Yes | No |
Invoking the command will trigger the following execution path:
Let's explore this in a little more detail.
As noted in the table above, some CLI flags cannot be used in combination with each other. At this stage in the execution, we check for the presence of any of the following invalid CLI flag combinations:
- --checkout AND NO --starter
- --directory AND NO --starter
- --starter AND (--tools OR --example)
- --directory AND --starter, IF the starter provided is one of the Kedro starter aliases
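The checks above can be sketched in Python as follows. This is a minimal illustration, not Kedro's implementation. The function name `validate_flag_combinations`, the error messages and the contents of `KEDRO_STARTER_ALIASES` are all hypothetical.

```python
from typing import Optional

# Illustrative subset of Kedro starter aliases (assumption; the real list
# is maintained by Kedro).
KEDRO_STARTER_ALIASES = {"spaceflights-pandas", "spaceflights-pyspark", "databricks-iris"}


def validate_flag_combinations(
    starter: Optional[str] = None,
    checkout: Optional[str] = None,
    directory: Optional[str] = None,
    tools: Optional[str] = None,
    example: Optional[str] = None,
) -> None:
    """Raise ValueError for the invalid `kedro new` flag combinations."""
    if checkout and not starter:
        raise ValueError("--checkout can only be used together with --starter")
    if directory and not starter:
        raise ValueError("--directory can only be used together with --starter")
    if starter and (tools or example):
        raise ValueError("--starter cannot be combined with --tools or --example")
    if directory and starter in KEDRO_STARTER_ALIASES:
        raise ValueError("--directory cannot be used with a Kedro starter alias")
```

A valid combination returns silently. Any of the four invalid combinations raises before the prompt collection step is reached.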
After this validation, the directory and the path to the project template are updated according to the inputs, bringing us to the next step:
First, we fetch the path to a cookiecutter template project directory. In this template project, we read any `prompts.yml` file and collect the prompts required for the project. If the user's desired project name, tools selection, or example code selection has already been provided through the command flags, we validate those values and delete the respective prompts from the collection.
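Here is a minimal sketch of this step. It assumes `prompts.yml` has already been loaded into a dict (for example with PyYAML) and that a prompt may carry a `regex_validator` key. The function `collect_prompts` and that prompt structure are assumptions made for illustration.

```python
import re


def collect_prompts(template_prompts: dict, cli_values: dict) -> dict:
    """Return the prompts that still need an answer from the user.

    Values already supplied via CLI flags are validated against the prompt's
    regex (if it has one), and their prompt is dropped from the collection.
    """
    remaining = dict(template_prompts)
    for name, value in cli_values.items():
        if value is None or name not in remaining:
            continue  # flag not given, or no matching prompt in the template
        pattern = remaining[name].get("regex_validator")
        if pattern and not re.match(pattern, str(value)):
            raise ValueError(f"'{value}' is not a valid value for '{name}'")
        del remaining[name]
    return remaining


prompts = {
    "project_name": {"title": "Project Name", "regex_validator": r"^[\w -]{2,}$"},
    "tools": {"title": "Project Tools"},
    "example_pipeline": {"title": "Example Pipeline"},
}
print(sorted(collect_prompts(prompts, {"project_name": "My Project", "tools": None})))
# ['example_pipeline', 'tools']
```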
With the collection of necessary prompts, the execution proceeds to the next step.
To proceed, we must first check if a config file is included. If one is included, we don't need to launch the interactive flow; instead, we validate the config file:
- Validate the file can be loaded
- Validate that the tools or example_pipeline selection wasn't included in the config if a starter was provided
- Validate that all necessary prompt values are provided in the config file
- Validate that the output directory is valid, if specified
- Validate that the provided project name matches the expected format
- Validate that the example pipeline selection matches the expected format, and parse it to either "True" or "False"
- Validate that the selected tools are all valid tools, and that if none or all were selected, they were not selected together with any other tools
- Parse the validated tools selection into full, readable tool names

If no config file is included, we launch the interactive flow instead:

- For each prompt, get the user's input. Each input is validated against the relevant regex specified in `prompts.yml`
- If tools are provided, parse any ranges into a list of numbers, validating that each range is correctly specified (from smaller to larger number) and that the end of the range doesn't exceed the number of available tools
- Convert the list of numbers to tool names
- Parse any example pipeline selection to either "True" or "False"
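The validation and parsing steps above can be approximated as shown below. The sketch covers only some of them: the project name, tools and example pipeline. It leaves out output directory checks and per-prompt regex validation. The tool numbering, the project-name regex and all function names are hypothetical. Kedro's actual tool list, helpers and error messages may differ.

```python
import re
from typing import Optional

# Hypothetical tool numbering for illustration; Kedro's real list and order
# are defined in its source and may change between releases.
TOOLS = {
    1: "Linting",
    2: "Testing",
    3: "Custom Logging",
    4: "Documentation",
    5: "Data Structure",
    6: "PySpark",
}
# Assumed project-name pattern, in the style of a prompts.yml regex.
PROJECT_NAME_REGEX = re.compile(r"^[\w -]{2,}$")


def parse_yes_no(value) -> str:
    """Normalise an example-pipeline answer to the string "True" or "False"."""
    if isinstance(value, bool):  # a YAML loader may already have produced a bool
        return str(value)
    answer = str(value).strip().lower()
    if answer in {"yes", "y"}:
        return "True"
    if answer in {"no", "n"}:
        return "False"
    raise ValueError(f"'{value}' is not a valid yes/no answer")


def parse_tools(raw: str) -> list:
    """Turn a selection such as "1-3,5", "all" or "none" into full tool names."""
    parts = [p for p in raw.replace(" ", "").lower().split(",") if p]
    if ("all" in parts or "none" in parts) and len(parts) > 1:
        raise ValueError("'all' and 'none' cannot be combined with other tools")
    if parts == ["none"]:
        return []
    if parts == ["all"]:
        return list(TOOLS.values())
    numbers = set()
    for part in parts:
        start, sep, end = part.partition("-")
        first = int(start)
        last = int(end) if sep else first
        if first > last:
            raise ValueError(f"Range '{part}' must go from smaller to larger")
        if first < 1 or last > len(TOOLS):
            raise ValueError(f"'{part}' is outside the available tools")
        numbers.update(range(first, last + 1))
    return [TOOLS[n] for n in sorted(numbers)]


def validate_config(config: dict, starter: Optional[str] = None) -> dict:
    """Validate and normalise values loaded from a config file."""
    if starter and ("tools" in config or "example_pipeline" in config):
        raise ValueError("tools/example_pipeline cannot be combined with a starter")
    if "project_name" not in config:
        raise ValueError("project_name must be provided in the config file")
    if not PROJECT_NAME_REGEX.match(str(config["project_name"])):
        raise ValueError(f"'{config['project_name']}' is not a valid project name")
    validated = dict(config)
    if "example_pipeline" in config:
        validated["example_pipeline"] = parse_yes_no(config["example_pipeline"])
    if "tools" in config:
        validated["tools"] = parse_tools(str(config["tools"]))
    return validated
```

In this sketch, the interactive flow and the config path share `parse_tools` and `parse_yes_no`. That keeps the "True"/"False" normalisation and the range rules identical for both.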
Currently, any values provided through CLI flags overwrite those provided in the config file (remember that the interactive flow won't prompt for any values already provided in the CLI). Tools provided via the CLI are parsed into a list of full tool names.
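The precedence rule can be sketched as a simple dictionary merge in which only flags that were actually set override config values. The names `merge_inputs` and `parse_cli_tools`, and the short tool names in `TOOL_NAMES`, are hypothetical.

```python
# Hypothetical short names accepted by --tools, mapped to full tool names.
TOOL_NAMES = {
    "lint": "Linting",
    "test": "Testing",
    "log": "Custom Logging",
    "docs": "Documentation",
    "data": "Data Structure",
    "pyspark": "PySpark",
}


def parse_cli_tools(flag_value: str) -> list:
    """Convert a --tools value like "lint,test" into full tool names.

    Unknown names raise KeyError in this sketch.
    """
    names = [name.strip().lower() for name in flag_value.split(",") if name.strip()]
    return [TOOL_NAMES[name] for name in names]


def merge_inputs(config_values: dict, cli_values: dict) -> dict:
    """Let CLI flags override config-file values; unset (None) flags are ignored."""
    merged = dict(config_values)
    merged.update({key: value for key, value in cli_values.items() if value is not None})
    return merged
```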
Though not required by cookiecutter for our project creation, we require some values to be populated in the new project's `pyproject.toml` for telemetry purposes. These include the project's Kedro version, the tools selection, and the example pipeline selection. As the user has no way to specify the Kedro version, and is not always required to specify the tools or example pipeline selection, we set default values to be used instead.
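Here is a sketch of filling in these defaults. The context keys `kedro_version`, `tools` and `example_pipeline`, and the default `"False"` for the example pipeline, are assumptions. The real keys are whatever the template's `pyproject.toml` placeholders use. The version is passed in explicitly here; in practice it would come from the installed Kedro package.

```python
def apply_telemetry_defaults(extra_context: dict, kedro_version: str) -> dict:
    """Fill in the values pyproject.toml needs for telemetry (hypothetical keys)."""
    context = dict(extra_context)
    # The user cannot choose the Kedro version, so it is always set here.
    context["kedro_version"] = kedro_version
    # These defaults apply only when the user did not provide a selection.
    context.setdefault("tools", str(["None"]))
    context.setdefault("example_pipeline", "False")
    return context
```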
Tip
When changing the values expected in `pyproject.toml`, make sure to update the expected values in `ProjectMetadata()` accordingly.
Note
The default value for tools, `str(["None"])`, may strike you as odd; similarly, the values passed as the tools selection to cookiecutter are all string-wrapped lists. This is because cookiecutter treats lists as a set of possible options and only populates the placeholders in `pyproject.toml` with one item from the list. To pass the whole list through instead, we wrap it in a string and unwrap it when it's populated in the placeholder.
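The wrapping round trip can be illustrated in plain Python. Using `ast.literal_eval` for the unwrapping is an assumption made for this sketch. In the real template, the unwrapping happens when the placeholder is rendered.

```python
import ast


def wrap_list(values: list) -> str:
    """Wrap a list in a string so cookiecutter sees one value, not a set of choices."""
    return str(values)


def unwrap_list(wrapped: str) -> list:
    """Turn the string back into a list (done with ast.literal_eval in this sketch)."""
    value = ast.literal_eval(wrapped)
    if not isinstance(value, list):
        raise ValueError(f"Expected a string-wrapped list, got {wrapped!r}")
    return value


wrapped = wrap_list(["Linting", "Testing"])
print(wrapped)  # ['Linting', 'Testing']
print(unwrap_list(wrapped) == ["Linting", "Testing"])  # True
```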
After collecting all the project specifications, if a starter was selected, we make sure any specified directory and checkout values are passed to cookiecutter so that the correct project template is used to create the project. Additionally, the tools and example pipeline selections determine which template is used. We collect the path to the correct template project and the specified arguments for cookiecutter, and call `cookiecutter()` to create the project.
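Here is a sketch of assembling the arguments. It assumes the starter has already been resolved to a template path or URL, and that the prompts were answered beforehand, hence `no_input=True`. `build_cookiecutter_args` is hypothetical, and the sketch leaves out the tools- and example-based template selection. `checkout`, `directory`, `extra_context` and `no_input` are real keyword arguments of `cookiecutter.main.cookiecutter`.

```python
from pathlib import Path
from typing import Optional


def build_cookiecutter_args(
    default_template: Path,
    extra_context: dict,
    starter_path: Optional[str] = None,
    checkout: Optional[str] = None,
    directory: Optional[str] = None,
) -> tuple:
    """Return the template location and keyword arguments for cookiecutter()."""
    if starter_path:
        # Starters may need a specific git ref and a sub-directory of the repo.
        template = starter_path
        kwargs = {"checkout": checkout, "directory": directory}
    else:
        template = str(default_template)
        kwargs = {}
    # Prompts were already answered, so cookiecutter should not ask again.
    kwargs.update({"extra_context": extra_context, "no_input": True})
    # Drop options that were never set.
    return template, {key: value for key, value in kwargs.items() if value is not None}


template, kwargs = build_cookiecutter_args(Path("templates/project"), {"project_name": "Demo"})
# With cookiecutter installed, the project would then be generated with:
# from cookiecutter.main import cookiecutter
# cookiecutter(template, **kwargs)
```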
With cookiecutter, you can specify hooks to run before or after project generation. We use the post-generation hook to modify the generated project. The template project includes all the files and requirements needed for every tool we provide, so before project generation completes, we must ensure the project is modified in line with what the user requires.
- We go through every tool option and check whether it is included in the user's selection. If it is not included, we remove the related setup for that tool from the generated project
- We sort the requirements in the generated project to be in alphabetical order
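The two hook steps might look roughly like this. The `TOOL_FILES` mapping from tools to generated paths and both function names are hypothetical. A real hook also needs to remove each tool's requirements and configuration, which this sketch omits.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from each tool to the paths it contributes to the
# template project; the real hook knows which files belong to which tool.
TOOL_FILES = {
    "Testing": ["tests"],
    "Documentation": ["docs"],
    "Data Structure": ["data"],
}


def remove_unselected_tools(project: Path, selected: list) -> None:
    """Delete the files of every tool the user did not select."""
    for tool, relative_paths in TOOL_FILES.items():
        if tool in selected:
            continue
        for relative in relative_paths:
            target = project / relative
            if target.is_dir():
                shutil.rmtree(target)
            elif target.exists():
                target.unlink()


def sort_requirements(requirements_file: Path) -> None:
    """Rewrite requirements.txt with its non-empty lines in alphabetical order."""
    lines = [line for line in requirements_file.read_text().splitlines() if line.strip()]
    requirements_file.write_text("\n".join(sorted(lines, key=str.lower)) + "\n")
```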
Note
We originally added the requirements-sorting step because the first iteration of this process injected the necessary requirements. Now that we opt for removal instead, is this step still necessary?
Finally, our generated project is now ready and suited to the user's specifications. We print a success message. If no starter was used, we also print the selections for tools and example pipeline. The process then finishes here.