diff --git a/docs/source/deployment/airflow_astronomer.md b/docs/source/deployment/airflow_astronomer.md
index 7d8a27c2e8..703ef48488 100644
--- a/docs/source/deployment/airflow_astronomer.md
+++ b/docs/source/deployment/airflow_astronomer.md
@@ -15,7 +15,7 @@ The following tutorial uses a different approach and shows how to deploy a Kedro

 [Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. Additionally, it also provides a set of tools to help users get started with Airflow locally in the easiest way possible.

-The tutorial discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:
+The tutorial discusses how to run the example Iris classification pipeline on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:

 ```shell
 kedro new --starter=astro-airflow-iris
@@ -44,10 +44,10 @@ To follow this tutorial, ensure you have the following:
     astro dev init
     ```

-2. Create a new Kedro project using the `pandas-iris` starter. You can use the default value in the project creation process:
+2. Create a new Kedro project using the `astro-airflow-iris` starter. You can use the default value in the project creation process:

     ```shell
-    kedro new --starter=pandas-iris
+    kedro new --starter=astro-airflow-iris
     ```

 3. Copy all files and directories under `new-kedro-project`, which was the default project name created in step 2, to the root directory so Kedro and Astro CLI share the same project root:
diff --git a/docs/source/deployment/distributed.md b/docs/source/deployment/distributed.md
index fe21c66dd1..0a1eb9157e 100644
--- a/docs/source/deployment/distributed.md
+++ b/docs/source/deployment/distributed.md
@@ -40,4 +40,4 @@ We encourage you to play with different ways of parameterising your runs as you

 ## 4. (Optional) Create starters

-This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../kedro_project_setup/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
+You may opt to [build your own Kedro starter](../starters/starters.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to re-use any deployment scripts written as part of step 2.
diff --git a/docs/source/extend_kedro/architecture_overview.md b/docs/source/extend_kedro/architecture_overview.md
index d918022bea..7f11253468 100644
--- a/docs/source/extend_kedro/architecture_overview.md
+++ b/docs/source/extend_kedro/architecture_overview.md
@@ -37,7 +37,7 @@ Kedro framework serves as the interface between a Kedro project and Kedro librar

 ## Kedro starter

-You can use a [Kedro starter](../kedro_project_setup/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.
+You can use a [Kedro starter](../starters/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.

 ## Kedro library

diff --git a/docs/source/extend_kedro/common_use_cases.md b/docs/source/extend_kedro/common_use_cases.md
index db9717dbdd..0714ea209e 100644
--- a/docs/source/extend_kedro/common_use_cases.md
+++ b/docs/source/extend_kedro/common_use_cases.md
@@ -39,4 +39,4 @@ Your plugin's implementation can take advantage of other extension mechanisms su

 ## Use Case 4: How to customise the initial boilerplate of your project

-Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).
+Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../starters/create_a_starter.md).
diff --git a/docs/source/extend_kedro/index.md b/docs/source/extend_kedro/index.md
index 5671a2c786..6a7400947c 100644
--- a/docs/source/extend_kedro/index.md
+++ b/docs/source/extend_kedro/index.md
@@ -6,4 +6,5 @@
 common_use_cases
 plugins
 architecture_overview
+../starters/create_a_starter
 ```
diff --git a/docs/source/extend_kedro/plugins.md b/docs/source/extend_kedro/plugins.md
index de259e65b5..ec1a182933 100644
--- a/docs/source/extend_kedro/plugins.md
+++ b/docs/source/extend_kedro/plugins.md
@@ -42,49 +42,6 @@ Once the plugin is installed, you can run it as follows:
 kedro to_json
 ```

-## Extend starter aliases
-It is possible to extend the list of starter aliases built into Kedro. This means that a [custom Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter) can be used directly through the `starter` argument in `kedro new` rather than needing to explicitly provide the `template` and `directory` arguments. A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by `kedro starter list`.
-
-You need to extend the starters by providing a list of `KedroStarterSpec`, in this example it is defined in a file called `plugin.py`.
-
-Example for a non-git repository starter:
-```python
-# plugin.py
-starters = [
-    KedroStarterSpec(
-        alias="test_plugin_starter",
-        template_path="your_local_directory/starter_folder",
-    )
-]
-```
-
-Example for a git repository starter:
-```python
-# plugin.py
-starters = [
-    KedroStarterSpec(
-        alias="test_plugin_starter",
-        template_path="https://github.com/kedro-org/kedro-starters/",
-        directory="spaceflights-pandas",
-    )
-]
-```
-
-The `directory` argument is optional and should be used when you have multiple templates in one repository as for the [official kedro-starters](https://github.com/kedro-org/kedro-starters). If you only have one template, your top-level directory will be treated as the template. For an example, see the [spaceflights-pandas starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas).
-
-In your `pyproject.toml`, you need to register the specifications to `kedro.starters`:
-
-```toml
-[project.entry-points."kedro.starters"]
-starter = "plugin:starters"
-```
-
-After that you can use this starter with `kedro new --starter=test_plugin_starter`.
-
-```{note}
-If your starter lives on a git repository, by default Kedro attempts to use a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template on the same repository, and the correct one will automatically be used. If you do not wish to follow this structure, you should override it with the `checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`.
-```
-
 ## Working with `click`

 Commands must be provided as [`click` `Groups`](https://click.palletsprojects.com/en/7.x/api/#click.Group)
diff --git a/docs/source/faq/faq.md b/docs/source/faq/faq.md
index 8652833b9c..99b8f235b9 100644
--- a/docs/source/faq/faq.md
+++ b/docs/source/faq/faq.md
@@ -2,6 +2,12 @@

 This is a growing set of technical FAQs. The [product FAQs on the Kedro website](https://kedro.org/#faq) explain how Kedro can answer the typical use cases and requirements of data scientists, data engineers, machine learning engineers and product owners.

+
+## Installing Kedro
+* [How do I install a development version of Kedro](https://github.com/kedro-org/kedro/wiki/Guidelines-for-contributing-developers)?
+
+* **How can I check the version of Kedro installed?** To check the version installed, type `kedro -V` in your terminal window.
+
 ## Kedro documentation
 * {doc}`Where can I find the documentation about Kedro-Viz`?
 * {doc}`Where can I find the documentation for Kedro's datasets`?
@@ -13,7 +19,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]

 ## Kedro project development

-* [How do I write my own Kedro starter projects](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter)?
+* [How do I write my own Kedro starter projects](../starters/create_a_starter.md)?

 ## Configuration

diff --git a/docs/source/get_started/install.md b/docs/source/get_started/install.md
index ed9b857ebb..b55b2ecc34 100644
--- a/docs/source/get_started/install.md
+++ b/docs/source/get_started/install.md
@@ -162,28 +162,6 @@ When migrating an existing project to a newer Kedro version, make sure you also

 * For projects generated with versions of Kedro > 0.17.0, you'll do this in the `pyproject.toml` file from the project root directory.
 * If your project was generated with a version of Kedro <0.17.0, you will instead need to update the `ProjectContext`, which is found in `src//run.py`.
-## How to install a development version of Kedro
-
-This section explains how to try out a development version of Kedro direct from the [Kedro GitHub repository](https://github.com/kedro-org/kedro).
-
-```{important}
-The development version of Kedro is not guaranteed to be bug-free and/or compatible with any of the [stable versions](https://pypi.org/project/kedro/#history). We do not recommend that you use a development version of Kedro in any production systems. Please install and use with caution.
-```
-
-To try out latest, unreleased functionality from the `develop` branch of the Kedro GitHub repository, run the following installation command:
-
-```bash
-pip install git+https://github.com/kedro-org/kedro.git@develop
-```
-
-This will install Kedro from the `develop` branch of the GitHub repository, which is always the most up to date. This command will install Kedro from source, unlike `pip install kedro` which installs Kedro from PyPI.
-
-If you want to roll back to a stable version of Kedro, execute the following in your environment:
-
-```bash
-pip uninstall kedro -y
-pip install kedro
-```

 ## Summary

diff --git a/docs/source/get_started/new_project.md b/docs/source/get_started/new_project.md
index 5499a09fae..4d4bb4fcd8 100644
--- a/docs/source/get_started/new_project.md
+++ b/docs/source/get_started/new_project.md
@@ -1,29 +1,20 @@
 # Create a new Kedro project

-## Summary
+There are several ways to create a new Kedro project. This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project.

-There are a few ways to create a new project once you have [set up Kedro](install.md):
+You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md).

-* You can use `kedro new` to [create a basic Kedro project](#create-a-new-empty-project) containing project directories and basic code, but empty to extend as you need.
-* You can use `kedro new` and [pass in a configuration file](#create-a-new-project-from-a-configuration-file) to manually control project details such as the name, folder and package name.
-* You can [create a Kedro project populated with template code](#create-a-new-project-containing-example-code) that acts as a starter example. This guide illustrates with the `pandas-iris` starter, and there is a [range of Kedro starter projects](../kedro_project_setup/starters.md#list-of-official-starters).
+## Introducing `kedro new`

-
-Once you've created a project:
-
-* You need to **navigate to its project folder** and **install its dependencies**: `pip install -r requirements.txt`
-* **To run the project**: `kedro run`
-* **To visualise the project**: `kedro viz`
-
-## Create a new empty project
-
-The simplest way to create a default Kedro project is to navigate to your preferred directory and type:
+You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type:

 ```bash
 kedro new
 ```

-Enter a name for the project, which can be human-readable and may contain alphanumeric symbols, spaces, underscores and hyphens. It must be at least two characters long.
+### Project name
+
+The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.

 It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically.

@@ -35,48 +26,27 @@ So, if you enter "Get Started", the folder for the project (`repo_name`) is auto
 | Local directory to store the project | `repo_name` | `get-started` |
 | The Python package name for the project (short, all-lowercase) | `python_package` | `get_started` |

+### Project tools

-The output of `kedro new` is a directory containing all the project files and subdirectories required for a basic Kedro project, ready to extend with the code.
-
-## Create a new project from a configuration file
-
-To customise a new project's directory and package name, use a configuration file to specify those values. The configuration file must contain:
-
-- `output_dir` The path in which to create the project directory
-- `project_name`
-- `repo_name`
-- `python_package`
+The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md).

-The `output_dir` can be set to customised. For example, `~` for the home directory or `.` for the current working directory. Here is an example `config.yml`, which assumes that a directory named `~/code` already exists:
+You can add one or more of the options, or follow the default and add none at all:

-```yaml
-output_dir: ~/code
-project_name: My First Kedro Project
-repo_name: testing-kedro
-python_package: test_kedro
-```
-
-To create this new project:
-
-```bash
-kedro new --config=/config.yml
-```
+* Linting: A basic linting setup with Black and ruff
+* Testing: A basic testing setup with pytest
+* Custom Logging: Additional logging options
+* Documentation: Configuration for basic documentation built with Sphinx
+* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally
+* PySpark: Setup and configuration for working with PySpark
+* Kedro Viz: Kedro's native visualisation tool.

-## Create a new project containing example code
+### Project examples

-Use a [Kedro starter](../kedro_project_setup/starters.md) to create a project containing template code, to run as-is or to adapt and extend.
+TO DO

-The following illustrates a project created with example code based on the familiar [Iris dataset](https://www.kaggle.com/uciml/iris).
+## Run the new project

-The first step is to create the Kedro project using a starter to add the example code and data.
-
-```bash
-kedro new --starter=pandas-iris
-```
-
-## Run the project
-
-However you create a Kedro project, once `kedro new` has completed, the next step is to navigate to the project folder (`cd `) and install dependencies with `pip` as follows:
+Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd `) and install dependencies with `pip` as follows:

 ```bash
 pip install -r requirements.txt
@@ -102,7 +72,7 @@ The Kedro-Viz package needs to be installed into your virtual environment separa
 pip install kedro-viz
 ```

-To start Kedro-Viz, enter the following in your terminal:
+To start Kedro-Viz, navigate to the project folder (`cd `) and enter the following in your terminal:

 ```bash
 kedro viz
@@ -113,7 +83,7 @@ This command automatically opens a browser tab to serve the visualisation at `ht
 To exit the visualisation, close the browser tab. To regain control of the terminal, enter `^+c` on Mac or `Ctrl+c` on Windows or Linux machines.

 ## Where next?
-You have completed the section on Kedro project creation for new users. Now choose how to learn more:
+You have completed the section on Kedro project creation for new users. Here are some useful resources to learn more:

 * Understand more about Kedro: The following page explains the [fundamental Kedro concepts](./kedro_concepts.md).

@@ -122,35 +92,3 @@ You have completed the section on Kedro project creation for new users. Now choo
 * How-to guide for notebook users: The documentation section following the tutorial explains [how to combine Kedro with a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md).

 If you've worked through the documentation listed and are unsure where to go next, review the [Kedro repositories on GitHub](https://github.com/kedro-org) and [Kedro's Slack channels](https://slack.kedro.org).
-
-
-## More information about the `pandas-iris` example project
-
-If you used the `pandas-iris` starter to create an example project, the rest of this page gives further information.
-
-
-Expand for more details.
-
-### Background information
-The Iris dataset was generated in 1936 by the British statistician and biologist Ronald Fisher. The dataset contains 150 samples, comprising 50 each of 3 different species of Iris plant (*Iris Setosa*, *Iris Versicolour* and *Iris Virginica*). For each sample, the flower measurements are recorded for the sepal length, sepal width, petal length and petal width.
-
-![](../meta/images/iris_measurements.png)
-
-A machine learning model can use the Iris dataset to illustrate classification (a method used to determine the type of an object by comparison with similar objects that have previously been categorised). Once trained on known data, the machine learning model can make a predictive classification by comparing a test object to the output of its training data.
-
-The Kedro starter contains a single [pipeline](../resources/glossary.md#pipeline) comprising three [nodes](../resources/glossary.md#node) responsible for splitting the data into training and testing samples, running a 1-nearest neighbour classifier algorithm to make predictions and accuracy-reporting.
-
-The nodes are stored in `src/get_started/nodes.py`:
-
-| Node | Description |
-| --------------- | ----------------------------------------------------------------------------------- |
-| `split_data` | Splits the example Iris dataset into train and test samples |
-| `make_predictions`| Makes class predictions (using 1-nearest neighbour classifier and train-test set) |
-| `report_accuracy` | Reports the accuracy of the predictions performed by the previous node. |
-
-### Iris example: visualisation
-
-If you [visualise your project with Kedro-Viz](#visualise-a-kedro-project) you should see the following:
-
-![](../meta/images/pipeline_visualisation_iris_starter.png)
-
diff --git a/docs/source/hooks/examples.md b/docs/source/hooks/examples.md
index c60724a7d9..54e584d89c 100644
--- a/docs/source/hooks/examples.md
+++ b/docs/source/hooks/examples.md
@@ -72,26 +72,6 @@ Then re-run the pipeline:
 $ kedro run
 ```

-The output should look similar to the following:
-
-```
-...
-[01/25/23 21:38:23] INFO     Loading data from 'example_iris_data' (CSVDataset)...    data_catalog.py:343
-                    INFO     Loading example_iris_data consumed 0.99MiB memory    hooks.py:67
-                    INFO     Loading data from 'parameters' (MemoryDataset)...    data_catalog.py:343
-                    INFO     Loading parameters consumed 0.48MiB memory    hooks.py:67
-                    INFO     Running node: split: split_data([example_iris_data,parameters]) -> [X_train,X_test,y_train,y_test]    node.py:327
-                    INFO     Saving data to 'X_train' (MemoryDataset)...    data_catalog.py:382
-                    INFO     Saving data to 'X_test' (MemoryDataset)...    data_catalog.py:382
-                    INFO     Saving data to 'y_train' (MemoryDataset)...    data_catalog.py:382
-                    INFO     Saving data to 'y_test' (MemoryDataset)...    data_catalog.py:382
-                    INFO     Completed 1 out of 3 tasks    sequential_runner.py:85
-                    INFO     Loading data from 'X_train' (MemoryDataset)...    data_catalog.py:343
-                    INFO     Loading X_train consumed 0.49MiB memory    hooks.py:67
-                    INFO     Loading data from 'X_test' (MemoryDataset)...
-...
-```
-
 ## Add data validation

 This example adds data validation to node inputs and outputs using [Great Expectations](https://docs.greatexpectations.io/en/latest/).
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 7216658d28..d4af27077c 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -60,10 +60,6 @@ Welcome to Kedro's documentation!
    :caption: Learn about Kedro

    introduction/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    get_started/index.md

 .. toctree::
@@ -71,16 +67,7 @@ Welcome to Kedro's documentation!
    :caption: Tutorial and basic Kedro usage

    tutorial/spaceflights_tutorial.md
-
-
-.. toctree::
-   :maxdepth: 2
-
    visualisation/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    notebooks_and_ipython/index.md
    resources/index.md

@@ -88,16 +75,9 @@ Welcome to Kedro's documentation!
    :maxdepth: 2
    :caption: Kedro projects

+   starters/index.md
    configuration/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    data/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    nodes_and_pipelines/index.md

 .. toctree::
@@ -105,35 +85,11 @@ Welcome to Kedro's documentation!
    :caption: Advanced usage

    kedro_project_setup/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    extend_kedro/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    hooks/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    logging/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    integrations/pyspark_integration.md
-
-.. toctree::
-   :maxdepth: 2
-
    development/index.md
-
-.. toctree::
-   :maxdepth: 2
-
    deployment/index.md

 .. toctree::
diff --git a/docs/source/kedro_project_setup/index.md b/docs/source/kedro_project_setup/index.md
index 609b4d32b8..5112a51094 100644
--- a/docs/source/kedro_project_setup/index.md
+++ b/docs/source/kedro_project_setup/index.md
@@ -3,7 +3,6 @@
 ```{toctree}
 :maxdepth: 1

-starters
 dependencies
 session
 settings
diff --git a/docs/source/kedro_project_setup/starters.md b/docs/source/kedro_project_setup/starters.md
deleted file mode 100644
index 3b597b86e8..0000000000
--- a/docs/source/kedro_project_setup/starters.md
+++ /dev/null
@@ -1,171 +0,0 @@
-# Kedro starters
-
-A Kedro starter contains code in the form of a [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/) template for a Kedro project. Metaphorically, a starter is similar to using a pre-defined layout when creating a presentation or document.
-
-Kedro starters provide pre-defined example code and configuration that can be reused, for example:
-
-* As template code for a typical Kedro project
-* To add a `docker-compose` setup to launch Kedro next to a monitoring stack
-* To add deployment scripts and CI/CD setup for your targeted infrastructure
-
-You can create your own starters for reuse within a project or team, as described in the documentation about [how to create a Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).
-
-## How to use Kedro starters
-
-To create a Kedro project using a starter, apply the `--starter` flag to `kedro new`:
-
-```bash
-kedro new --starter=
-```
-
-```{note}
-`path-to-starter` could be a local directory or a VCS repository, as long as [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html) supports it.
-```
-
-To create a project using the `PySpark` starter:
-
-```bash
-kedro new --starter=pyspark
-```
-
-## Starter aliases
-
-We provide aliases for common starters maintained by the Kedro team so that users don't have to specify the full path. For example, to use the `PySpark` starter to create a project:
-
-```bash
-kedro new --starter=pyspark
-```
-
-To list all the aliases we support:
-
-```bash
-kedro starter list
-```
-
-## List of official starters
-
-The Kedro team maintains the following starters for a range of Kedro projects:
-
-* [`astro-airflow-iris`](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris): The [Kedro Iris dataset example project](../get_started/new_project.md) with a minimal setup for deploying the pipeline on Airflow with [Astronomer](https://www.astronomer.io/).
-* [`spaceflights-pandas`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets.
-* [`spaceflights-pandas-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets and visualisation and experiment tracking `kedro-viz` features.
-* [`spaceflights-pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets.
-* [`spaceflights-pyspark-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets and visualisation and experiment tracking `kedro-viz` features.
-
-## Starter versioning
-
-By default, Kedro will use the latest version available in the repository, but if you want to use a specific version of a starter, you can pass a `--checkout` argument to the command:
-
-```bash
-kedro new --starter=pyspark --checkout=0.1.0
-```
-
-The `--checkout` value points to a branch, tag or commit in the starter repository.
-
-Under the hood, the value will be passed to the [`--checkout` flag in Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/usage.html#works-directly-with-git-and-hg-mercurial-repos-too).
-
-
-## Use a starter with a configuration file
-
-By default, when you create a new project using a starter, `kedro new` asks you to enter the `project_name`, which it uses to set the `repo_name` and `python_package` name. This is the same behavior as when you [create a new empty project](../get_started/new_project.md#create-a-new-empty-project)
-
-However, Kedro also allows you to [specify a configuration file](../get_started/new_project.md#create-a-new-project-from-a-configuration-file) when you create a project using a Kedro starter. Use the `--config` flag alongside the starter:
-
-```bash
-kedro new --config=my_kedro_pyspark_project.yml --starter=pyspark
-```
-
-This option is useful when the starter requires more configuration than the default mode requires.
-
-## How to create a Kedro starter
-
-Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. A good example is the Iris dataset example of basic Kedro project layout, configuration and initialisation code. A team may find it useful to build Kedro starters to create reusable projects that bootstrap a common base and can be extended.
-
-A Kedro starter is a [Cookiecutter](https://cookiecutter.readthedocs.io/en/1.7.2/) template that contains the boilerplate code for a Kedro project.
-
-To create a Kedro starter, you need a base project to convert to a `cookiecutter` template, which forms the boilerplate for all projects that use the Kedro starter.
-
-Install `cookiecutter` as follows:
-
-```bash
-pip install cookiecutter
-```
-
-You then need to decide which are:
-
-* the common, boilerplate parts of the project
-* the configurable elements, which need to be replaced by `cookiecutter` strings
-
-### Configuration variables
-
-By default, when you create a new project using a Kedro starter, `kedro new` launches in interactive mode. The user is then prompted for the variables that have been set in `prompts.yml`.
-
-The most basic and empty starter triggered by `kedro new` is set up with the following variable:
-
-* `project_name` - A human readable name for the new project
-
-Kedro will then automatically generate the following two variables from the entered `project_name`:
-
-* `repo_name` - A name for the directory that holds the project repository
-* `python_package` - A Python package name for the project package (see [Python package naming conventions](https://www.python.org/dev/peps/pep-0008/#package-and-module-names))
-
-See the configuration for this basic configuration in [the default starter setup](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/prompts.yml).
-
-As the creator of the Kedro starter you can customise the prompts triggered by `kedro new` by adding your own prompts in `prompts.yml`. This is an example of a custom prompt:
-
-```yaml
-custom_prompt:
-    title: "Prompt title"
-    text: |
-      Prompt description that explains to the user what
-      information they should provide.
-```
-
-At the very least, the prompt `title` must be defined for the prompt to be valid. After Kedro gets the user's input for each prompt, we pass the value to [`cookiecutter`](https://cookiecutter.readthedocs.io/en/1.7.2/), so every key in your `prompts.yml` must have a corresponding key in [`cookiecutter.json`](https://cookiecutter.readthedocs.io/en/1.7.2/tutorial1.html#cookiecutter-json).
-
-If the input to the prompts needs to be **validated**, for example to make sure it only has alphanumeric characters, you can add regex validation rules via the `regex_validator` key. For more complex validation, have a look at [cookiecutter pre/post-generate hooks](https://cookiecutter.readthedocs.io/en/1.7.2/advanced/hooks.html#using-pre-post-generate-hooks-0-7-0).
-
-If you want `cookiecutter` to provide sensible **defaults** in case a user doesn't provide any input, you can add those to `cookiecutter.json`. See [the default starter `cookiecutter.json`](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/cookiecutter.json) as example.
-
-### Example Kedro starter
-
-To review an example Kedro starter, check out the [`pandas-iris` starter on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris).
-
-When you create an Iris dataset example project by calling `kedro new`, you supply configuration variables as the documentation in [Create a new project](../get_started/new_project.md) describes. When you go through the interactive flow you must supply the `project_name` variable, which is then used to generate the `repo_name` and `python_package` variables. If you use a configuration file, you must supply all three variables in the file. You can see how these variables are used by inspecting the template:
-
-**project_name**
-
-The human-readable `project_name` variable is used in the [README.md](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris/README.md) for the new project.
-
-**repo_name**
-
-The project structure contains a folder labelled [`{{ cookiecutter.repo_name }}`](https://github.com/kedro-org/kedro-starters/tree/0.18.14/pandas-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D), which forms the top-level folder to contain the Iris dataset example when it is created. The folder storing the example project is represented by `cookiecutter.repo_name`, which is a customisable variable, as you would expect.
-
-**python_package**
-
-Within the parent folder, inside the `src` subfolder, is another configurable variable [{{ cookiecutter.python_package }}](https://github.com/kedro-org/kedro-starters/tree/0.18.14/pandas-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D) which contains the source code for the example pipelines. The variable is also used within [`__main__.py`](https://github.com/kedro-org/kedro-starters/tree/0.18.14/pandas-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/__main__.py).
-
-Here is the layout of the project as a Cookiecutter template:
-
-```
-{{ cookiecutter.repo_name }}     # Parent directory of the template
-├── conf                         # Project configuration files
-├── data                         # Local project data (not committed to version control)
-├── docs                         # Project documentation
-├── notebooks                    # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
-├── README.md                    # Project README
-└── src                          # Project source code
-    └── {{ cookiecutter.python_package }}
-       ├── __init.py__
-       ├── pipelines
-       ├── pipeline_registry.py
-       ├── __main__.py
-       └── settings.py
-    ├── requirements.txt
-    ├── pyproject.toml
-    └── tests
-```
-
-```{note}
-You can [add an alias by creating a plugin using `kedro.starters` entry point](../extend_kedro/plugins.md#extend-starter-aliases), which will allows you to do `kedro new --starter=your_starters` and shows up on shows up on `kedro starter list`.
-``` diff --git a/docs/source/meta/images/KedroArchitecture.drawio b/docs/source/meta/images/KedroArchitecture.drawio index 8256737edf..a7fb121e8c 100644 --- a/docs/source/meta/images/KedroArchitecture.drawio +++ b/docs/source/meta/images/KedroArchitecture.drawio @@ -1,6 +1,6 @@ - + - + @@ -40,10 +40,7 @@ - - - - + @@ -139,14 +136,8 @@ - - - - - - - - + + @@ -154,9 +145,6 @@ - - - @@ -228,12 +216,15 @@ - - + + - - + + + + + @@ -263,7 +254,7 @@ - + diff --git a/docs/source/meta/images/kedro_architecture.png b/docs/source/meta/images/kedro_architecture.png index 0d7fd896ba..bcc3cfef9d 100644 Binary files a/docs/source/meta/images/kedro_architecture.png and b/docs/source/meta/images/kedro_architecture.png differ diff --git a/docs/source/nodes_and_pipelines/modular_pipelines.md b/docs/source/nodes_and_pipelines/modular_pipelines.md index 09c0d5b54e..c7c104d7e8 100644 --- a/docs/source/nodes_and_pipelines/modular_pipelines.md +++ b/docs/source/nodes_and_pipelines/modular_pipelines.md @@ -103,7 +103,7 @@ Pipeline templates are rendered using [Cookiecutter](https://cookiecutter.readth See the [`cookiecutter.json` file in the Kedro default template](https://github.com/kedro-org/kedro/tree/main/kedro/templates/pipeline/cookiecutter.json) for an example. It is important to note that if you are embedding your custom pipeline template within a Kedro starter template, you must tell Cookiecutter not to render this template when creating a new project from the starter. To do this, -you must add [`_copy_without_render: ["templates"]`](https://cookiecutter.readthedocs.io/en/latest/advanced/copy_without_render.html) to the `cookiecutter.json` file for the starter +you must add [`_copy_without_render: ["templates"]`](https://cookiecutter.readthedocs.io/en/stable/advanced/copy_without_render.html) to the `cookiecutter.json` file for the starter and not the `cookiecutter.json` for the pipeline template. 
### Ensuring portability diff --git a/docs/source/nodes_and_pipelines/nodes.md b/docs/source/nodes_and_pipelines/nodes.md index a1106ee1ae..77da140d3b 100644 --- a/docs/source/nodes_and_pipelines/nodes.md +++ b/docs/source/nodes_and_pipelines/nodes.md @@ -4,7 +4,7 @@ In this section, we introduce the concept of a node, for which the relevant API Nodes are the building blocks of pipelines, and represent tasks. Pipelines are used to combine nodes to build workflows, which range from simple machine learning workflows to end-to-end (E2E) production workflows. -You must first import libraries from Kedro and other standard tools to run the code snippets demonstrated below. +You must first import libraries from Kedro and other standard tools to run the code snippets below. ```python from kedro.pipeline import * @@ -184,18 +184,24 @@ You can also call a node as a regular Python function: `adder_node(dict(a=2, b=3 ## How to use generator functions in a node +```{warning} +This documentation section uses the `pandas-iris` starter, which is unavailable in Kedro version 0.19.0 and beyond. The latest version of Kedro that supports `pandas-iris` is Kedro 0.18.14: install that or an earlier version to work through this example (`pip install kedro==0.18.14`). + +To check the version installed, type `kedro -V` in your terminal window. +``` + [Generator functions](https://learnpython.org/en/Generators) were introduced with [PEP 255](https://www.python.org/dev/peps/pep-0255) and are a special kind of function in Python that returns lazy iterators. They are often used for lazy-loading or lazy-saving of data, which can be particularly useful when dealing with large datasets that do not fit entirely into memory. In the context of Kedro, generator functions can be used in nodes to efficiently process and handle such large datasets. ### Set up the project -To demonstrate the use of generator functions in Kedro nodes, first, set up a Kedro project using the `pandas-iris` starter.
If you haven't already created a Kedro project, you can follow the [get started guide](../get_started/new_project.md#create-a-new-project-containing-example-code) to create it. +Set up a Kedro project using the legacy `pandas-iris` starter. Create the project with this command, assuming Kedro version 0.18.14: -Create the project with this command: ```bash -kedro new -s pandas-iris +kedro new --starter=pandas-iris --checkout=0.18.14 ``` ### Loading data with generators + To use generator functions in Kedro nodes, you need to update the `catalog.yml` file to include the `chunksize` argument for the relevant dataset that will be processed using the generator. You need to add a new dataset in your `catalog.yml` as follows: diff --git a/docs/source/nodes_and_pipelines/run_a_pipeline.md b/docs/source/nodes_and_pipelines/run_a_pipeline.md index 44dbc85c29..4eaa06c296 100644 --- a/docs/source/nodes_and_pipelines/run_a_pipeline.md +++ b/docs/source/nodes_and_pipelines/run_a_pipeline.md @@ -144,7 +144,6 @@ If a node has multiple inputs or outputs (e.g., `node(func, ["a", "b", "c"], ["d $ kedro run --async ... 2020-03-24 09:20:01,482 - kedro.runner.sequential_runner - INFO - Asynchronous mode is enabled for loading and saving data -2020-03-24 09:20:01,483 - kedro.io.data_catalog - INFO - Loading data from `example_iris_data` (CSVDataset)... ... ``` diff --git a/docs/source/notebooks_and_ipython/kedro_and_notebooks.md b/docs/source/notebooks_and_ipython/kedro_and_notebooks.md index 80eb0f9740..9af34b4e92 100644 --- a/docs/source/notebooks_and_ipython/kedro_and_notebooks.md +++ b/docs/source/notebooks_and_ipython/kedro_and_notebooks.md @@ -4,15 +4,11 @@ This page explains how to use a Jupyter notebook to explore elements of a Kedro This page also explains how to use line magic to display a Kedro-Viz visualisation of your pipeline directly in your notebook. 
-## Iris dataset example +## Example project -Create a sample Kedro project with the [`pandas-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris) as we showed in the [get started documentation](../get_started/new_project.md#create-a-new-project-containing-example-code): +The example adds a notebook to experiment with the retired [`pandas-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris). As an alternative, you can follow the example using a different starter, such as [`spaceflights-pandas`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) or just add a notebook to your own project. -```bash -kedro new --starter=pandas-iris -``` - -We will assume you call the project `iris`, but you can call it whatever you choose. +We will assume the example project is called `iris`, but you can call it whatever you choose. Navigate to the project directory (`cd iris`) and issue the following command in the terminal to launch Jupyter: diff --git a/docs/source/resources/glossary.md b/docs/source/resources/glossary.md index bef605dc2c..477f6ccc3e 100644 --- a/docs/source/resources/glossary.md +++ b/docs/source/resources/glossary.md @@ -89,7 +89,7 @@ Runners are different execution mechanisms to run pipelines with the specified d ## Starters Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. They provide pre-defined example code and configuration that can be reused. A Kedro starter is a [Cookiecutter template](https://cookiecutter.readthedocs.io/) that contains the boilerplate code for a Kedro project. -[Further information about Kedro starters](../kedro_project_setup/starters.md) +[Further information about Kedro starters](../starters/starters.md) ## Tags You can apply tags to nodes or pipelines as a means of filtering which are executed. 
diff --git a/docs/source/starters/create_a_starter.md b/docs/source/starters/create_a_starter.md new file mode 100644 index 0000000000..f988f0245d --- /dev/null +++ b/docs/source/starters/create_a_starter.md @@ -0,0 +1,132 @@ +# Create a Kedro starter + +Kedro starters are a useful way to create a new project that contains code to run as-is, or to adapt and extend. + +A team may find it useful to build Kedro starters to create reusable projects that bootstrap a common base and can be extended. + +## Install the `cookiecutter` package +A Kedro starter is a [Cookiecutter](https://cookiecutter.readthedocs.io/) template that contains the boilerplate code for a Kedro project. First, install `cookiecutter` as follows: + +```bash +pip install cookiecutter +``` + +To create a Kedro starter, you need a base project to convert to a template, which forms the boilerplate for all projects that use it. You then need to decide which parts of the project are: + +* the common, boilerplate parts of the project +* the configurable elements, which need to be replaced by `cookiecutter` strings + +## Custom project creation variables + +When you create a new project using a Kedro starter, `kedro new` prompts you for a project name. This variable (`project_name`) is set in the [default starter setup](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/prompts.yml) in `prompts.yml`. + +Kedro automatically generates the following two variables from the entered `project_name`: + +* `repo_name` - A name for the directory that holds the project repository +* `python_package` - A Python package name for the project package (see [Python package naming conventions](https://www.python.org/dev/peps/pep-0008/#package-and-module-names)) + +As a starter creator, you can customise the prompts triggered by `kedro new` by adding your own prompts into the `prompts.yml` file in the root of your template.
This is an example of a custom prompt: + +```yaml +custom_prompt: + title: "Prompt title" + text: | + Prompt description that explains to the user what + information they should provide. +``` + +At the very least, the prompt `title` must be defined for the prompt to be valid. After Kedro receives the user's input for each prompt, it passes the value to Cookiecutter, so every key in `prompts.yml` must have a corresponding key in [`cookiecutter.json`](https://cookiecutter.readthedocs.io/en/stable/tutorials/tutorial1.html#cookiecutter-json). + +If the input to the prompts needs to be validated, for example to make sure it only has alphanumeric characters, you can add regex validation rules via the `regex_validator` key. Consider using [cookiecutter pre/post-generate hooks](https://cookiecutter.readthedocs.io/en/stable/advanced/hooks.html) for more complex validation. + +If you want `cookiecutter` to provide sensible default values, in case a user doesn't provide any input, you can add those to `cookiecutter.json`. See [the default starter `cookiecutter.json`](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/cookiecutter.json) as example. + +## Example Kedro starter + +To review an example Kedro starter, check out the [`spaceflights-pandas` starter on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas). + +When a new `spaceflights-pandas` project is created with `kedro new --starter=spaceflights-pandas`, the user is asked to enter a `project_name` variable, which is then used to generate the `repo_name` and `python_package` variables by default. + +If you use a configuration file, you must supply all three variables in the file. 
You can see how these variables are used by [inspecting the template](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D): + +### `project_name` + +The human-readable `project_name` variable is used in the [README.md](https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D/README.md) for the new project. + +### `repo_name` + +The `repo_name` variable names the top-level folder, labelled [`{{ cookiecutter.repo_name }}`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D) in the template, which contains the starter project when it is created. + +### `python_package` + +Within the parent folder, inside the `src` subfolder, is a folder named by another configurable variable, [`{{ cookiecutter.python_package }}`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D), which contains the source code for the example pipelines. The variable is also used within [`__main__.py`](https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/__main__.py).
+ +Here is the layout of the project as a Cookiecutter template: + +``` +{{ cookiecutter.repo_name }} # Parent directory of the template +├── conf # Project configuration files +├── data # Local project data (not committed to version control) +├── docs # Project documentation +├── notebooks # Project related Jupyter notebooks (can be used for experimental code before moving the code to src) +├── pyproject.toml +├── README.md # Project README +├── requirements.txt +└── src # Project source code + └── {{ cookiecutter.python_package }} + ├── __init__.py + ├── pipelines + ├── pipeline_registry.py + ├── __main__.py + └── settings.py +└── tests +``` + + +## Extend starter aliases + +You can add an alias by creating a plugin using the `kedro.starters` entry point, which enables you to call `kedro new --starter=your_starters`. That is, it can be used directly through the `starter` argument in `kedro new` rather than needing to explicitly provide the `template` and `directory` arguments. + +A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by the command `kedro starter list`. + +You need to extend the starters by providing a list of `KedroStarterSpec` objects; in this example, the list is defined in a file called `plugin.py`. + +Example for a non-git repository starter: + +```python +# plugin.py +from kedro.framework.cli.starters import KedroStarterSpec + +starters = [ + KedroStarterSpec( + alias="test_plugin_starter", + template_path="your_local_directory/starter_folder", + ) +] +``` + +Example for a git repository starter: + +```python +# plugin.py +from kedro.framework.cli.starters import KedroStarterSpec + +starters = [ + KedroStarterSpec( + alias="test_plugin_starter", + template_path="https://github.com/kedro-org/kedro-starters/", + directory="spaceflights-pandas", + ) +] +``` + +The `directory` argument is optional and should be used when you have multiple templates in one repository, as is the case for the [official kedro-starters](https://github.com/kedro-org/kedro-starters).
If you only have one template, your top-level directory will be treated as the template. + +In your `pyproject.toml`, you need to register the specifications to `kedro.starters`: + +```toml +[project.entry-points."kedro.starters"] +starter = "plugin:starters" +``` + +After that you can use this starter with `kedro new --starter=test_plugin_starter`. + +```{note} +If your starter is stored in a Git repository, Kedro defaults to using a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template in the same repository, and the correct one will be used automatically. If you prefer not to follow this structure, you should override it with the `--checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`. +``` diff --git a/docs/source/starters/index.md b/docs/source/starters/index.md new file mode 100644 index 0000000000..84a088a91d --- /dev/null +++ b/docs/source/starters/index.md @@ -0,0 +1,43 @@ +# New project tools (title TBD) +As you saw from the [First steps](../get_started/new_project.md) section, once you have [set up Kedro](../get_started/install.md), you can use `kedro new` to create a basic Kedro project containing project directories and basic code, which you can configure depending on the tooling and example code you need. + +There are several options for the code you include when you create a new Kedro project, which the pages in this section describe in detail. + +```{toctree} +:maxdepth: 1 +:hidden: + +new_project_tools +starters +``` + +**Use `kedro new` to create a basic project**
+In the simplest instance, you can create a project using `kedro new` and select from a range of [tools and example code options](./new_project_tools.md) to extend the basic project. + +**Use `kedro new` with `--config`**
+Similarly, you can use `kedro new` but additionally pass in a configuration file, for example: + +```bash +kedro new --config=config.yml +``` + +The file enables you to customise details such as the project folder name and package name. + +The configuration file must contain: + +* `output_dir` - The path in which to create the project directory, which can be set to `~` for the home directory or `.` for the current working directory +* `project_name` +* `repo_name` +* `python_package` +* TO DO -- tools and example code options + +Here is an example `config.yml`, which assumes that a directory named `~/code` already exists: + +```yaml +output_dir: ~/code +project_name: My First Kedro Project +repo_name: testing-kedro +python_package: test_kedro +``` +**Use `kedro new` with a `--starter`**
+Alternatively, you can create a new Kedro project with a [starter](./starters.md) that adds a set of code for a common project use case. diff --git a/docs/source/starters/new_project_tools.md b/docs/source/starters/new_project_tools.md new file mode 100644 index 0000000000..9a6b03fee1 --- /dev/null +++ b/docs/source/starters/new_project_tools.md @@ -0,0 +1,4 @@ +# Configure a new project + + + diff --git a/docs/source/starters/starters.md b/docs/source/starters/starters.md new file mode 100644 index 0000000000..924db091c4 --- /dev/null +++ b/docs/source/starters/starters.md @@ -0,0 +1,82 @@ +# Kedro starters + +A Kedro starter contains code in the form of a [Cookiecutter](https://cookiecutter.readthedocs.io/) template for a Kedro project. Using a starter is like using a pre-defined layout when creating a presentation or document. + +You can create your own starters for reuse within a project or team, as described in the [how to create a Kedro starter](../starters/create_a_starter.md) documentation. + +## How to use a starter + +To create a Kedro project using a starter, apply the `--starter` flag to `kedro new`. For example: + +```bash +kedro new --starter=<path-to-starter> +``` + +```{note} +`path-to-starter` could be a local directory or a VCS repository, as long as [Cookiecutter](https://cookiecutter.readthedocs.io/en/stable/usage.html) supports it. +``` + +## Starter aliases + +We provide aliases for common starters maintained by the Kedro team so that you don't have to specify the full path.
For example, to create a project using the `spaceflights-pandas` starter: + +```bash +kedro new --starter=spaceflights-pandas +``` +To list all the aliases we support: + +```bash +kedro starter list +``` + +## Official Kedro starters + +The Kedro team maintains the following starters for a range of Kedro projects: + +* [`astro-airflow-iris`](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris): An example project using the [Iris dataset](https://www.kaggle.com/uciml/iris) with a minimal setup for deploying the pipeline on Airflow with [Astronomer](https://www.astronomer.io/). +* [`databricks-iris`](https://github.com/kedro-org/kedro-starters/tree/main/databricks-iris): An example project using the [Iris dataset](https://www.kaggle.com/uciml/iris) with a setup for [Databricks](https://docs.kedro.org/en/stable/deployment/databricks/index.html) deployment. +* [`spaceflights-pandas`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets. +* [`spaceflights-pandas-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pandas` datasets and visualisation and experiment tracking `kedro-viz` features. +* [`spaceflights-pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets. +* [`spaceflights-pyspark-viz`](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): The [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) example code with `pyspark` datasets and visualisation and experiment tracking `kedro-viz` features. + +### Archived starters + +The following Kedro starters have been archived and are unavailable in Kedro version 0.19.0 and beyond. 
+ +* [`standalone-datacatalog`](https://github.com/kedro-org/kedro-starters/tree/main/standalone-datacatalog) +* [`pandas-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris) +* [`pyspark-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris) +* [`pyspark`](https://github.com/kedro-org/kedro-starters/tree/main/pyspark) + +The latest version of Kedro that supports these starters is Kedro 0.18.14. + +* To check the version of Kedro you have installed, type `kedro -V` in your terminal window. +* To install a specific version of Kedro, e.g. 0.18.14, type `pip install kedro==0.18.14`. +* To create a project with one of these starters using `kedro new`, pass the starter name and check out the matching version. For example, to use the `pandas-iris` starter with Kedro version 0.18.14, type `kedro new --starter=pandas-iris --checkout=0.18.14`. + + +## Starter versioning + +By default, Kedro will use the latest version available in the repository. If you want to use a specific version of a starter, you can pass a `--checkout` argument to the command: + +```bash +kedro new --starter=spaceflights-pandas --checkout=0.1.0 +``` + +The `--checkout` value can point to a branch, tag or commit in the starter repository. + +Under the hood, the value will be passed to the [`--checkout` flag in Cookiecutter](https://cookiecutter.readthedocs.io/en/stable/usage.html#works-directly-with-git-and-hg-mercurial-repos-too). + + +## Use a starter with a configuration file + +By default, when you create a new project using a starter, `kedro new` asks you to enter the `project_name`, which it uses to set the `repo_name` and `python_package` name. This is the same behaviour as when you [create a new empty project](../get_started/new_project.md). + +Kedro also allows you to specify a configuration file when you create a project using a Kedro starter.
Use the `--config` flag alongside the starter: + +```bash +kedro new --config=my_kedro_project.yml --starter=spaceflights-pandas +``` + +This option is useful when the starter requires more configuration than the default mode requires. diff --git a/docs/source/tutorial/tutorial_template.md b/docs/source/tutorial/tutorial_template.md index 0a919fa2db..e14be6c4da 100644 --- a/docs/source/tutorial/tutorial_template.md +++ b/docs/source/tutorial/tutorial_template.md @@ -1,16 +1,16 @@ # Set up the spaceflights project -This section shows how to create a new project (with `kedro new` using the [Kedro spaceflights starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas)) and install project dependencies (with `pip install -r requirements.txt`). +This section shows how to create a new project with `kedro new` using the [Kedro spaceflights starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) and install project dependencies with `pip install -r requirements.txt`. ## Create a new project [Set up Kedro](../get_started/install.md) if you have not already done so. ```{important} -We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.18.6). To check the version installed, type `kedro -V` in your terminal window. +We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window. ``` -In your terminal, navigate to the folder you want to store the project. Type the following to generate the project from the [Kedro spaceflights starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas). The project will be populated with a complete set of working example code: +Navigate to the folder in which you want to store the project.
Type the following to generate the project from the [Kedro spaceflights starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas). The project will be populated with a complete set of working example code: ```bash kedro new --starter=spaceflights-pandas @@ -18,7 +18,7 @@ kedro new --starter=spaceflights-pandas When prompted for a project name, you should accept the default choice (`Spaceflights`) as the rest of this tutorial assumes that project name. -When Kedro has created the project, navigate to the [project root directory](./spaceflights_tutorial.md#project-root-directory): +After Kedro has created the project, navigate to the [project root directory](./spaceflights_tutorial.md#project-root-directory): ```bash cd spaceflights