[Ready] Docs changes to remove pandas-iris and update kedro new flow in onboarding docs #3317

Merged on Nov 27, 2023 (45 commits)
Commits
9e9449c
Revise link to notebook docs and remove unnecessary intro page
stichbury Oct 30, 2023
c5b5b0d
Update starters content
stichbury Oct 30, 2023
ce80d99
Merge branch 'develop' into fix-starters-content
stichbury Nov 16, 2023
3cee9f4
relocate starters content
stichbury Nov 16, 2023
124bd90
Added some changes for add-ons and some to do notes
stichbury Nov 16, 2023
b58d9a4
Merge branch 'develop' into fix-starters-content
stichbury Nov 16, 2023
f9062ce
Merge branch 'develop' into fix-starters-content
stichbury Nov 20, 2023
233a4de
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
4abbdbb
Some further fixes
stichbury Nov 21, 2023
fb5c335
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
ab9bcd9
Move section about development version of Kedro
stichbury Nov 21, 2023
b787664
Add text for new project
stichbury Nov 21, 2023
47f7e05
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 21, 2023
cd7938e
Remove mention of pandas-iris where possible, replacing with alternative
stichbury Nov 21, 2023
8724441
Merge branch 'develop' into fix-starters-content
stichbury Nov 21, 2023
c4d8ed5
Fix linter errors
stichbury Nov 21, 2023
f7cbffa
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 21, 2023
fc4aed5
Update new project docs
stichbury Nov 22, 2023
2349b7d
Merge branch 'develop' into fix-starters-content
AhdraMeraliQB Nov 22, 2023
bdf10a3
Remove deprecated starters from architecture diagram
stichbury Nov 22, 2023
6e678e6
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
f2a8868
Add warning for pandas-iris usage in generator section
stichbury Nov 22, 2023
50b51c9
Further updates for instances of kedro new
stichbury Nov 22, 2023
5db4cad
Remove TODO as no longer required
Nov 22, 2023
b397ccf
Merge
Nov 22, 2023
d83793e
Resolve some Vale issues and remove implication of tools + starters
stichbury Nov 22, 2023
8708aae
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
edc4e51
fixes to internal links
stichbury Nov 22, 2023
8a832c5
Merge branch 'develop' into fix-starters-content
stichbury Nov 22, 2023
f6e1844
pandas-spaceflights bad, spaceflights-pandas good
stichbury Nov 22, 2023
1c64a78
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 22, 2023
0c66ac3
Merge branch 'develop' into fix-starters-content
stichbury Nov 23, 2023
135ad6a
fix cookiecutter docs urls
stichbury Nov 23, 2023
711bc0b
Update the create a starter docs
stichbury Nov 23, 2023
ccc8f96
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 23, 2023
201776d
Fix link to avoid linkcheck barf
stichbury Nov 23, 2023
b02d672
Update docs/source/get_started/new_project.md
stichbury Nov 23, 2023
57701f6
Update following review
stichbury Nov 23, 2023
aec7ebc
Update content
stichbury Nov 23, 2023
59fdfcf
Merge branch 'fix-starters-content' of https://github.com/kedro-org/k…
stichbury Nov 23, 2023
8c38a11
Update docs/source/nodes_and_pipelines/nodes.md
stichbury Nov 27, 2023
bd04596
Update docs/source/starters/starters.md
stichbury Nov 27, 2023
735ed4f
Update FAQ
stichbury Nov 27, 2023
c09cac5
Merge branch 'develop' into fix-starters-content
stichbury Nov 27, 2023
7cb1dd4
Updates following review
stichbury Nov 27, 2023
6 changes: 3 additions & 3 deletions docs/source/deployment/airflow_astronomer.md
@@ -15,7 +15,7 @@ The following tutorial uses a different approach and shows how to deploy a Kedro

[Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. Additionally, it also provides a set of tools to help users get started with Airflow locally in the easiest way possible.

The tutorial discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-a-new-project-containing-example-code) on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:
The tutorial discusses how to run the example Iris classification pipeline on a local Airflow cluster with Astronomer. You may also consider using our [`astro-airflow-iris` starter](https://github.com/kedro-org/kedro-starters/tree/main/astro-airflow-iris) which provides a template containing the boilerplate code that the tutorial describes:

```shell
kedro new --starter=astro-airflow-iris
@@ -44,10 +44,10 @@ To follow this tutorial, ensure you have the following:
astro dev init
```

2. Create a new Kedro project using the `pandas-iris` starter. You can use the default value in the project creation process:
2. Create a new Kedro project using the `astro-airflow-iris` starter. You can use the default value in the project creation process:

```shell
kedro new --starter=pandas-iris
kedro new --starter=astro-airflow-iris
```

3. Copy all files and directories under `new-kedro-project`, which was the default project name created in step 2, to the root directory so Kedro and Astro CLI share the same project root:
2 changes: 1 addition & 1 deletion docs/source/deployment/distributed.md
@@ -40,4 +40,4 @@

## 4. (Optional) Create starters

This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../kedro_project_setup/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
You may opt to [build your own Kedro starter](../starters/starters.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to re-use any deployment scripts written as part of step 2.

Vale warning at docs/source/deployment/distributed.md line 43, column 79: [Kedro.weaselwords] 'regularly' is a weasel word!
2 changes: 1 addition & 1 deletion docs/source/extend_kedro/architecture_overview.md
@@ -37,7 +37,7 @@ Kedro framework serves as the interface between a Kedro project and Kedro librar

## Kedro starter

You can use a [Kedro starter](../kedro_project_setup/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.
You can use a [Kedro starter](../starters/starters.md) to generate a Kedro project that contains boilerplate code. We maintain a set of [official starters](https://github.com/kedro-org/kedro-starters/) but you can also use a custom starter of your choice.

## Kedro library

2 changes: 1 addition & 1 deletion docs/source/extend_kedro/common_use_cases.md
@@ -39,4 +39,4 @@ Your plugin's implementation can take advantage of other extension mechanisms su

## Use Case 4: How to customise the initial boilerplate of your project

Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter).
Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, see the [guide for creating Kedro starters](../starters/create_a_starter.md).
1 change: 1 addition & 0 deletions docs/source/extend_kedro/index.md
@@ -6,4 +6,5 @@
common_use_cases
plugins
architecture_overview
../starters/create_a_starter
stichbury marked this conversation as resolved.
```
43 changes: 0 additions & 43 deletions docs/source/extend_kedro/plugins.md
@@ -42,49 +42,6 @@ Once the plugin is installed, you can run it as follows:
kedro to_json
```

## Extend starter aliases
It is possible to extend the list of starter aliases built into Kedro. This means that a [custom Kedro starter](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter) can be used directly through the `starter` argument in `kedro new` rather than needing to explicitly provide the `template` and `directory` arguments. A custom starter alias behaves in the same way as an official Kedro starter alias and is also picked up by `kedro starter list`.

You need to extend the starters by providing a list of `KedroStarterSpec`, in this example it is defined in a file called `plugin.py`.

Example for a non-git repository starter:
```python
# plugin.py
starters = [
KedroStarterSpec(
alias="test_plugin_starter",
template_path="your_local_directory/starter_folder",
)
]
```

Example for a git repository starter:
```python
# plugin.py
starters = [
KedroStarterSpec(
alias="test_plugin_starter",
template_path="https://github.com/kedro-org/kedro-starters/",
directory="spaceflights-pandas",
)
]
```

The `directory` argument is optional and should be used when you have multiple templates in one repository as for the [official kedro-starters](https://github.com/kedro-org/kedro-starters). If you only have one template, your top-level directory will be treated as the template. For an example, see the [spaceflights-pandas starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas).

In your `pyproject.toml`, you need to register the specifications to `kedro.starters`:

```toml
[project.entry-points."kedro.starters"]
starter = "plugin:starters"
```
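As a rough illustration of what this registration means, the sketch below uses only the Python standard library to build the same entry point and split its value into a module and an attribute. It does not call Kedro, and how Kedro itself loads the `kedro.starters` group is not shown here; the variable names are illustrative.

```python
from importlib.metadata import EntryPoint

# Mirrors the pyproject.toml entry: group "kedro.starters",
# name "starter", value "plugin:starters".
entry_point = EntryPoint(
    name="starter",
    value="plugin:starters",
    group="kedro.starters",
)

# The value has the form "<module>:<attribute>".
module_name, _, attribute_name = entry_point.value.partition(":")

print(module_name)     # the module to import, i.e. plugin.py
print(attribute_name)  # the object looked up in it, i.e. the `starters` list
```

Read this way, `plugin:starters` points at the `starters` list defined in `plugin.py`.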

After that you can use this starter with `kedro new --starter=test_plugin_starter`.

```{note}
If your starter lives on a git repository, by default Kedro attempts to use a tag or branch labelled with your version of Kedro, e.g. `0.18.12`. This means that you can host different versions of your starter template on the same repository, and the correct one will automatically be used. If you do not wish to follow this structure, you should override it with the `checkout` flag, e.g. `kedro new --starter=test_plugin_starter --checkout=main`.
```
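The fallback described in the note can be sketched as a small function. This is only an assumption-labelled illustration of the rule as stated (an explicit `--checkout` value wins, otherwise the Kedro version is used); `resolve_checkout` is a hypothetical helper, not part of Kedro.

```python
from typing import Optional


def resolve_checkout(kedro_version: str, checkout: Optional[str] = None) -> str:
    """Return the git tag or branch to use for a starter (hypothetical helper)."""
    # An explicit checkout value takes priority; otherwise fall back to
    # the tag or branch named after the installed Kedro version.
    return checkout if checkout else kedro_version


print(resolve_checkout("0.18.12"))
print(resolve_checkout("0.18.12", checkout="main"))
```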

## Working with `click`

Commands must be provided as [`click` `Groups`](https://click.palletsprojects.com/en/7.x/api/#click.Group)
8 changes: 7 additions & 1 deletion docs/source/faq/faq.md
@@ -2,6 +2,12 @@

This is a growing set of technical FAQs. The [product FAQs on the Kedro website](https://kedro.org/#faq) explain how Kedro can answer the typical use cases and requirements of data scientists, data engineers, machine learning engineers and product owners.


## Installing Kedro
* [How do I install a development version of Kedro](https://github.com/kedro-org/kedro/wiki/Guidelines-for-contributing-developers)?

Vale warning at docs/source/faq/faq.md line 7, column 11: [Kedro.pronouns] Avoid first-person singular pronouns such as 'I'.

* **How can I check the version of Kedro installed?** To check the version installed, type `kedro -V` in your terminal window.

Vale warning at docs/source/faq/faq.md line 9, column 13: [Kedro.pronouns] Avoid first-person singular pronouns such as 'I'.

## Kedro documentation
* {doc}`Where can I find the documentation about Kedro-Viz<kedro-viz:kedro-viz_visualisation>`?
* {doc}`Where can I find the documentation for Kedro's datasets<kedro-datasets:kedro_datasets>`?
@@ -13,7 +19,7 @@

## Kedro project development

* [How do I write my own Kedro starter projects](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter)?
* [How do I write my own Kedro starter projects](../starters/create_a_starter.md)?

Vale warning at docs/source/faq/faq.md line 22, column 11: [Kedro.pronouns] Avoid first-person singular pronouns such as 'I'.

Vale warning at docs/source/faq/faq.md line 22, column 19: [Kedro.pronouns] Avoid first-person singular pronouns such as 'my'.

## Configuration

22 changes: 0 additions & 22 deletions docs/source/get_started/install.md
@@ -162,28 +162,6 @@ When migrating an existing project to a newer Kedro version, make sure you also
* For projects generated with versions of Kedro > 0.17.0, you'll do this in the `pyproject.toml` file from the project root directory.
* If your project was generated with a version of Kedro <0.17.0, you will instead need to update the `ProjectContext`, which is found in `src/<package_name>/run.py`.

## How to install a development version of Kedro

This section explains how to try out a development version of Kedro direct from the [Kedro GitHub repository](https://github.com/kedro-org/kedro).

```{important}
The development version of Kedro is not guaranteed to be bug-free and/or compatible with any of the [stable versions](https://pypi.org/project/kedro/#history). We do not recommend that you use a development version of Kedro in any production systems. Please install and use with caution.
```

To try out latest, unreleased functionality from the `develop` branch of the Kedro GitHub repository, run the following installation command:

```bash
pip install git+https://github.com/kedro-org/kedro.git@develop
```

This will install Kedro from the `develop` branch of the GitHub repository, which is always the most up to date. This command will install Kedro from source, unlike `pip install kedro` which installs Kedro from PyPI.

If you want to roll back to a stable version of Kedro, execute the following in your environment:

```bash
pip uninstall kedro -y
pip install kedro
```

## Summary

108 changes: 23 additions & 85 deletions docs/source/get_started/new_project.md
@@ -1,29 +1,20 @@
# Create a new Kedro project

## Summary
There are several ways to create a new Kedro project. This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project.

There are a few ways to create a new project once you have [set up Kedro](install.md):
You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md).

* You can use `kedro new` to [create a basic Kedro project](#create-a-new-empty-project) containing project directories and basic code, but empty to extend as you need.
* You can use `kedro new` and [pass in a configuration file](#create-a-new-project-from-a-configuration-file) to manually control project details such as the name, folder and package name.
* You can [create a Kedro project populated with template code](#create-a-new-project-containing-example-code) that acts as a starter example. This guide illustrates with the `pandas-iris` starter, and there is a [range of Kedro starter projects](../kedro_project_setup/starters.md#list-of-official-starters).
## Introducing `kedro new`


Once you've created a project:

* You need to **navigate to its project folder** and **install its dependencies**: `pip install -r requirements.txt`
* **To run the project**: `kedro run`
* **To visualise the project**: `kedro viz`

## Create a new empty project

The simplest way to create a default Kedro project is to navigate to your preferred directory and type:
You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type:

```bash
kedro new
```

Enter a name for the project, which can be human-readable and may contain alphanumeric symbols, spaces, underscores and hyphens. It must be at least two characters long.
### Project name

The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.

It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically.

@@ -35,48 +26,27 @@
| Local directory to store the project | `repo_name` | `get-started` |
| The Python package name for the project (short, all-lowercase) | `python_package` | `get_started` |
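The naming rules in this hunk can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kedro's implementation: the regular expression is one reading of "alphanumeric symbols, spaces, underscores, and hyphens" with a two-character minimum, the derivation rules are inferred from the `get-started` and `get_started` values in the table, and the input `Get Started` is an assumed project name.

```python
import re

# Assumed reading of the rule: ASCII letters, digits, spaces, underscores
# and hyphens only, and at least two characters.
_PROJECT_NAME = re.compile(r"[A-Za-z0-9 _-]{2,}")


def is_valid_project_name(name: str) -> bool:
    """Check a project name against the assumed rule (hypothetical helper)."""
    return _PROJECT_NAME.fullmatch(name) is not None


def derive_names(project_name: str) -> dict:
    """Derive folder and package names from a project name (hypothetical helper)."""
    lowered = project_name.strip().lower()
    return {
        "project_name": project_name,
        # Folder name: spaces and underscores become hyphens.
        "repo_name": re.sub(r"[ _]+", "-", lowered),
        # Package name: spaces and hyphens become underscores.
        "python_package": re.sub(r"[ -]+", "_", lowered),
    }


names = derive_names("Get Started")  # assumed example input
print(names["repo_name"], names["python_package"])
```

Keeping the project name simple, as the text advises, means both derived names stay short and predictable.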

### Project tools

The output of `kedro new` is a directory containing all the project files and subdirectories required for a basic Kedro project, ready to extend with the code.

## Create a new project from a configuration file

To customise a new project's directory and package name, use a configuration file to specify those values. The configuration file must contain:

- `output_dir` The path in which to create the project directory
- `project_name`
- `repo_name`
- `python_package`
The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md).

The `output_dir` can be set to customised. For example, `~` for the home directory or `.` for the current working directory. Here is an example `config.yml`, which assumes that a directory named `~/code` already exists:
You can add one or more of the options, or follow the default and add none at all:

```yaml
output_dir: ~/code
project_name: My First Kedro Project
repo_name: testing-kedro
python_package: test_kedro
```

To create this new project:

```bash
kedro new --config=<path>/config.yml
```
* Linting: A basic linting setup with Black and ruff
* Testing: A basic testing setup with pytest
* Custom Logging: Additional logging options
* Documentation: Configuration for basic documentation built with Sphinx
* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally
* PySpark: Setup and configuration for working with PySpark
* Kedro Viz: Kedro's native visualisation tool.
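For the configuration-file flow that this hunk removes from the page, the four required keys can be checked before running `kedro new --config`. The sketch below is a hedged illustration: `missing_config_keys` is a hypothetical helper, and the dictionary stands in for what a YAML parser would return for the example `config.yml` shown in the diff.

```python
REQUIRED_KEYS = ("output_dir", "project_name", "repo_name", "python_package")


def missing_config_keys(config: dict) -> list:
    """Return the required keys absent from a parsed config (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in config]


# The example config.yml from the diff, written as a plain dictionary.
config = {
    "output_dir": "~/code",
    "project_name": "My First Kedro Project",
    "repo_name": "testing-kedro",
    "python_package": "test_kedro",
}

print(missing_config_keys(config))
print(missing_config_keys({"project_name": "My First Kedro Project"}))
```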

## Create a new project containing example code
### Project examples

Use a [Kedro starter](../kedro_project_setup/starters.md) to create a project containing template code, to run as-is or to adapt and extend.
TO DO

The following illustrates a project created with example code based on the familiar [Iris dataset](https://www.kaggle.com/uciml/iris).
## Run the new project

The first step is to create the Kedro project using a starter to add the example code and data.

```bash
kedro new --starter=pandas-iris
```

## Run the project

However you create a Kedro project, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:
Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:

Vale warning at docs/source/get_started/new_project.md line 49, column 60: [Kedro.words] Use 'after' instead of 'once'.

```bash
pip install -r requirements.txt
@@ -102,7 +72,7 @@
pip install kedro-viz
```

To start Kedro-Viz, enter the following in your terminal:
To start Kedro-Viz, navigate to the project folder (`cd <project-name>`) and enter the following in your terminal:

```bash
kedro viz
@@ -113,7 +83,7 @@
To exit the visualisation, close the browser tab. To regain control of the terminal, enter `^+c` on Mac or `Ctrl+c` on Windows or Linux machines.

## Where next?
You have completed the section on Kedro project creation for new users. Now choose how to learn more:
You have completed the section on Kedro project creation for new users. Here are some useful resources to learn more:

* Understand more about Kedro: The following page explains the [fundamental Kedro concepts](./kedro_concepts.md).

@@ -122,35 +92,3 @@
* How-to guide for notebook users: The documentation section following the tutorial explains [how to combine Kedro with a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md).

If you've worked through the documentation listed and are unsure where to go next, review the [Kedro repositories on GitHub](https://github.com/kedro-org) and [Kedro's Slack channels](https://slack.kedro.org).


## More information about the `pandas-iris` example project

If you used the `pandas-iris` starter to create an example project, the rest of this page gives further information.

<details>
<summary>Expand for more details.</summary>

### Background information
The Iris dataset was generated in 1936 by the British statistician and biologist Ronald Fisher. The dataset contains 150 samples, comprising 50 each of 3 different species of Iris plant (*Iris Setosa*, *Iris Versicolour* and *Iris Virginica*). For each sample, the flower measurements are recorded for the sepal length, sepal width, petal length and petal width.

![](../meta/images/iris_measurements.png)

A machine learning model can use the Iris dataset to illustrate classification (a method used to determine the type of an object by comparison with similar objects that have previously been categorised). Once trained on known data, the machine learning model can make a predictive classification by comparing a test object to the output of its training data.

The Kedro starter contains a single [pipeline](../resources/glossary.md#pipeline) comprising three [nodes](../resources/glossary.md#node) responsible for splitting the data into training and testing samples, running a 1-nearest neighbour classifier algorithm to make predictions and accuracy-reporting.

The nodes are stored in `src/get_started/nodes.py`:

| Node | Description |
| --------------- | ----------------------------------------------------------------------------------- |
| `split_data` | Splits the example Iris dataset into train and test samples |
| `make_predictions`| Makes class predictions (using 1-nearest neighbour classifier and train-test set) |
| `report_accuracy` | Reports the accuracy of the predictions performed by the previous node. |

### Iris example: visualisation

If you [visualise your project with Kedro-Viz](#visualise-a-kedro-project) you should see the following:

![](../meta/images/pipeline_visualisation_iris_starter.png)
</details>