Update all readmes #185

Merged
merged 22 commits into from
Nov 23, 2023
a408ee5
Update databricks-iris readmes
merelcht Nov 20, 2023
2c96afc
Update spaceflights-pandas readmes
merelcht Nov 20, 2023
29fed3c
Update spaceflights-pandas-viz readmes
merelcht Nov 21, 2023
209902c
Update spaceflights-pyspark readmes
merelcht Nov 21, 2023
2e531c5
Update spaceflights-pyspark-viz readmes
merelcht Nov 21, 2023
2967160
Clean up
merelcht Nov 21, 2023
a2f05fe
Merge branch 'main' into update-all-readmes
merelcht Nov 21, 2023
47f6225
Update README.md
stichbury Nov 21, 2023
ecb25d4
Some tweaks to the READMEs
stichbury Nov 22, 2023
f12a9db
Update spaceflights-pandas-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 22, 2023
2ba4356
Docs subprojects links
stichbury Nov 22, 2023
1d2eb7d
Update spaceflights-pyspark-viz/README.md
stichbury Nov 22, 2023
5a821ea
Update README.md
stichbury Nov 22, 2023
b77e22a
Update spaceflights-pyspark-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 22, 2023
a0eabfc
Merge branch 'main' into update-all-readmes
merelcht Nov 22, 2023
82bec97
Update spaceflights-pyspark-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
84b7b90
Update spaceflights-pyspark-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
e354c8b
Update spaceflights-pandas/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
9f97619
Update spaceflights-pyspark/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
007c113
Update spaceflights-pandas-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
9a6fbbd
Update spaceflights-pandas-viz/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
01a00b0
Update spaceflights-pyspark/{{ cookiecutter.repo_name }}/README.md
stichbury Nov 23, 2023
2 changes: 1 addition & 1 deletion databricks-iris/README.md
@@ -4,7 +4,7 @@

The code in this repository demonstrates best practice when working with Kedro and PySpark on Databricks. It contains a Kedro starter template with some initial configuration and an example pipeline, and it accompanies the documentation on [developing and deploying Kedro projects on Databricks](https://docs.kedro.org/en/stable/deployment/databricks/index.html).

This repository is a fork of the `pyspark-iris` starter that has been modified to run natively on Databricks.
This starter contains a project created with example code based on the familiar [Iris dataset](https://www.kaggle.com/datasets/uciml/iris).

## Getting started

15 changes: 6 additions & 9 deletions spaceflights-pandas-viz/README.md
@@ -2,23 +2,19 @@

## Overview

This is a completed version of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) described in the [online Kedro documentation](https://docs.kedro.org) and the extra tutorial sections on [visualisation with Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [experiment tracking with Kedro-Viz](https://docs.kedro.org/en/stable/experiment_tracking/index.html). This project includes the data required to run it.
This is a completed version of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) described in the [online Kedro documentation](https://docs.kedro.org) and the extra tutorial sections on [visualisation with Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [experiment tracking with Kedro-Viz](https://docs.kedro.org/en/stable/experiment_tracking/index.html). It includes the data required to run the project.

The tutorial works through the steps necessary to create this project. To learn the most about Kedro, we recommend that you start with a blank template as the tutorial describes, and follow the workflow. However, if you prefer to read swiftly through the documentation and get to work on the code, you may want to generate a new Kedro project using this [starter](https://docs.kedro.org/en/stable/kedro_project_setup/starters.html) because the steps have been done for you.

To use this starter, create a new Kedro project using the commands below. To make sure you have the required dependencies, run it in your virtual environment (see [our documentation about virtual environments](https://docs.kedro.org/en/stable/get_started/install.html#virtual-environments) for guidance on how to get set up):
To create a project based on this starter, [ensure you have installed Kedro into a virtual environment](https://docs.kedro.org/en/stable/get_started/install.html). Then use the following command:

```bash
pip install kedro
kedro new --starter=spaceflights-pandas-viz
cd <my-project-name> # change directory into newly created project directory
```

This will give you the complete project and project template. If you would prefer to have a reduced project template you can use `add-ons` instead and select `Kedro-Viz` as add-on with an example:
After the project is created, navigate to the newly created project directory:

```bash
pip install kedro
kedro new --add-ons=XXX
cd <my-project-name> # change directory into newly created project directory
cd <my-project-name> # change directory
```

Install the required dependencies:
@@ -41,3 +37,4 @@ kedro viz
This will open the default browser and display the following pipeline visualisation:

![](./images/pipeline_visualisation_with_layers.png)

67 changes: 66 additions & 1 deletion spaceflights-pandas-viz/{{ cookiecutter.repo_name }}/README.md
@@ -2,7 +2,7 @@

## Overview

This is your new Kedro project, which was generated using `kedro {{ cookiecutter.kedro_version }}`.
This is your new Kedro project for the [spaceflights tutorial](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) and the extra tutorial sections on [visualisation with Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [experiment tracking with Kedro-Viz](https://docs.kedro.org/en/stable/experiment_tracking/index.html), which was generated using `kedro {{ cookiecutter.kedro_version }}`.

Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

@@ -32,3 +32,68 @@ You can run your Kedro project with:
```
kedro run
```

## How to test your Kedro project

Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/test_data_science.py` for instructions on how to write your tests. You can run your tests as follows:

```
pytest
```

To configure the coverage threshold, look at the `.coveragerc` file.
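
As a rough illustration of the kind of test `pytest` will collect, the sketch below checks a small data-splitting helper. The function `split_rows` and its parameters are hypothetical stand-ins written for this example, not code from the starter; the real test files exercise the project's own pipeline nodes.

```python
import random


def split_rows(rows, test_size, random_state):
    """Hypothetical node: shuffle rows deterministically, then split into train/test."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)
    # A seeded Random instance keeps the shuffle reproducible between runs.
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def test_split_rows_sizes():
    train, test = split_rows(range(10), test_size=0.2, random_state=3)
    assert len(train) == 8
    assert len(test) == 2
    # No row is lost or duplicated by the split.
    assert sorted(train + test) == list(range(10))


def test_split_rows_is_reproducible():
    first = split_rows(range(10), test_size=0.2, random_state=3)
    second = split_rows(range(10), test_size=0.2, random_state=3)
    assert first == second
```

Saved in a file whose name starts with `test_`, both functions are picked up by a plain `pytest` run because their names also start with `test_`.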

## Project dependencies

To see and update the dependency requirements for your project use `requirements.txt`. You can install the project requirements with `pip install -r requirements.txt`.

[Further information about project dependencies](https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)

## How to work with Kedro and notebooks

> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `catalog`, `context`, `pipelines` and `session`.
>
> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r requirements.txt` you will not need to take any extra steps before you use them.

### Jupyter
To use Jupyter notebooks in your Kedro project, you need Jupyter installed. If it is not already in your environment, install it:

```
pip install jupyter
```

After installing Jupyter, you can start a local notebook server:

```
kedro jupyter notebook
```

### JupyterLab
To use JupyterLab, you need it installed. If it is not already in your environment, install it:

```
pip install jupyterlab
```

You can also start JupyterLab:

```
kedro jupyter lab
```

### IPython
And if you want to run an IPython session:

```
kedro ipython
```

### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can use tools like [`nbstripout`](https://github.com/kynan/nbstripout). For example, you can add a hook in `.git/config` with `nbstripout --install`. This will run `nbstripout` before anything is committed to `git`.

> *Note:* Your output cells will be retained locally.
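
To make concrete what "stripping output cells" means, here is a minimal sketch that clears the outputs from a notebook held in memory. It is not how `nbstripout` is implemented; it only assumes the notebook follows the nbformat 4 JSON layout (`cells`, `cell_type`, `outputs`, `execution_count`), and the `example` notebook is invented for illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    stripped = json.loads(json.dumps(notebook))  # cheap deep copy of JSON data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Invented two-cell notebook: one markdown cell, one executed code cell.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {
            "cell_type": "code",
            "source": ["1 + 1"],
            "metadata": {},
            "execution_count": 4,
            "outputs": [
                {"output_type": "execute_result", "data": {"text/plain": ["2"]}}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(example)
print(cleaned["cells"][1]["outputs"])  # prints []
```

The source of each cell is untouched, and the original `example` dict keeps its outputs, which mirrors the note that output cells are retained locally.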

[Further information about using notebooks for experiments within Kedro projects](https://docs.kedro.org/en/develop/notebooks_and_ipython/kedro_and_notebooks.html).

## Package your Kedro project

[Further information about building project documentation and packaging your project](https://docs.kedro.org/en/stable/tutorial/package_a_project.html).
@@ -16,7 +16,5 @@ The `base` folder is for shared configuration, such as non-sensitive and project

WARNING: Please do not put access credentials in the base configuration folder.

## Instructions

## Find out more
You can find out more about configuration from the [user guide documentation](https://docs.kedro.org/en/stable/configuration/configuration_basics.html).
@@ -51,7 +51,7 @@ shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl # Use modern Excel engine, it is the default since Kedro 0.18.0
    engine: openpyxl

preprocessed_companies:
  type: pandas.ParquetDataset
14 changes: 9 additions & 5 deletions spaceflights-pandas/README.md
@@ -2,16 +2,19 @@

## Overview

This is a completed version of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) described in the [online Kedro documentation](https://docs.kedro.org), including the data required to run the project.
This is a completed version of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) described in the [online Kedro documentation](https://docs.kedro.org), including the data required to run the project.

The tutorial works through the steps necessary to create this project. To learn the most about Kedro, we recommend that you start with a blank template as the tutorial describes, and follow the workflow. However, if you prefer to read swiftly through the documentation and get to work on the code, you may want to generate a new Kedro project using this [starter](https://docs.kedro.org/en/stable/kedro_project_setup/starters.html) because the steps have been done for you.

To use this starter, create a new Kedro project using the commands below. To make sure you have the required dependencies, run it in your virtual environment (see [our documentation about virtual environments](https://docs.kedro.org/en/stable/get_started/install.html#virtual-environments) for guidance on how to get set up):
To create a project based on this starter, [ensure you have installed Kedro into a virtual environment](https://docs.kedro.org/en/stable/get_started/install.html). Then use the following command:

```bash
pip install kedro
kedro new --starter=spaceflights-pandas
cd <my-project-name> # change directory into newly created project directory
```

After the project is created, navigate to the newly created project directory:

```bash
cd <my-project-name> # change directory
```

Install the required dependencies:
@@ -34,3 +37,4 @@ kedro viz
This will open the default browser and display the following pipeline visualisation:

![](./images/pipeline_visualisation_with_layers.png)

2 changes: 1 addition & 1 deletion spaceflights-pandas/{{ cookiecutter.repo_name }}/README.md
@@ -35,7 +35,7 @@ kedro run

## How to test your Kedro project

Have a look at the file `src/tests/test_run.py` for instructions on how to write your tests. You can run your tests as follows:
Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/test_data_science.py` for instructions on how to write your tests. You can run your tests as follows:

```
pytest
@@ -16,11 +16,5 @@ The `base` folder is for shared configuration, such as non-sensitive and project

WARNING: Please do not put access credentials in the base configuration folder.

## Instructions





## Find out more
You can find out more about configuration from the [user guide documentation](https://docs.kedro.org/en/stable/configuration/configuration_basics.html).
@@ -51,7 +51,7 @@ shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl # Use modern Excel engine, it is the default since Kedro 0.18.0
    engine: openpyxl

preprocessed_companies:
  type: pandas.ParquetDataset
16 changes: 10 additions & 6 deletions spaceflights-pyspark-viz/README.md
@@ -2,16 +2,20 @@

## Overview

This is a variation of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) described in the [online Kedro documentation](https://docs.kedro.org) with `PySpark` and `Kedro-Viz` setup.
This is a completed version of the [spaceflights tutorial project](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) and the extra tutorial sections on [visualisation with Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [experiment tracking with Kedro-Viz](https://docs.kedro.org/en/stable/experiment_tracking/index.html), set up to use PySpark. The PySpark setup originates from the [Kedro documentation about how to work with PySpark](https://docs.kedro.org/en/stable/integrations/pyspark_integration.html).
This project includes the data required to run it. The code in this repository demonstrates best practice when working with Kedro and PySpark.

The code in this repository demonstrates best practice when working with Kedro and PySpark. It contains a Kedro starter template with some initial configuration and two example pipelines, and originates from the [Kedro documentation about how to work with PySpark](https://docs.kedro.org/en/stable/integrations/pyspark_integration.html).

To use this starter, create a new Kedro project and select `pyspark` and `Kedro-Viz` as add-ons.
To create a project based on this starter, [ensure you have installed Kedro into a virtual environment](https://docs.kedro.org/en/stable/get_started/install.html). Then use the following command:

```bash
pip install kedro
kedro new
cd <my-project-name> # change directory into newly created project directory
kedro new --starter=spaceflights-pyspark-viz
```

After the project is created, navigate to the newly created project directory:

```bash
cd <my-project-name> # change directory
```

Install the required dependencies:
66 changes: 65 additions & 1 deletion spaceflights-pyspark-viz/{{ cookiecutter.repo_name }}/README.md
@@ -2,7 +2,7 @@

## Overview

This is your new Kedro project, which was generated using `kedro {{ cookiecutter.kedro_version }}`.
This is your new Kedro project for the [spaceflights tutorial](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) and the extra tutorial sections on [visualisation with Kedro-Viz](https://docs.kedro.org/en/stable/visualisation/index.html) and [experiment tracking with Kedro-Viz](https://docs.kedro.org/en/stable/experiment_tracking/index.html) with PySpark setup, which was generated using `kedro {{ cookiecutter.kedro_version }}`.

Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

@@ -32,3 +32,67 @@ You can run your Kedro project with:
```
kedro run
```

## How to test your Kedro project

Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/test_data_science.py` for instructions on how to write your tests. You can run your tests as follows:

```
pytest
```

To configure the coverage threshold, look at the `.coveragerc` file.

## Project dependencies

To see and update the dependency requirements for your project use `requirements.txt`. You can install the project requirements with `pip install -r requirements.txt`.

[Further information about project dependencies](https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)
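
If you want a quick check that the requirements were actually installed, something along these lines could work. It is a simplified sketch: the helper names are made up for this example, the parsing ignores many valid requirement forms (URLs, environment markers), and the `sample` text uses hypothetical version pins rather than the contents of the real `requirements.txt`.

```python
import re
from importlib import metadata


def requirement_names(text):
    """Extract distribution names from requirements-style text (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as "-r other.txt"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Hypothetical requirements, for illustration only.
sample = """
# project dependencies
kedro>=0.18
jupyterlab>=3.0
pytest~=7.2
"""

print(requirement_names(sample))
print(missing_packages(requirement_names(sample)))
```

In a project you would read the text from `requirements.txt` instead of `sample`; an empty list from `missing_packages` suggests that `pip install -r requirements.txt` has already been run in the active environment.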

## How to work with Kedro and notebooks

> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `catalog`, `context`, `pipelines` and `session`.
>
> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r requirements.txt` you will not need to take any extra steps before you use them.

### Jupyter
To use Jupyter notebooks in your Kedro project, you need Jupyter installed. If it is not already in your environment, install it:

```
pip install jupyter
```

After installing Jupyter, you can start a local notebook server:

```
kedro jupyter notebook
```

### JupyterLab
To use JupyterLab, you need it installed. If it is not already in your environment, install it:

```
pip install jupyterlab
```

You can also start JupyterLab:

```
kedro jupyter lab
```

### IPython
And if you want to run an IPython session:

```
kedro ipython
```

### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can use tools like [`nbstripout`](https://github.com/kynan/nbstripout). For example, you can add a hook in `.git/config` with `nbstripout --install`. This will run `nbstripout` before anything is committed to `git`.

> *Note:* Your output cells will be retained locally.

## Package your Kedro project

[Further information about building project documentation and packaging your project](https://docs.kedro.org/en/stable/tutorial/package_a_project.html)
11 changes: 8 additions & 3 deletions spaceflights-pyspark/README.md
@@ -6,12 +6,17 @@ This is a variation of the [spaceflights tutorial project](https://docs.kedro.or

The code in this repository demonstrates best practice when working with Kedro and PySpark. It contains a Kedro starter template with some initial configuration and two example pipelines, and originates from the [Kedro documentation about how to work with PySpark](https://docs.kedro.org/en/stable/integrations/pyspark_integration.html).

To use this starter, create a new Kedro project and select `pyspark` as add-on.
To create a project based on this starter, [ensure you have installed Kedro into a virtual environment](https://docs.kedro.org/en/stable/get_started/install.html). Then use the following command:

```bash
pip install kedro
kedro new
cd <my-project-name> # change directory into newly created project directory
kedro new --starter=spaceflights-pyspark
```

After the project is created, navigate to the newly created project directory:

```bash
cd <my-project-name> # change directory
```

Install the required dependencies:
66 changes: 65 additions & 1 deletion spaceflights-pyspark/{{ cookiecutter.repo_name }}/README.md
@@ -2,7 +2,7 @@

## Overview

This is your new Kedro project, which was generated using `kedro {{ cookiecutter.kedro_version }}`.
This is your new Kedro project for the [spaceflights tutorial](https://docs.kedro.org/en/stable/tutorial/spaceflights_tutorial.html) with PySpark setup, which was generated using `kedro {{ cookiecutter.kedro_version }}`.
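
The `{{ cookiecutter.kedro_version }}` text, like the `{{ cookiecutter.repo_name }}` directory name, is a template placeholder that is filled in when a project is generated from the starter. Cookiecutter renders these with Jinja2; the sketch below only imitates that substitution with a regular expression so the idea is visible without extra dependencies. The `render` helper and the values in `context` are hypothetical.

```python
import re

# Matches placeholders of the form {{ cookiecutter.<key> }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(template, context):
    """Replace each placeholder with the matching value from context."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


# Hypothetical values; a real run takes them from the prompts and the installed Kedro.
context = {"repo_name": "my-project", "kedro_version": "0.19.0"}

print(render("{{ cookiecutter.repo_name }}/README.md", context))
print(render("generated using `kedro {{ cookiecutter.kedro_version }}`", context))
```

A placeholder whose key is missing from `context` raises `KeyError` here, which is a choice made for this sketch rather than a description of cookiecutter's behaviour.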

Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

@@ -32,3 +32,67 @@ You can run your Kedro project with:
```
kedro run
```

## How to test your Kedro project

Have a look at the files `src/tests/test_run.py` and `src/tests/pipelines/test_data_science.py` for instructions on how to write your tests. You can run your tests as follows:

```
pytest
```

To configure the coverage threshold, look at the `.coveragerc` file.

## Project dependencies

To see and update the dependency requirements for your project use `requirements.txt`. You can install the project requirements with `pip install -r requirements.txt`.

[Further information about project dependencies](https://docs.kedro.org/en/stable/kedro_project_setup/dependencies.html#project-specific-dependencies)

## How to work with Kedro and notebooks

> Note: Using `kedro jupyter` or `kedro ipython` to run your notebook provides these variables in scope: `catalog`, `context`, `pipelines` and `session`.
>
> Jupyter, JupyterLab, and IPython are already included in the project requirements by default, so once you have run `pip install -r requirements.txt` you will not need to take any extra steps before you use them.

### Jupyter
To use Jupyter notebooks in your Kedro project, you need Jupyter installed. If it is not already in your environment, install it:

```
pip install jupyter
```

After installing Jupyter, you can start a local notebook server:

```
kedro jupyter notebook
```

### JupyterLab
To use JupyterLab, you need it installed. If it is not already in your environment, install it:

```
pip install jupyterlab
```

You can also start JupyterLab:

```
kedro jupyter lab
```

### IPython
And if you want to run an IPython session:

```
kedro ipython
```

### How to ignore notebook output cells in `git`
To automatically strip out all output cell contents before committing to `git`, you can use tools like [`nbstripout`](https://github.com/kynan/nbstripout). For example, you can add a hook in `.git/config` with `nbstripout --install`. This will run `nbstripout` before anything is committed to `git`.

> *Note:* Your output cells will be retained locally.

## Package your Kedro project

[Further information about building project documentation and packaging your project](https://docs.kedro.org/en/stable/tutorial/package_a_project.html)