Improve get started docs and guide to working with notebooks (#2031)
* Revised the Introduction to make it short and sweet.
* Revised the Get Started section. Gone is "Hello Kedro". Gone are the installation pre-requisites (that's just part of the Install Kedro page now). Gone is the "Standalone use of the data catalog - woot woot" and GONE is the page on Kedro starters.
* Reordered the create project material to put the project structure breakdown in the section that introduces key concepts and shorten the Iris tutorial to the bare minimum. I did add visualisation at this point though, to highlight Kedro Viz, as I felt it was coming far too late in the spaceflights tutorial and needed to be more prominent as a feature.
* Added a TL;DR page to Get Started which some people could probably just use as-is and ignore the rest of the section.
* Starters material has moved into a new section all about "Kedro project setup". Much of that section still needs review/revision but I have updated the Starters page so it reads more clearly.
* Improved the Kedro-Viz page somewhat (still more to come for Plotly)
* Notebooks/IPython materials now merged and simplified
stichbury authored Nov 23, 2022
1 parent 82d69f4 commit ebf3d64
Showing 45 changed files with 850 additions and 880 deletions.
20 changes: 13 additions & 7 deletions README.md
@@ -12,11 +12,13 @@

## What is Kedro?

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. Kedro is hosted by the [LF AI & Data Foundation](https://lfaidata.foundation/).
Kedro is an open-source Python framework to create reproducible, maintainable, and modular data science code. It uses software engineering best practices to help you build production-ready data engineering and data science pipelines.

Kedro is hosted by the [LF AI & Data Foundation](https://lfaidata.foundation/).

## How do I install Kedro?

To install Kedro from the Python Package Index (PyPI) simply run:
To install Kedro from the Python Package Index (PyPI) run:

```
pip install kedro
@@ -28,7 +30,7 @@ It is also possible to install Kedro using `conda`:
conda install -c conda-forge kedro
```

Our [Get Started guide](https://kedro.readthedocs.io/en/stable/get_started/prerequisites.html) contains full installation instructions, and includes how to set up Python virtual environments.
Our [Get Started guide](https://kedro.readthedocs.io/en/stable/get_started/install.html) contains full installation instructions, and includes how to set up Python virtual environments.


## What are the main features of Kedro?
@@ -48,10 +50,14 @@ Our [Get Started guide](https://kedro.readthedocs.io/en/stable/get_started/prere

## How do I use Kedro?

The [Kedro documentation](https://kedro.readthedocs.io/en/stable/) includes three examples to help get you started:
- A typical "Hello World" example, for an [entry-level description of the main Kedro concepts](https://kedro.readthedocs.io/en/stable/get_started/hello_kedro.html)
- An [introduction to the project template](https://kedro.readthedocs.io/en/stable/get_started/example_project.html) using the Iris dataset
- A more detailed [spaceflights tutorial](https://kedro.readthedocs.io/en/stable/tutorial/tutorial_template.html) to give you hands-on experience
The [Kedro documentation](https://kedro.readthedocs.io/en/stable/) first explains [how to install Kedro](https://kedro.readthedocs.io/en/stable/get_started/install.html) and then introduces [key Kedro concepts](https://kedro.readthedocs.io/en/stable/get_started/kedro_concepts.html).

- The first example illustrates the [basics of a Kedro project](https://kedro.readthedocs.io/en/stable/get_started/new_project.html) using the Iris dataset
- You can then review the [spaceflights tutorial](https://kedro.readthedocs.io/en/stable/tutorial/tutorial_template.html) to build a Kedro project for hands-on experience

For new and intermediate Kedro users, there's a comprehensive section on [how to visualise Kedro projects using Kedro-Viz](https://kedro.readthedocs.io/en/stable/visualisation/kedro-viz_visualisation.html) and [how to work with Kedro and Jupyter notebooks](https://kedro.readthedocs.io/en/stable/notebooks_and_ipython/kedro_and_notebooks).

Further documentation is available for more advanced Kedro usage and deployment. We also recommend the [glossary](https://kedro.readthedocs.io/en/stable/resources/glossary.html) and the [API reference documentation](/kedro) for additional information.


## Why does Kedro exist?
4 changes: 2 additions & 2 deletions docs/build-docs.sh
@@ -17,9 +17,9 @@ mkdir docs/build/
cp -r docs/_templates docs/conf.py docs/*.svg docs/*.json docs/build/

if [ "$action" == "linkcheck" ]; then
sphinx-build -c docs/ -WETan -j auto -D language=en -b linkcheck docs/build/ docs/build/html
sphinx-build -c docs/ -ETan -j auto -D language=en -b linkcheck docs/build/ docs/build/html
elif [ "$action" == "docs" ]; then
sphinx-build -c docs/ -WETa -j auto -D language=en docs/build/ docs/build/html
sphinx-build -c docs/ -ETa -j auto -D language=en docs/build/ docs/build/html
fi

# Clean up build artefacts
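The only difference between the old and new `sphinx-build` calls in `docs/build-docs.sh` is the dropped `-W` flag, which tells Sphinx to turn warnings into errors. The sketch below is illustrative only and is not part of the commit: it rebuilds the two argument lists in Python so the flag change is easy to see. The function name `sphinx_build_args` and the `warnings_as_errors` parameter are hypothetical.

```python
from __future__ import annotations


def sphinx_build_args(action: str, warnings_as_errors: bool = False) -> list[str]:
    """Return the sphinx-build argument list for the 'docs' or 'linkcheck' action.

    Hypothetical helper that mirrors the invocations in docs/build-docs.sh.
    """
    if action not in {"docs", "linkcheck"}:
        raise ValueError(f"unknown action: {action!r}")

    # -E: do not reuse a saved environment, -T: show full tracebacks,
    # -a: write all output files
    flags = "ETa"
    if action == "linkcheck":
        flags += "n"  # -n: nit-picky mode, warn about all missing references
    if warnings_as_errors:
        flags = "W" + flags  # -W: turn warnings into errors (dropped by this commit)

    args = ["sphinx-build", "-c", "docs/", f"-{flags}", "-j", "auto", "-D", "language=en"]
    if action == "linkcheck":
        args += ["-b", "linkcheck"]
    return args + ["docs/build/", "docs/build/html"]
```

With the default `warnings_as_errors=False` the helper reproduces the new command lines; passing `True` reproduces the old ones, in which a single Sphinx warning was enough to fail the build.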
4 changes: 2 additions & 2 deletions docs/source/contribution/backwards_compatibility.md
@@ -12,7 +12,7 @@ Your change is **not** considered a breaking change, and so is backwards compati

We aim to minimise the number of breaking changes to keep Kedro software stable and reduce the overhead for users as they migrate their projects. However, there are cases where a breaking change brings considerable value or increases the maintainability of the codebase. In these cases, breaking backwards compatibility can make sense.

Before you contribute a breaking change, you should create a [Github Issue](https://github.com/kedro-org/kedro/issues) that describes the change and justifies the value gained by breaking backwards compatibility.
Before you contribute a breaking change, you should create a [GitHub Issue](https://github.com/kedro-org/kedro/issues) that describes the change and justifies the value gained by breaking backwards compatibility.

## The Kedro release model

@@ -22,4 +22,4 @@ All breaking changes go into `develop`, from which a major release can be deploy

![Kedro Gitflow Diagram](../meta/images/kedro_gitflow.svg)

Please check the Q&A on [GitHub discussions](https://github.com/kedro-org/kedro/discussions) and ask any new questions about the development process there too!
Got a question about the development process? Ask the community on [Slack](https://slack.kedro.org) if you need to!
2 changes: 1 addition & 1 deletion docs/source/contribution/contribute_to_kedro.md
@@ -5,7 +5,7 @@ We welcome any and all contributions to Kedro, at whatever level you can manage.
- Join the community on [Slack](https://slack.kedro.org)
- Review Kedro's [GitHub issues](https://github.com/kedro-org/kedro/issues) or raise your own issue to report a bug or feature request
- Start a conversation about the Kedro project on [GitHub discussions](https://github.com/kedro-org/kedro/discussions)
- Make a pull request on the [Kedro-Community Github repo](https://github.com/kedro-org/kedro-community) to update the curated list of Kedro community content.
- Make a pull request on the [Kedro-Community GitHub repo](https://github.com/kedro-org/kedro-community) to update the curated list of Kedro community content.
- Report a bug or propose a new feature on [GitHub issues](https://github.com/kedro-org/kedro/issues)
- [Review other contributors' PRs](https://github.com/kedro-org/kedro/pulls)
- [Contribute code](./developer_contributor_guidelines.md), for example to fix a bug or add a feature
8 changes: 4 additions & 4 deletions docs/source/contribution/developer_contributor_guidelines.md
@@ -26,14 +26,14 @@ To work on the Kedro codebase, you will need to be set up with Git, and Make.
If your development environment is Windows, you can use the `win_setup_conda` and `win_setup_env` commands from [Circle CI configuration](https://github.com/kedro-org/kedro/blob/main/.circleci/config.yml) to guide you in the correct way to do this.
```

You will also need to create and activate virtual environment. If this is unfamiliar to you, read through our [pre-requisites documentation](../get_started/prerequisites.md).
You will also need to create and activate virtual environment. If this is unfamiliar to you, read through our [pre-requisites documentation](../get_started/install.md#installation-prerequisites).

Next, you'll need to fork the [Kedro source code from the Github repository](https://github.com/kedro-org/kedro):
Next, you'll need to fork the [Kedro source code from the GitHub repository](https://github.com/kedro-org/kedro):

* Fork the project by clicking **Fork** in the top-right corner of the [Kedro GitHub repository](https://github.com/kedro-org/kedro)
* Choose your target account

If you need further guidance, consult the [Github documentation about forking a repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo#forking-a-repository).
If you need further guidance, consult the [GitHub documentation about forking a repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo#forking-a-repository).

You are almost ready to go. In your terminal, navigate to the folder into which you forked the Kedro code.

@@ -194,4 +194,4 @@ Working on your first pull request? You can learn how from these resources:
* [First timers only](https://www.firsttimersonly.com/)
* [How to contribute to an open source project on GitHub](https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github)

Please check the Q&A on [GitHub discussions](https://github.com/kedro-org/kedro/discussions) and ask any new questions about the development process there too!
Previous Q&A on [GitHub discussions](https://github.com/kedro-org/kedro/discussions) and our [searchable archive of Discord discussions](https://linen-discord.kedro.org). You can ask new questions about the development process on [Slack](https://slack.kedro.org) too!
@@ -127,7 +127,8 @@ Do not pass "Go", do not collect £200.

* You will need to use restructured text formatting within the box. Aim to keep the formatting of the callout text plain, although you can include bold, italic, code and links.
* Keep the amount of text (and the number of callouts used) to a minimum.
* Prefer to use `note`, `warning` and `important` only, rather than a number of different colours/types of callout.
* Prefer to use `note`, `warning` and `important` only, rather than a larger range of callout.

* Use `note` for notable information
* Use `warning` to indicate a potential `gotcha`
* Use `important` when highlighting a key point that cannot be ignored
13 changes: 6 additions & 7 deletions docs/source/contribution/technical_steering_committee.md
@@ -21,11 +21,10 @@ In this section, we detail:

- Make sure that ongoing pull requests are moving forward at the right pace or closing them
- Guide the community to use the right channel:
- [Github](https://github.com/kedro-org/kedro/) for feature requests and bug reports
- [GitHub discussions](https://github.com/kedro-org/kedro/discussions) to discuss
the Kedro project
- [Slack](https://slack.kedro.org/)
for questions and to support other users

- [GitHub issues](https://github.com/kedro-org/kedro/issues) for feature requests and bug reports
- [GitHub discussions](https://github.com/kedro-org/kedro/discussions) to discuss the future of the Kedro project
- [Slack](https://slack.kedro.org) for questions and to support other users

## Requirements to become a maintainer

@@ -52,11 +51,11 @@ and the `kedro-team` channel on the Kedro Slack organisation.

## Voting process

Voting can change project maintainers and decide on the future of Kedro. The TSC leads it as voting maintainers of Kedro. The voting period is one week and is either performed on GitHub Discussions or through a pull request.
Voting can change project maintainers and decide on the future of Kedro. The TSC leads it as voting maintainers of Kedro. The voting period is one week and is either performed on GitHub discussions or through a pull request.

### Other issues or proposals

Open Github Discussions host votes on issues, proposals and changes affecting the future of Kedro, including amendments to our ways of working described in this document. These votes require **a 1/2 majority**.
GitHub discussions is used to host votes on issues, proposals and changes affecting the future of Kedro, including amendments to our ways of working described on this page. These votes require **a 1/2 majority**.

### Adding or removing maintainers

2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -535,7 +535,7 @@ The code API allows you to:

### Configure a Data Catalog

In a file like `catalog.py`, you can construct a `DataCatalog` object programmatically. In the following, we are using a number of pre-built data loaders documented in the [API reference documentation](/kedro.extras.datasets).
In a file like `catalog.py`, you can construct a `DataCatalog` object programmatically. In the following, we are using several pre-built data loaders documented in the [API reference documentation](/kedro.extras.datasets).

```python
from kedro.io import DataCatalog
2 changes: 1 addition & 1 deletion docs/source/deployment/airflow_astronomer.md
@@ -2,7 +2,7 @@

This tutorial explains how to deploy a Kedro project on [Apache Airflow](https://airflow.apache.org/) with [Astronomer](https://www.astronomer.io/). Apache Airflow is an extremely popular open-source workflow management platform. Workflows in Airflow are modelled and organised as [DAGs](https://en.wikipedia.org/wiki/Directed_acyclic_graph), making it a suitable engine to orchestrate and execute a pipeline authored with Kedro. [Astronomer](https://docs.astronomer.io/astro/install-cli) is a managed Airflow platform which allows users to spin up and run an Airflow cluster easily in production. Additionally, it also provides a set of tools to help users get started with Airflow locally in the easiest way possible.

The following discusses how to run the [example Iris classification pipeline](../get_started/example_project.md) on a local Airflow cluster with Astronomer.
The following discusses how to run the [example Iris classification pipeline](../get_started/new_project.md#create-the-example-project) on a local Airflow cluster with Astronomer.

## Strategy

2 changes: 1 addition & 1 deletion docs/source/deployment/aws_batch.md
@@ -3,7 +3,7 @@
## Why would you use AWS Batch?
[AWS Batch](https://aws.amazon.com/batch/) is optimised for batch computing and applications that scale with the number of jobs running in parallel. It manages job execution and compute resources, and dynamically provisions the optimal quantity and type. AWS Batch can assist with planning, scheduling, and executing your batch computing workloads, using [Amazon EC2](https://aws.amazon.com/ec2/) On-Demand and [Spot Instances](https://aws.amazon.com/ec2/spot/), and it has native integration with [CloudWatch](https://aws.amazon.com/cloudwatch/) for log collection.

AWS Batch helps you run massively parallel Kedro pipelines in a cost-effective way, and allows you to parallelise the pipeline execution across a number of compute instances. Each Batch job is run in an isolated Docker container environment.
AWS Batch helps you run massively parallel Kedro pipelines in a cost-effective way, and allows you to parallelise the pipeline execution across multiple compute instances. Each Batch job is run in an isolated Docker container environment.

The following sections are a guide on how to deploy a Kedro project to AWS Batch, and uses the [spaceflights tutorial](../tutorial/spaceflights_tutorial.md) as primary example. The guide assumes that you have already completed the tutorial, and that the project was created with the project name **Kedro Tutorial**.

2 changes: 1 addition & 1 deletion docs/source/deployment/dask.md
@@ -10,7 +10,7 @@ Dask offers both a default, single-machine scheduler and a more sophisticated, d

## Prerequisites

The only additional requirement, beyond what was already required by your Kedro pipeline, is to [install `dask.distributed`](http://distributed.dask.org/en/stable/install.html). To review the full installation instructions, including how to set up Python virtual environments, see our [Get Started guide](../get_started/prerequisites.md).
The only additional requirement, beyond what was already required by your Kedro pipeline, is to [install `dask.distributed`](http://distributed.dask.org/en/stable/install.html). To review the full installation instructions, including how to set up Python virtual environments, see our [Get Started guide](../get_started/install.md#installation-prerequisites).

## How to distribute your Kedro pipeline using Dask

2 changes: 1 addition & 1 deletion docs/source/deployment/databricks.md
@@ -234,7 +234,7 @@ You can interact with Kedro in Databricks through the Kedro [IPython extension](

The Kedro IPython extension launches a [Kedro session](../kedro_project_setup/session.md) and makes available the useful Kedro variables `catalog`, `context`, `pipelines` and `session`. It also provides the `%reload_kedro` [line magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) that reloads these variables (for example, if you need to update `catalog` following changes to your Data Catalog).

The IPython extension can be used in a Databricks notebook in a similar way to how it is used in [Jupyter notebooks](../tools_integration/ipython.md).
The IPython extension can be used in a Databricks notebook in a similar way to how it is used in [Jupyter notebooks](../notebooks_and_ipython/kedro_and_notebooks.md).

If you encounter a `ContextualVersionConflictError`, it is likely caused by Databricks using an old version of `pip`. Hence there's one additional step you need to do in the Databricks notebook to make use of the IPython extension. After you load the IPython extension using the below command:

2 changes: 1 addition & 1 deletion docs/source/deployment/deployment_guide.md
@@ -2,7 +2,7 @@

## Deployment choices

Your choice of deployment method will depend on a number of factors. In this section we provide a number of guides for different approaches.
Your choice of deployment method will depend on various factors. In this section we provide guides for different approaches.

If you decide to deploy your Kedro project on a single machine, you should consult our [guide to single-machine deployment](single_machine.md), and decide whether to [use Docker for container-based deployment](./single_machine.md#container-based) or to use [package-based deployment](./single_machine.md#package-based) or to [use the CLI to clone and deploy](./single_machine.md#cli-based) your codebase to a server.

2 changes: 1 addition & 1 deletion docs/source/deployment/distributed.md
@@ -40,4 +40,4 @@ We encourage you to play with different ways of parameterising your runs as you

## 4. (Optional) Create starters

This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../get_started/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
This is an optional step, but it may speed up your work in the long term. If you find yourself having to deploy in a similar environment or to a similar platform fairly often, you may want to [build your own Kedro starter](../kedro_project_setup/starters.md). That way you will be able to re-use any deployment scripts written as part of step 2.
2 changes: 1 addition & 1 deletion docs/source/development/commands_reference.md
@@ -506,7 +506,7 @@ To start an IPython shell:
kedro ipython
```

The [Kedro IPython extension](../tools_integration/ipython.md) will make the following variables available in your IPython or Jupyter session:
The [Kedro IPython extension](../notebooks_and_ipython/kedro_and_notebooks.md#a-custom-kedro-kernel) makes the following variables available in your IPython or Jupyter session:

* `catalog` (type `DataCatalog`): [Data Catalog](../data/data_catalog.md) instance that contains all defined datasets; this is a shortcut for `context.catalog`
* `context` (type `KedroContext`): Kedro project context that provides access to Kedro's library components
6 changes: 4 additions & 2 deletions docs/source/development/set_up_pycharm.md
@@ -153,9 +153,11 @@ You can configure Pycharm's IPython to load Kedro's Extension.

Click **PyCharm | Preferences** for macOS or **File | Settings**, inside **Build, Execution, Deployment** and **Console**, enter the **Python Console** configuration.

You can append the configuration necessary to use Kedro IPython to the **Starting script** as described in the [IPython configuring documentation](../tools_integration/ipython.md).
You can append the configuration necessary to use Kedro IPython to the **Starting script**:

![](../meta/images/pycharm_ipython_starting_script.png)
```
%load_ext kedro.ipython
```

With this configuration, when you create a Python Console you should be able to use context, session and catalog.

2 changes: 1 addition & 1 deletion docs/source/extend_kedro/common_use_cases.md
@@ -39,4 +39,4 @@ Your plugin's implementation can take advantage of other extension mechanisms su

## Use Case 4: How to customise the initial boilerplate of your project

Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, please visit our [guide to create Kedro starters](./create_kedro_starters.md) to solve this extension requirement.
Sometimes you might want to tailor the starting boilerplate of a Kedro project to your specific needs. For example, your organisation might have a standard CI script that you want to include in every new Kedro project. To this end, please visit the [guide for creating Kedro starters](../kedro_project_setup/starters.md#how-to-create-a-kedro-starter) to solve this extension requirement.
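
Several hunks in this commit do nothing except repoint links at pages that moved. As a rough aid for spotting any links that were missed, the sketch below collects the replacements visible in this diff into a lookup table and flags Markdown links that still point at an old location. It is illustrative only and not part of the commit: `MOVED_PAGES`, `stale_links` and the `source_dir` argument are hypothetical names, and paths are assumed to be relative to `docs/source`.

```python
from __future__ import annotations

import posixpath
import re

# Old page -> new target, taken from the link updates shown in this diff.
# Paths are relative to docs/source (an assumption of this sketch).
MOVED_PAGES = {
    "get_started/prerequisites.md": "get_started/install.md#installation-prerequisites",
    "get_started/example_project.md": "get_started/new_project.md#create-the-example-project",
    "get_started/starters.md": "kedro_project_setup/starters.md",
    "tools_integration/ipython.md": "notebooks_and_ipython/kedro_and_notebooks.md",
    "extend_kedro/create_kedro_starters.md": "kedro_project_setup/starters.md#how-to-create-a-kedro-starter",
}


def stale_links(markdown: str, source_dir: str) -> list[tuple[str, str]]:
    """Return (link_as_written, suggested_target) for links to moved pages.

    source_dir is the directory, relative to docs/source, of the file
    that contains the Markdown text.
    """
    found = []
    for target in re.findall(r"\]\(([^)\s]+)\)", markdown):
        path = target.split("#", 1)[0]  # ignore any #fragment
        # Resolve ./ and ../ segments against the containing directory.
        resolved = posixpath.normpath(posixpath.join(source_dir, path))
        if resolved in MOVED_PAGES:
            found.append((target, MOVED_PAGES[resolved]))
    return found
```

For example, `stale_links("[guide](../get_started/prerequisites.md)", "deployment")` reports the link together with its suggested replacement. The suggested targets are relative to `docs/source`, so they still need to be rewritten relative to the file being edited.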