Change location of API docs to /api subdir (#3553)
* move api docs again

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Change links to API docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/data/advanced_data_catalog_usage.md

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/data/advanced_data_catalog_usage.md

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/data/advanced_data_catalog_usage.md

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/deployment/amazon_emr_serverless.md

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
stichbury and astrojuanlu authored Jan 26, 2024
1 parent 358a5d9 commit 285cf1b
Showing 43 changed files with 54 additions and 62 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -137,8 +137,7 @@ venv.bak/
# Sphinx documentation
# Additional files created by sphinx.ext.autosummary
# Some of them are actually tracked to control the output
-/docs/source/kedro.*
-/docs/source/kedro_datasets.*
+/docs/source/api/kedro.*

# mypy
.mypy_cache/
8 files renamed without changes.
6 changes: 3 additions & 3 deletions docs/source/configuration/advanced_configuration.md
@@ -1,7 +1,7 @@
# Advanced configuration
The documentation on [configuration](./configuration_basics.md) describes how to satisfy most common requirements of standard Kedro project configuration:

-By default, Kedro is set up to use the [OmegaConfigLoader](/kedro.config.OmegaConfigLoader) class.
+By default, Kedro is set up to use the [OmegaConfigLoader](/api/kedro.config.OmegaConfigLoader) class.

## Advanced configuration for Kedro projects
This page also contains a set of guidance for advanced configuration requirements of standard Kedro projects:
@@ -20,7 +20,7 @@ This page also contains a set of guidance for advanced configuration requirement


### How to use a custom configuration loader
-You can implement a custom configuration loader by extending the [`AbstractConfigLoader`](/kedro.config.AbstractConfigLoader) class:
+You can implement a custom configuration loader by extending the [`AbstractConfigLoader`](/api/kedro.config.AbstractConfigLoader) class:

```python
from kedro.config import AbstractConfigLoader
@@ -281,7 +281,7 @@ CONFIG_LOADER_ARGS = {
}
```
### How to load credentials through environment variables
-The [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader) enables you to load credentials from environment variables. To achieve this you have to use the `OmegaConfigLoader` and the `omegaconf` [`oc.env` resolver](https://omegaconf.readthedocs.io/en/2.3_branch/custom_resolvers.html#oc-env).
+The [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader) enables you to load credentials from environment variables. To achieve this you have to use the `OmegaConfigLoader` and the `omegaconf` [`oc.env` resolver](https://omegaconf.readthedocs.io/en/2.3_branch/custom_resolvers.html#oc-env).
You can use the `oc.env` resolver to access credentials from environment variables in your `credentials.yml`:

```yaml
2 changes: 1 addition & 1 deletion docs/source/configuration/config_loader_migration.md
@@ -1,5 +1,5 @@
# Migration guide for config loaders
-The `ConfigLoader` and `TemplatedConfigLoader` classes have been deprecated since Kedro `0.18.12` and were removed in Kedro `0.19.0`. To use that release or later, you must adopt the [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader).
+The `ConfigLoader` and `TemplatedConfigLoader` classes have been deprecated since Kedro `0.18.12` and were removed in Kedro `0.19.0`. To use that release or later, you must adopt the [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader).
This migration guide outlines the primary distinctions between the old loaders and the `OmegaConfigLoader`, providing step-by-step instructions on updating your code base to utilise the new class effectively.

## `ConfigLoader` to `OmegaConfigLoader`
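As an orientation before the step-by-step instructions, a minimal sketch of adopting `OmegaConfigLoader` in a project's `settings.py` might look like this (illustrative only; on Kedro 0.19.0 and later it is already the default):

```python
# settings.py -- point the project at OmegaConfigLoader explicitly
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader
# CONFIG_LOADER_ARGS = {...}  # optional keyword arguments passed to the loader
```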
8 changes: 4 additions & 4 deletions docs/source/configuration/configuration_basics.md
@@ -2,7 +2,7 @@

This section contains detailed information about Kedro project configuration, which you can use to store settings for your project such as [parameters](./parameters.md), [credentials](./credentials.md), the [data catalog](../data/data_catalog.md), and [logging information](../logging/index.md).

-Kedro makes use of a configuration loader to load any project configuration files, which is [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader) by default since Kedro 0.19.0.
+Kedro makes use of a configuration loader to load any project configuration files, which is [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader) by default since Kedro 0.19.0.

```{note}
`ConfigLoader` and `TemplatedConfigLoader` have been removed in Kedro `0.19.0`. Refer to the [migration guide for config loaders](./config_loader_migration.md) for instructions on how to update your code base to use `OmegaConfigLoader`.
@@ -12,7 +12,7 @@ Kedro makes use of a configuration loader to load any project configuration file

[OmegaConf](https://omegaconf.readthedocs.io/) is a Python library designed to handle and manage settings. It serves as a YAML-based hierarchical system to organise configurations, which can be structured to accommodate various sources, allowing you to merge settings from multiple locations.

-From Kedro 0.18.5 you can use the [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader) which uses `OmegaConf` to load data.
+From Kedro 0.18.5 you can use the [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader) which uses `OmegaConf` to load data.

`OmegaConfigLoader` can load `YAML` and `JSON` files. Acceptable file extensions are `.yml`, `.yaml`, and `.json`. By default, any configuration files used by the config loaders in Kedro are `.yml` files.

@@ -63,7 +63,7 @@ Do not add any local configuration to version control.
```

## Configuration loading
-Kedro-specific configuration (e.g., `DataCatalog` configuration for I/O) is loaded using a configuration loader class, by default, this is [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader).
+Kedro-specific configuration (e.g., `DataCatalog` configuration for I/O) is loaded using a configuration loader class, by default, this is [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader).
When you interact with Kedro through the command line, e.g. by running `kedro run`, Kedro loads all project configuration in the configuration source through this configuration loader.

The loader recursively scans for configuration files inside the `conf` folder, firstly in `conf/base` (`base` being the default environment) and then in `conf/local` (`local` being the designated overriding environment).
@@ -128,7 +128,7 @@ kedro run --conf-source=<path-to-new-conf-folder>
```

### How to read configuration from a compressed file
-You can read configuration from a compressed file in `tar.gz` or `zip` format by using the [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader).
+You can read configuration from a compressed file in `tar.gz` or `zip` format by using the [`OmegaConfigLoader`](/api/kedro.config.OmegaConfigLoader).

How to reference a `tar.gz` file:

4 changes: 2 additions & 2 deletions docs/source/configuration/parameters.md
@@ -66,7 +66,7 @@ node(
)
```

-In both cases, under the hood parameters are added to the Data Catalog through the method `add_feed_dict()` in [`DataCatalog`](/kedro.io.DataCatalog), where they live as `MemoryDataset`s. This method is also what the `KedroContext` class uses when instantiating the catalog.
+In both cases, under the hood parameters are added to the Data Catalog through the method `add_feed_dict()` in [`DataCatalog`](/api/kedro.io.DataCatalog), where they live as `MemoryDataset`s. This method is also what the `KedroContext` class uses when instantiating the catalog.

```{note}
You can use `add_feed_dict()` to inject any other entries into your `DataCatalog` as per your use case.
@@ -110,7 +110,7 @@ The `kedro.framework.context.KedroContext` class uses the approach above to load

## How to specify parameters at runtime

-Kedro also allows you to specify runtime parameters for the `kedro run` CLI command. Use the `--params` command line option and specify a comma-separated list of key-value pairs that will be added to [KedroContext](/kedro.framework.context.KedroContext) parameters and made available to pipeline nodes.
+Kedro also allows you to specify runtime parameters for the `kedro run` CLI command. Use the `--params` command line option and specify a comma-separated list of key-value pairs that will be added to [KedroContext](/api/kedro.framework.context.KedroContext) parameters and made available to pipeline nodes.

Each key-value pair is split on the first equals sign. The following example is a valid command:

2 changes: 1 addition & 1 deletion docs/source/data/advanced_data_catalog_usage.md
@@ -1,6 +1,6 @@
# Advanced: Access the Data Catalog in code

-You can define a Data Catalog in two ways. Most use cases can be through a YAML configuration file as [illustrated previously](./data_catalog.md), but it is possible to access the Data Catalog programmatically through [`kedro.io.DataCatalog`](/kedro.io.DataCatalog) using an API that allows you to configure data sources in code and use the IO module within notebooks.
+You can define a Data Catalog in two ways. Most use cases can be through a YAML configuration file as [illustrated previously](./data_catalog.md), but it is possible to access the Data Catalog programmatically through [`kedro.io.DataCatalog`](/api/kedro.io.DataCatalog) using an API that allows you to configure data sources in code and use the IO module within notebooks.

```{warning}
Datasets are not included in the core Kedro package from Kedro version **`0.19.0`**. Import them from the [`kedro-datasets`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets) package instead.
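For illustration, a minimal sketch of building and using a `DataCatalog` in code might look like this (the dataset name and file path are assumptions):

```python
# Illustrative sketch: configure data sources in code rather than in catalog.yml
from kedro.io import DataCatalog, MemoryDataset
from kedro_datasets.pandas import CSVDataset  # datasets now live in kedro-datasets

catalog = DataCatalog(
    {
        "cars": CSVDataset(filepath="data/01_raw/cars.csv"),  # assumed path
        "scratch": MemoryDataset(),
    }
)

df = catalog.load("cars")    # read the CSV into a pandas DataFrame
catalog.save("scratch", df)  # keep an in-memory copy for later use
```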
2 changes: 1 addition & 1 deletion docs/source/data/data_catalog.md
@@ -151,7 +151,7 @@ kedro run --load-versions=cars:YYYY-MM-DDThh.mm.ss.sssZ
```
where `--load-versions` is dataset name and version timestamp separated by `:`.

-A dataset offers versioning support if it extends the [`AbstractVersionedDataset`](/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.
+A dataset offers versioning support if it extends the [`AbstractVersionedDataset`](/api/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.

To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataset`. For instance, if you encounter a class like `CSVDataset(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.

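As a quick illustration of that inheritance check (the dataset class here is just an example):

```python
# Illustrative check: a dataset supports versioning if it subclasses AbstractVersionedDataset
from kedro.io import AbstractVersionedDataset
from kedro_datasets.pandas import CSVDataset

print(issubclass(CSVDataset, AbstractVersionedDataset))  # True -> versioned loads/saves are available
```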
10 changes: 5 additions & 5 deletions docs/source/data/how_to_create_a_custom_dataset.md
@@ -4,7 +4,7 @@

## AbstractDataset

-If you are a contributor and would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/kedro.io.AbstractDataset) or [`AbstractVersionedDataset` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
+If you are a contributor and would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/api/kedro.io.AbstractDataset) or [`AbstractVersionedDataset` interface](/api/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.


## Scenario
@@ -29,7 +29,7 @@ Consult the [Pillow documentation](https://pillow.readthedocs.io/en/stable/insta

## The anatomy of a dataset

-At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataset](/kedro.io.AbstractDataset) and provide an implementation for the following abstract methods:
+At the minimum, a valid Kedro dataset needs to subclass the base [AbstractDataset](/api/kedro.io.AbstractDataset) and provide an implementation for the following abstract methods:

* `_load`
* `_save`
@@ -307,7 +307,7 @@ Versioning doesn't work with `PartitionedDataset`. You can't use both of them at
```

To add versioning support to the new dataset we need to extend the
-[AbstractVersionedDataset](/kedro.io.AbstractVersionedDataset) to:
+[AbstractVersionedDataset](/api/kedro.io.AbstractVersionedDataset) to:

* Accept a `version` keyword argument as part of the constructor
* Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
@@ -507,9 +507,9 @@ Inspect the content of the data directory to find a new version of the data, wri

## Thread-safety

-Kedro datasets should work with the [SequentialRunner](/kedro.runner.SequentialRunner) and the [ParallelRunner](/kedro.runner.ParallelRunner), so they must be fully serialisable by the [Python multiprocessing package](https://docs.python.org/3/library/multiprocessing.html). This means that your datasets should not make use of lambda functions, nested functions, closures etc. If you are using custom decorators, you need to ensure that they are using [`functools.wraps()`](https://docs.python.org/3/library/functools.html#functools.wraps).
+Kedro datasets should work with the [SequentialRunner](/api/kedro.runner.SequentialRunner) and the [ParallelRunner](/api/kedro.runner.ParallelRunner), so they must be fully serialisable by the [Python multiprocessing package](https://docs.python.org/3/library/multiprocessing.html). This means that your datasets should not make use of lambda functions, nested functions, closures etc. If you are using custom decorators, you need to ensure that they are using [`functools.wraps()`](https://docs.python.org/3/library/functools.html#functools.wraps).

-There is one dataset that is an exception: {class}`SparkDataset<kedro-datasets:kedro_datasets.spark.SparkDataset>`. The explanation for this exception is that [Apache Spark](https://spark.apache.org/) uses its own parallelism and therefore doesn't work with Kedro [ParallelRunner](/kedro.runner.ParallelRunner). For parallelism within a Kedro project that uses Spark, use [ThreadRunner](/kedro.runner.ThreadRunner) instead.
+There is one dataset that is an exception: {class}`SparkDataset<kedro-datasets:kedro_datasets.spark.SparkDataset>`. The explanation for this exception is that [Apache Spark](https://spark.apache.org/) uses its own parallelism and therefore doesn't work with Kedro [ParallelRunner](/api/kedro.runner.ParallelRunner). For parallelism within a Kedro project that uses Spark, use [ThreadRunner](/api/kedro.runner.ThreadRunner) instead.

To verify whether your dataset is serialisable by `multiprocessing`, use the console or an IPython session to try dumping it using `multiprocessing.reduction.ForkingPickler`:

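A minimal sketch of such a check might look like the following (the dataset class and file path are assumptions):

```python
# Illustrative sketch: datasets used with ParallelRunner must survive this round trip
from multiprocessing.reduction import ForkingPickler

from kedro_datasets.pandas import CSVDataset

dataset = CSVDataset(filepath="data/01_raw/cars.csv")  # assumed path
ForkingPickler.dumps(dataset)  # raises a pickling error if the dataset is not serialisable
```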
2 changes: 1 addition & 1 deletion docs/source/data/kedro_dataset_factories.md
@@ -220,7 +220,7 @@ The matches are ranked according to the following criteria:

## How to override the default dataset creation with dataset factories

-You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataset`](/kedro.io.MemoryDataset) creation.
+You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataset`](/api/kedro.io.MemoryDataset) creation.

```yaml
"{a_default_dataset}":
2 changes: 1 addition & 1 deletion docs/source/data/partitioned_and_incremental_datasets.md
@@ -89,7 +89,7 @@ The dataset definition should be passed into the `dataset` argument of the `Part

#### Shorthand notation

-Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataset` or a fully qualified class path like `kedro_datasets.pandas.CSVDataset`) or as a class object that is a subclass of the [AbstractDataset](/kedro.io.AbstractDataset).
+Requires you only to specify a class of the underlying dataset either as a string (e.g. `pandas.CSVDataset` or a fully qualified class path like `kedro_datasets.pandas.CSVDataset`) or as a class object that is a subclass of the [AbstractDataset](/api/kedro.io.AbstractDataset).

#### Full notation

2 changes: 1 addition & 1 deletion docs/source/deployment/argo.md
@@ -23,7 +23,7 @@ To use Argo Workflows, ensure you have the following prerequisites in place:

- [Argo Workflows is installed](https://github.com/argoproj/argo/blob/master/README.md#quickstart) on your Kubernetes cluster
- [Argo CLI is installed](https://github.com/argoproj/argo/releases) on your machine
-- A `name` attribute is set for each [Kedro node](/kedro.pipeline.node) since it is used to build a DAG
+- A `name` attribute is set for each [Kedro node](/api/kedro.pipeline.node) since it is used to build a DAG
- [All node input/output datasets must be configured in `catalog.yml`](../data/data_catalog_yaml_examples.md) and refer to an external location (e.g. AWS S3); you cannot use the `MemoryDataset` in your workflow

```{note}
2 changes: 1 addition & 1 deletion docs/source/deployment/aws_batch.md
@@ -17,7 +17,7 @@ The following sections are a guide on how to deploy a Kedro project to AWS Batch
To use AWS Batch, ensure you have the following prerequisites in place:

- An [AWS account set up](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/).
-- A `name` attribute is set for each [Kedro node](/kedro.pipeline.node). Each node will run in its own Batch job, so having sensible node names will make it easier to `kedro run --nodes=<node_name>`.
+- A `name` attribute is set for each [Kedro node](/api/kedro.pipeline.node). Each node will run in its own Batch job, so having sensible node names will make it easier to `kedro run --nodes=<node_name>`.
- [All node input/output datasets must be configured in `catalog.yml`](../data/data_catalog_yaml_examples.md) and refer to an external location (e.g. AWS S3). A clean way to do this is to create a new configuration environment `conf/aws_batch` containing a `catalog.yml` file with the appropriate configuration, as illustrated below.

<details>
4 changes: 2 additions & 2 deletions docs/source/development/debugging.md
@@ -13,7 +13,7 @@ For guides on how to set up debugging with IDEs, please visit the [guide for deb

## Debugging a node

-To start a debugging session when an uncaught error is raised within your `node`, implement the `on_node_error` [Hook specification](/kedro.framework.hooks):
+To start a debugging session when an uncaught error is raised within your `node`, implement the `on_node_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
@@ -48,7 +48,7 @@ HOOKS = (PDBNodeDebugHook(),)

## Debugging a pipeline

-To start a debugging session when an uncaught error is raised within your `pipeline`, implement the `on_pipeline_error` [Hook specification](/kedro.framework.hooks):
+To start a debugging session when an uncaught error is raised within your `pipeline`, implement the `on_pipeline_error` [Hook specification](/api/kedro.framework.hooks):

```python
import pdb
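# (Illustrative continuation -- a hedged sketch of a pipeline-level debugging Hook;
#  the class name below is an assumption, mirroring the node-level Hook above.)
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBPipelineDebugHook:
    """Start a pdb post-mortem session when a pipeline run raises an uncaught error."""

    @hook_impl
    def on_pipeline_error(self, error, run_params, pipeline, catalog):
        # Print the traceback, then drop into the debugger at the failing frame
        _, _, traceback_object = sys.exc_info()
        traceback.print_exc()
        pdb.post_mortem(traceback_object)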