Update logging documentation #1572

Merged
merged 18 commits into from
May 31, 2022
4 changes: 4 additions & 0 deletions RELEASE.md
@@ -16,11 +16,15 @@

## Major features and improvements
* Added `abfss` to list of cloud protocols, enabling abfss paths.
* Kedro now uses the [Rich](https://github.com/Textualize/rich) library to format terminal logs.
* The file `conf/base/logging.yml` is now optional. See [our documentation](https://kedro.readthedocs.io/en/0.18.2/logging/logging.html) for details.

## Bug fixes and other changes
* Bumped `pyyaml` upper-bound to make Kedro compatible with the [pyodide](https://pyodide.org/en/stable/usage/loading-packages.html#micropip) stack.
* Updated project template's Sphinx configuration to use `myst_parser` instead of `recommonmark`.
* Reduced number of log lines by changing the logging level from `INFO` to `DEBUG` for low priority messages.
* Kedro's framework-side logging configuration no longer performs file-based logging. Hence superfluous `info.log`/`errors.log` files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
* The `root` logger is now set to the Python default level of `WARNING` rather than `INFO`. Kedro's logger is still set to emit `INFO` level messages.

## Upcoming deprecations for Kedro 0.19.0
* `kedro.extras.ColorHandler` will be removed in 0.19.0.
4 changes: 4 additions & 0 deletions docs/source/deployment/databricks.md
@@ -2,6 +2,10 @@

This tutorial uses the [PySpark Iris Kedro Starter](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris) to illustrate how to bootstrap a Kedro project using Spark and deploy it to a [Databricks cluster on AWS](https://databricks.com/aws).

```{note}
If you are using [Databricks Repos](https://docs.databricks.com/repos/index.html) to run a Kedro project then you should [disable file-based logging](../logging/logging.md#disable-file-based-logging). This prevents Kedro from attempting to write to the read-only file system.
```

## Prerequisites

* New or existing [AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) with administrative privileges
71 changes: 53 additions & 18 deletions docs/source/logging/logging.md
@@ -1,14 +1,58 @@
# Logging

Kedro uses, and facilitates, the use of Python’s `logging` library by providing a default logging configuration. This can be found in `conf/base/logging.yml` in every project generated using Kedro’s CLI `kedro new` command.
Kedro uses [Python's `logging` library](https://docs.python.org/3/library/logging.html). Configuration is provided as a dictionary according to the [Python logging configuration schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema) in two places:
1. [Default configuration built into the Kedro framework](https://github.com/kedro-org/kedro/blob/main/kedro/config/logging.yml). This cannot be altered.
2. Your project-side logging configuration. Every project generated using Kedro's CLI `kedro new` command includes a file `conf/base/logging.yml`. You can alter this configuration and provide different configurations for different run environments according to the [standard Kedro mechanism for handling configuration](../kedro_project_setup/configuration.md).

## Configure logging
```{note}
Providing project-side logging configuration is entirely optional. You can delete the `conf/base/logging.yml` file and Kedro will run using the framework's built-in configuration.
```

Framework-side and project-side logging configuration are loaded through subsequent calls to [`logging.config.dictConfig`](https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig). This means that, when it is provided, the project-side logging configuration typically _fully overwrites_ the framework-side logging configuration. [Incremental configuration](https://docs.python.org/3/library/logging.config.html#incremental-configuration) is also possible if the `incremental` key is explicitly set to `True` in your project-side logging configuration.
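
The overwrite behaviour can be sketched with two plain dictionaries and the standard library alone. This is a minimal illustration, not Kedro's actual configuration: both dictionaries and the handler names `framework_console` and `project_console` are hypothetical stand-ins.

```python
import logging
import logging.config

# Hypothetical stand-in for the framework-side configuration.
framework_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"framework_console": {"class": "logging.StreamHandler"}},
    "root": {"level": "WARNING", "handlers": ["framework_console"]},
}

# Hypothetical stand-in for a project-side configuration.
project_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"project_console": {"class": "logging.StreamHandler"}},
    "root": {"level": "INFO", "handlers": ["project_console"]},
}

logging.config.dictConfig(framework_config)
# The second, non-incremental call replaces the root logger's handlers and level.
logging.config.dictConfig(project_config)

root = logging.getLogger()
handlers_after_overwrite = [handler.name for handler in root.handlers]
level_after_overwrite = root.level

# An incremental call only adjusts levels; existing handlers stay attached.
logging.config.dictConfig(
    {"version": 1, "incremental": True, "root": {"level": "DEBUG"}}
)
handlers_after_incremental = [handler.name for handler in root.handlers]
level_after_incremental = root.level
```

After the second call only `project_console` remains on the root logger, which is why a project-side file usually needs to restate any framework-side handler it wants to keep.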

## Default framework-side logging configuration

Kedro's [default logging configuration](https://github.com/kedro-org/kedro/blob/main/kedro/config/logging.yml) defines a handler called `rich` that uses the [Rich logging handler](https://rich.readthedocs.io/en/stable/logging.html) to format messages and handle exceptions.

By default, Python only shows logging messages at level `WARNING` and above. Kedro's logging configuration specifies that `INFO` level messages from Kedro should also be emitted. This makes it easier to track the progress of your pipeline when you perform a `kedro run`.
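
The effect of these levels can be checked with a small standard-library sketch. The dictionary below is an assumption that mirrors only the levels described here (root at `WARNING`, `kedro` at `INFO`); it is not a copy of Kedro's configuration file, and `some_library` is a hypothetical logger name.

```python
import logging
import logging.config

logging.config.dictConfig(
    {
        "version": 1,
        "disable_existing_loggers": False,
        "root": {"level": "WARNING"},
        "loggers": {"kedro": {"level": "INFO"}},
    }
)

# Child loggers inherit the level of their nearest configured ancestor.
kedro_child = logging.getLogger("kedro.io")  # inherits INFO from "kedro"
other_library = logging.getLogger("some_library")  # inherits WARNING from root

kedro_shows_info = kedro_child.isEnabledFor(logging.INFO)
other_shows_info = other_library.isEnabledFor(logging.INFO)
```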

## Project-side logging configuration

In addition to the `rich` handler defined in Kedro's framework, the [project-side `conf/base/logging.yml`](https://github.com/kedro-org/kedro/blob/main/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/conf/base/logging.yml) defines three further logging handlers:
* `console`: show logs on standard output (typically your terminal screen) without any rich formatting
* `info_file_handler`: write logs of level `INFO` and above to `logs/info.log`
* `error_file_handler`: write logs of level `ERROR` and above to `logs/errors.log`

The logging handlers that are actually used by default are `rich`, `info_file_handler` and `error_file_handler`.

The project-side logging configuration also ensures that [logs emitted from your project's logger](#perform-logging-in-your-project) are shown if they are `INFO` level or above (as opposed to the Python default of `WARNING`).

We now give some common examples of how you might like to change your project's logging configuration.

You can customise project logging in `conf/<env>/logging.yml` using [standard Kedro mechanisms for handling configuration](../kedro_project_setup/configuration.md). The configuration should comply with the guidelines of the `logging` library; see [the documentation for the `logging` module](https://docs.python.org/3/library/logging.html) for details.
### Disable file-based logging

## Use logging
You might sometimes need to disable file-based logging, e.g. if you are running Kedro on a read-only file system such as [Databricks Repos](https://docs.databricks.com/repos/index.html). The simplest way to do this is to delete your `conf/base/logging.yml` file. The `logs` directory can then also be safely removed. With no project-side logging configuration specified, Kedro uses the default framework-side logging configuration, which does not include any file-based handlers.

After reading and applying project logging configuration, `kedro` will start emitting the logs automatically. To log your own code, you are advised to do the following:
Alternatively, if you would like to keep other configuration in `conf/base/logging.yml` and just disable file-based logging, then you can remove the file-based handlers from the `root` logger as follows:
```diff
root:
- handlers: [rich, info_file_handler, error_file_handler]
+ handlers: [rich]
```
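
To confirm that no file-based handler is still attached, for example before running on a read-only file system, a small check along these lines may help. It is a sketch using only the standard library: the helper `file_handlers` and the logger name `file_handler_demo` are hypothetical and not part of Kedro.

```python
import logging
import logging.handlers
import os
import tempfile


def file_handlers(logger: logging.Logger) -> list:
    """Return the handlers attached to `logger` that write to a file."""
    # RotatingFileHandler and the other file-based handlers subclass FileHandler.
    return [h for h in logger.handlers if isinstance(h, logging.FileHandler)]


# Demonstrate on a throwaway logger rather than the root logger.
demo_logger = logging.getLogger("file_handler_demo")
demo_logger.addHandler(logging.StreamHandler())
console_only = file_handlers(demo_logger)

log_path = os.path.join(tempfile.gettempdir(), "file_handler_demo.log")
# delay=True postpones opening the file until the first record is emitted.
demo_logger.addHandler(logging.handlers.RotatingFileHandler(log_path, delay=True))
with_file = file_handlers(demo_logger)
```

Once file-based logging is disabled, calling `file_handlers(logging.getLogger())` after logging has been configured would be expected to return an empty list.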

### Use plain console logging

To use plain rather than rich logging, swap the `rich` handler for the `console` one as follows:

```diff
root:
- handlers: [rich, info_file_handler, error_file_handler]
+ handlers: [console, info_file_handler, error_file_handler]
```

## Perform logging in your project

To perform logging in your own code (e.g. in a node), you are advised to do as follows:

```python
import logging
@@ -19,19 +63,10 @@ log.info("Send information")
```
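
As a fuller illustration of the same pattern, here is a minimal sketch of a node function that logs through a module-level logger. The function `drop_missing` and its behaviour are hypothetical and exist only for this example.

```python
import logging

# A module-level logger named after the module, so that it can be matched
# by a key in the `loggers` section of the logging configuration.
log = logging.getLogger(__name__)


def drop_missing(rows: list) -> list:
    """Hypothetical node: drop `None` entries and log how many were removed."""
    cleaned = [row for row in rows if row is not None]
    # Pass values as arguments so the message is only formatted if it is emitted.
    log.info("Dropped %d of %d rows", len(rows) - len(cleaned), len(rows))
    return cleaned
```

Whether the `INFO` message appears depends on the level configured for the logger that matches the module name.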

```{note}
The name of a logger corresponds to a key in the `loggers` section in `logging.yml` (e.g. `kedro.io`). See [Python's logging documentation](https://docs.python.org/3/library/logging.html#logger-objects) for more information.
The name of a logger corresponds to a key in the `loggers` section in `logging.yml` (e.g. `kedro`). See [Python's logging documentation](https://docs.python.org/3/library/logging.html#logger-objects) for more information.
```

## Logging for `anyconfig`

By default, the [anyconfig](https://github.com/ssato/python-anyconfig) library, which `kedro` uses to read configuration files, emits a log message at `INFO` level on every read. To reduce the volume of logs produced by CLI calls, the default project logging configuration in `conf/base/logging.yml` sets the level of the `anyconfig` logger to `WARNING`.

If you would like `INFO` level messages to propagate, you can update the `anyconfig` logger level in `conf/base/logging.yml` as follows:

```yaml
loggers:
  anyconfig:
    level: INFO  # change
    handlers: [console, info_file_handler, error_file_handler]
    propagate: no
```

You can use Rich's [console markup](https://rich.readthedocs.io/en/stable/markup.html) in your logging calls by enabling it with `extra={"markup": True}`:

```python
log.error("[bold red blink]Important error message![/]", extra={"markup": True})
```