03_testing.qmd

---
title: "Testing, linting and formatting"
format: 
    html: default
    revealjs:
        output-file: 03_testing_slides.html
        slide-number: true
footer: Python package development
logo: academy_logo.png
---

## Testing

Verify code is working as expected.

Simplest way to test is to run code and check output.

. . .

**Automated** testing checks output automatically.

Code changes can break other parts of code.

Automatic testing verifies code is still working.


## Testing workflow

```{mermaid}
flowchart TD
    A[Prepare inputs]
    B[Describe expected output]
    C[Obtain actual output]
    D[Compare actual and\n expected output]

    A --> B --> C --> D
```
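
The four steps map directly onto the body of a test function. A minimal sketch, assuming a hypothetical function `mean_level` that is not part of the course code:

```python
def mean_level(levels: list[float]) -> float:
    """Hypothetical function under test: average of a list of water levels."""
    return sum(levels) / len(levels)


def test_mean_level():
    levels = [1.0, 2.0, 3.0]      # 1. prepare inputs
    expected = 2.0                # 2. describe expected output
    actual = mean_level(levels)   # 3. obtain actual output
    assert actual == expected     # 4. compare actual and expected output
```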


## Unit testing

::: {.callout-note}
## Definition "Unit"

* A small, fundamental piece of code.
* Executed in isolation with appropriate inputs.
::: 

::: {.incremental}
* A function is typically considered a "unit"
* Lines of code within functions are smaller (can't be isolated)
* Classes are considered bigger (but can be treated as units)
:::


## A **good** unit test {.smaller}

:::: {.columns}

::: {.column width=70%}

::: {.incremental}
* Fully automated (next week)
* Has full control over all the pieces running ("fake" external dependencies)
* Can be run in any order 
* Runs in memory (no DB or file access, for example)
* Consistently returns the same result (no random numbers)
* Runs fast
* Tests a single logical concept in the system
* Readable
* Maintainable
* Trustworthy
:::

:::

::: {.column width="30%"}


[![](https://images.manning.com/360/480/resize/book/6/5b5ac1d-637b-4f00-9f5d-7fb583b6dac5/Osherove-UT-3ed-MEAP-HI.png)](https://www.artofunittesting.com/)

:::

::::
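
Several of these properties (full control, runs in memory, same result every time) follow from replacing external dependencies with fakes. A minimal sketch, assuming a hypothetical `latest_water_level` function that receives its data source as an argument:

```python
from typing import Callable


def latest_water_level(location: str, fetch: Callable[[str], list[float]]) -> float:
    """Hypothetical function: return the most recent water level for a location.

    `fetch` is the external dependency (e.g. a web service or database call).
    """
    levels = fetch(location)
    if not levels:
        raise ValueError(f"No data available for {location!r}")
    return levels[-1]


def fake_fetch(location: str) -> list[float]:
    """Stand-in for the real data source: no network, always the same values."""
    return [1.0, 1.5, 2.0]


def test_latest_water_level_returns_last_value():
    assert latest_water_level("Aarhus", fetch=fake_fetch) == 2.0
```

Passing the dependency in as an argument is only one way to make it replaceable; mocking libraries are another option.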


## Example

1. get a timeseries of water levels
2. find the maximum water level each year
3. create a summary report for the subset of data

---

1. get a timeseries of water levels

```python
def test_get_water_level_includes_start_and_end():
    wl = get_water_level(time="2019-01-01", location="Aarhus")
    
    assert len(wl) == 25
```

---

2. find the maximum water level each year

```python
def test_get_max_water_level():
    ts = TimeSeries([1.0, 2.0, 3.0], start="2019-01-01")
    max_wls = get_max_water_level(ts, freq="Y")
    
    assert len(max_wls) == 1
    assert max_wls[0] == 3.0
```

---

3. create a summary report for the subset of data

```python
def test_summary_report():
    max_wls = [1.0, 2.0, 3.0]
    report = summary_report(max_wls)

    assert report.title == "Summary report"
    assert report.text == "The maximum water level in 2021 was 3.0 m"
```

## Integration testing


```python
def test_integration():
    wl = get_water_level(time="2019-01-01", location="Aarhus")
    max_wls = get_max_water_level(wl, freq="Y")
    report = summary_report(max_wls)

    assert report.title == "Summary report"
    assert report.text == "The maximum water level in 2019 was 3.0 m"
```


## Testing in VS Code

![](images/vs_code_test.png)


## Fixtures

::: {.incremental}

* A piece of code that is used by multiple tests
* Provide data or services to tests
* Defined with `@pytest.fixture`
* Set up test environment
* Pass fixtures as test arguments

:::


## Fixture example

```python
import pytest

@pytest.fixture
def water_level():
    return TimeSeries([1.0, 2.0, 3.0], start="2019-01-01")

def test_get_max_water_level(water_level):
    max_wls = get_max_water_level(water_level, freq="Y")
    
    assert len(max_wls) == 1
    assert max_wls[0] == 3.0
```


## Test coverage

::: {.incremental}

* A measure of how much of your code is tested
* A good test suite should cover all the code
* Install `pytest-cov`
* Run tests with coverage report
    - `pytest --cov=myproj`
* Use coverage report to identify untested code

:::


## Test coverage report
```bash
pytest --cov=myproj tests/
```


```bash
-------------------- coverage: ... ---------------------
Name                 Stmts   Miss  Cover
----------------------------------------
myproj/__init__          2      0   100%
myproj/myproj          257     13    94%
myproj/feature4286      94      7    92%
----------------------------------------
TOTAL                  353     20    94%
```
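
Coverage is most useful for spotting branches that no test exercises. A minimal sketch, assuming a hypothetical `classify_level` function in `myproj`: the single test below never reaches the `"high"` branch, so a coverage report would count that `return` statement as missed. Running `pytest --cov=myproj --cov-report=term-missing` additionally lists the line numbers of the missed statements.

```python
def classify_level(level: float, threshold: float = 2.0) -> str:
    """Hypothetical function: label a water level relative to a threshold."""
    if level > threshold:
        return "high"   # not executed by the test below, so reported as missed
    return "normal"


def test_classify_level_normal():
    assert classify_level(1.0) == "normal"
```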


## Testing advice

::: {.callout-tip}
## Test edge cases

- empty lists
- lists with a single element
- empty strings
- empty dictionaries
- None
- np.nan

::: 
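
Edge cases like these fit naturally into a parametrized test. A sketch, assuming a hypothetical `safe_max` helper that returns `None` when there is nothing to take the maximum of (`math.nan` is used here in place of `np.nan` to keep the example free of dependencies other than pytest):

```python
import math

import pytest


def safe_max(values):
    """Hypothetical helper: maximum of the valid numbers, or None if there are none."""
    if values is None:
        return None
    valid = [v for v in values if v is not None and not math.isnan(v)]
    return max(valid) if valid else None


@pytest.mark.parametrize(
    "values, expected",
    [
        ([], None),                   # empty list
        ([1.0], 1.0),                 # single element
        (None, None),                 # no input at all
        ([math.nan], None),           # only missing values
        ([1.0, math.nan, 3.0], 3.0),  # missing value among valid ones
    ],
)
def test_safe_max_edge_cases(values, expected):
    assert safe_max(values) == expected
```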

## Tests act as specification

```python
def test_operable_period_can_be_missing():

    assert is_operable(height=1.0, period=None)
    assert is_operable(height=1.0, period=np.nan)
    assert is_operable(height=1.0)
    assert not is_operable(height=11.0)

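def test_height_can_not_be_missing():

    # one `pytest.raises` block per call: the block exits at the first exception,
    # so a second call inside the same block would never run
    with pytest.raises(ValueError) as excinfo:
        is_operable(height=None)
    assert "height" in str(excinfo.value)

    with pytest.raises(ValueError) as excinfo:
        is_operable(height=np.nan)
    assert "height" in str(excinfo.value)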
```

## Test driven development

::: {.incremental}

1. Write a test that fails ❌
2. Write the code to make the test pass ✅
3. Refactor the code ⚒️

:::


. . .

The benefit of this approach is that you are forced to think about the expected behaviour of your code before you write it.

Seeing the test fail first also guards against a common trap: it is all too easy to write a test that passes without actually testing the code.
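
One round of the cycle might look like this. A sketch, assuming a hypothetical `to_meters` function: the test is written first and, run on its own, fails with a `NameError` because the function does not exist yet. The implementation underneath is the minimal code that makes it pass.

```python
# Step 1: write the test first. Without step 2 it fails (to_meters is undefined).
def test_to_meters_converts_centimeters():
    assert to_meters(250.0, unit="cm") == 2.5


# Step 2: write just enough code to make the test pass.
def to_meters(value: float, unit: str = "m") -> float:
    """Hypothetical function: convert a length to meters."""
    units_per_meter = {"m": 1.0, "cm": 100.0}
    return value / units_per_meter[unit]
```

Step 3, refactoring, can then be done with the test as a safety net.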


##

::: {.r-fit-text}

and now for
 something completely different...

:::

## [The Zen of Python](https://peps.python.org/pep-0020/) {.smaller}
Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

. . .

...

**Errors should never pass silently.**

**Unless explicitly silenced.**

...


## Exceptions {.smaller}

::: {.incremental}
* Exceptions are a way to handle errors in your code.
* Raising an exception can prevent propagating bad values. 
* Exceptions are *communication* between the programmer and the user.
* There are many built-in exceptions in Python
    + `IndexError`
    + `KeyError`
    + `ValueError`
    + `FileNotFoundError`
* You can also create your own custom exceptions, e.g. `ModelInitialisationError`, `MissingLicenseError`.

:::
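
A custom exception is simply a class that inherits from `Exception`. A minimal sketch, assuming a hypothetical `run_model` function that needs a license key:

```python
from typing import Optional


class MissingLicenseError(Exception):
    """Raised when a required license cannot be found."""


def run_model(license_key: Optional[str]) -> str:
    """Hypothetical entry point that refuses to run without a license."""
    if license_key is None:
        raise MissingLicenseError("No license found. Provide a license key to run the model.")
    return "model finished"


try:
    run_model(license_key=None)
except MissingLicenseError as err:
    # callers can react to this specific problem without catching unrelated errors
    print(f"Cannot run: {err}")
```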


## Example {.smaller}

```{.python filename="src/ops.py"}
def is_operable(height: float, period: float) -> bool:
    if height < 0.0:
        raise ValueError(f"Supplied value of {height=} is unphysical.")

>>> is_operable(height=-1.0, period=4.0)

Traceback (most recent call last):
  ...
ValueError: Supplied value of height=-1.0 is unphysical.
```

. . .

It is better to raise an exception (which may terminate the program) than to propagate a bad value.


## Warnings 

Warnings are a way to alert users of your code to potential issues or usage errors without actually halting the program's execution.

```{.python filename="src/ops.py"}
import warnings
warnings.warn("This is a warning")
```
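
In practice a warning is usually issued from inside a function when an input is suspicious but still usable. A sketch, assuming a hypothetical `check_period` function and threshold:

```python
import warnings


def check_period(period: float) -> float:
    """Hypothetical check: accept the value, but flag unusually long periods."""
    if period > 25.0:
        # stacklevel=2 makes the warning point at the caller's line
        warnings.warn(f"Unusually long wave period: {period} s", UserWarning, stacklevel=2)
    return period


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure the warning is not suppressed
    value = check_period(30.0)

print(value)               # execution continued and the value was returned
print(caught[0].message)   # the recorded warning
```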


## [How to test exceptions](https://docs.pytest.org/en/7.2.x/how-to/assert.html#assertions-about-expected-exceptions)

```{.python filename="tests/test_ops.py"}
import pytest
from ops import is_operable

def test_negative_heights_are_not_valid():
    with pytest.raises(ValueError):
        is_operable(height=-1.0, period=4.0)
```

The same can be done with warnings.
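
For warnings, pytest provides `pytest.warns`, which is used in the same way as `pytest.raises`. A sketch, assuming a hypothetical `check_period` function that warns about unusually long periods:

```python
import warnings

import pytest


def check_period(period: float) -> float:
    """Hypothetical function: warn about unusually long periods."""
    if period > 25.0:
        warnings.warn("Unusually long wave period", UserWarning)
    return period


def test_long_period_gives_warning():
    # the test fails if no matching UserWarning is issued inside the block
    with pytest.warns(UserWarning, match="long wave period"):
        check_period(30.0)
```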


## Linting

A way to check your code for common errors and style issues.

[`ruff`](https://docs.astral.sh/ruff/) is a fast linter for Python code. Among other things, it checks for:

* syntax errors
* unused imports
* unused variables
* undefined names
* code style (e.g. line length, indentation, whitespace, etc.)


## Linting with ruff {.smaller}

```{.python code-line-numbers="|1|6|7" filename="examples/04_testing/process.py"}
import requests
import scipy

def preprocess(x, y, xout):

    x = x[~np.isnan(x)] 
    method = "cubic"
    # interpolate missing values with cubic spline
    return scipy.interpolate.interp1d(x, y)(xout)
```

Run `ruff check`:

```bash
$ ruff check process.py
process.py:1:8: F401 [*] `requests` imported but unused
process.py:6:12: F821 Undefined name `np`
process.py:7:5: F841 [*] Local variable `method` is assigned to but never used
Found 3 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
```

::: {.notes}
* Linting is a fast way to find common errors.
* Unused imports are confusing.
* Unused and undefined variables are usually a typo or a mistake. Fixing them can prevent bugs.

:::
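
One possible cleaned-up version that resolves all three findings is sketched below. Passing `kind=method` and filtering `y` together with `x` are assumptions about what the author intended; `ruff` itself only reports the problems.

```python
import numpy as np
from scipy.interpolate import interp1d


def preprocess(x, y, xout):
    # drop points where x is missing, keeping x and y aligned (assumed intent)
    valid = ~np.isnan(x)
    x, y = x[valid], y[valid]
    method = "cubic"
    # interpolate with a cubic spline
    return interp1d(x, y, kind=method)(xout)
```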


## Formatting

::: {.incremental}

* Formatting code for readability and maintainability is essential.
* `ruff` can be used for code formatting (in addition to linting).
* `ruff` is a faster replacement for `black`, a code formatter that was in common use before it.
* It enforces its own formatting rules, which are configurable only to a limited extent.
* Having a unified style makes code changes easier to understand and collaborate on.

:::


## Running ruff format

Check if files need formatting (optional):
```bash
$ ruff format --check
Would reformat: data_utils.py
Would reformat: dfsu/__init__.py
Would reformat: dataarray.py
Would reformat: dataset.py
Would reformat: spatial/geometry.py
Would reformat: pfs/pfssection.py
6 files would be reformatted
```

Format files:
```bash
$ ruff format
6 files reformatted
```


## Ruff format in Visual Studio Code

Visual Studio Code can be configured to run `ruff format` automatically when saving a file using the `Ruff` extension.

![](images/vs_code_format.png)

  
## Profiling

::: {.incremental}

* Profiling is a way to measure the performance of your code.
* It can help you identify bottlenecks in your code.
* Your intuition about what is slow is often wrong.
* The `line_profiler` package reports the time spent on each line of code.
* It can be run inside a notebook using the `lprun` magic command.

:::

## Profiling - *example code* {.smaller}

```python
import numpy as np

def top_neighbors(points, radius="0.1"):
    """Don't use this function, its only purpose is to be profiled."""
    n = len(points)
    idx = np.array([int(x) for x in str.split("0 "* n)])

    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.sqrt(np.sum((points[i] - points[j])**2))
                if d < float(radius): 
                    idx[i] += 1
    for i in range(n):
        for j in range(n - i - 1):
            if idx[j] < idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                points[j], points[j + 1] = points[j + 1], points[j]
    return points

def main():
    points = np.random.rand(1000, 2)
    top = top_neighbors(points)
```

## Profiling - *output* {.smaller}

Invoking the Jupyter magic command `%lprun` with:

* function to profile - `top_neighbors`
* code to run - `main()`

```python
%lprun -f top_neighbors main()
```

. . .

```

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           def top_neighbors(points, radius="0.1"):
     4                                               """Don't use this function, its only purpose is to be profiled."""
     5         1       2800.0   2800.0      0.0      n = len(points)
     6         1     353300.0 353300.0      0.0      idx = np.array([int(x) for x in str.split("0 "* n)])
     7                                           
     8      1001     345100.0    344.8      0.0      for i in range(n):
     9   1001000  378191701.0    377.8      2.2          for j in range(n):
    10   1000000  328387205.0    328.4      1.9              if i != j:
    11    999000        1e+10  14473.0     83.8                  d = np.sqrt(np.sum((points[i] - points[j])**2))
    12    999000  933778605.0    934.7      5.4                  if d < float(radius): 
    13     28952   57010001.0   1969.1      0.3                      idx[i] += 1
    14      1001     367100.0    366.7      0.0      for i in range(n):
    15    500500  144295203.0    288.3      0.8          for j in range(n - i - 1):
    16    499500  302166901.0    604.9      1.8              if idx[j] < idx[j + 1]:
    17    240227  212070500.0    882.8      1.2                  idx[j], idx[j + 1] = idx[j + 1], idx[j]
    18    240227  437538803.0   1821.4      2.5                  points[j], points[j + 1] = points[j + 1], points[j]
    19         1        500.0    500.0      0.0      return points
```
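
The report attributes roughly 84% of the run time to line 11, the distance computation inside the double loop. One way to address such a hotspot is to let NumPy compute all pairwise distances at once. A sketch of the counting step only (the function name `count_neighbors` is an assumption, and the sorting part is left out):

```python
import numpy as np


def count_neighbors(points: np.ndarray, radius: float = 0.1) -> np.ndarray:
    """Count, for each point, the other points closer than `radius`."""
    # pairwise differences via broadcasting, shape (n, n, n_dims)
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    within = dist < radius
    np.fill_diagonal(within, False)  # a point is not its own neighbor
    return within.sum(axis=1)


points = np.random.default_rng(seed=0).random((1000, 2))
counts = count_neighbors(points)
```

This trades memory for speed, since the full distance matrix is held in memory. Profiling the new version again is the way to confirm that the change actually helped.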

## Makefiles {.smaller}

::: {.incremental}
* Makefiles simplify running complex commands.
* They act as a single source of truth for how to run your tools.
* They are self-documenting, which makes it easier to onboard new team members.
* Run `make <command>` to run a command from the Makefile.
* `make` is available by default on Linux; on Windows, install it with [MSYS2](https://www.msys2.org/) or [Chocolatey](https://chocolatey.org/) (or use WSL).
:::

. . .

Example

```{.makefile filename="Makefile" code-line-numbers="|1|5-6|8-9|11-12|3|"}
LIB = my_library

check: lint test

lint:
	ruff check $(LIB)

format:
	ruff format $(LIB)

test:
	pytest --disable-warnings
```

## Summary {.smaller}

::: {.incremental}

* Testing is a way to verify that your code is working as expected.
* **Unit** tests are small, isolated tests that verify a single logical concept in your code.
* **Regression** tests are used to verify that your code is still working after changes.
* **Integration** tests are used to verify that your code works with other code.
* Exceptions are a way to handle errors in your code.
* Warnings are a way to alert users of your code to potential issues or usage errors.
* **Linting** is a way to check your code for common errors and style issues.
* **Ruff** is a fast linter that checks for common errors and style issues.
* **Ruff** is also an automatic code formatter that enforces a consistent style.
* Profiling is a way to measure the performance of your code.
* **Makefiles** simplify project-related commands for everyone.

:::