Merge pull request #9 from DHI/improve-project2
Improve course project
jsmariegaard authored Nov 3, 2023
2 parents ddb9f09 + c2e7678 commit eefd415
Showing 9 changed files with 169 additions and 106 deletions.
41 changes: 23 additions & 18 deletions projects/data_cleaning/Project_module_01.md
@@ -1,26 +1,31 @@
## Module 1: GitHub and basic functions

Let's get started on the project! You have received the script [`clean_project_data_v4_final2.py`](clean_project_data_v4_final2.py) from a colleague and in this module you will create a GitHub repository, add the script and improve the script by using functions.

- 1.1 GitHub repo
- 1.1.1 Create a new GitHub repository "timeseriescleaner"
- private, no template, add readme, gitignore python, no license
- 1.1.2 Go to repo settings/Collaborators add your instructors and your "buddy"
- 1.1.3 Clone repo to local machine
- Create a new GitHub repository "timeseriescleaner" on your own GitHub profile (not on your organization's GitHub)
- Make it private, no template, add readme, gitignore python, no license
- Go to repo settings/Collaborators add your instructors and your "buddy"
- Clone repo to local machine
- [Optional] Create virtual environment for this course project (use venv or mamba/conda environment)
- 1.1.4 Download the provided Python script and add it to the repo
- 1.1.5 Commit the file and push the changes (Check that the file can be found on GitHub)
- 1.1.6 Open the project in vscode and make a single character change to the file (add a comment)
- 1.1.7 Commit the changes (Check that it works on GitHub)
- Download the provided Python script and add it to the repo
- Commit the file and push the changes (Check that the file can be found on GitHub)
- Open the project in vscode and make a single character change to the file (add a comment)
- Commit and push the changes (Check that you can find it on GitHub)
- 1.2 Functions
- 1.2.1 Create a local branch "refactor-functions"
- 1.2.2 Refactor the code to use functions (`clean_spikes`, `clean_outofrange`, `clean_flat`, `plot_timeseries`)
- for data in [data1, data2, data3]:
- data_original = data.copy()
- data = clean_spikes(data, max_jump=10)
- data = clean_outofrange(data, min_val=0, max_val=50)
- data = clean_flat(data, flat_period=5)
- plot_timeseries(data_original, data)
- 1.2.3 Check that your code runs and produces the same results as before (you should not change the functionality!)
- 1.2.4 Commit your code in 1 or more commits (in the end your code should be approximately 75 lines long)
- Create a local branch "refactor-functions"
- Refactor the code to use functions (`clean_spikes`, `clean_outofrange`, `clean_flat`, `plot_timeseries`)
- You should be able to run the cleaning using this loop:
```python
for data in [data1, data2, data3]:
    data_original = data.copy()
    data = clean_spikes(data, max_jump=10)
    data = clean_outofrange(data, min_val=0, max_val=50)
    data = clean_flat(data, flat_period=5)
    plot_timeseries(data_original, data)
```
- Check that your code runs and produces the same results as before (you should not change the functionality when refactoring!)
- Commit your code in one or more commits (in the end, your code should be approximately 75 lines long)
- Create a pull request in GitHub and "request review" from your reviewers
- Wait for feedback, adjust code until approval, then merge (and delete branch)
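
The refactoring step above could end up looking roughly like the sketch below. The colleague's script is not shown here, so the cleaning rules are assumptions: a spike is a value that jumps more than `max_jump` from the last kept value, out-of-range values fall outside `[min_val, max_val]`, and a flat period is a run of at least `flat_period` identical consecutive values. To keep the sketch free of dependencies, a plain list of floats stands in for the time series, removed values become NaN, and `plot_timeseries` is left out.

```python
import math

NAN = float("nan")


def clean_spikes(data, max_jump=10):
    """Assumed rule: drop values that jump more than max_jump from the last kept value."""
    cleaned = data.copy()
    previous = None
    for i, value in enumerate(data):
        if math.isnan(value):
            continue
        if previous is not None and abs(value - previous) > max_jump:
            cleaned[i] = NAN
        else:
            previous = value
    return cleaned


def clean_outofrange(data, min_val=0, max_val=50):
    """Assumed rule: drop values outside the closed interval [min_val, max_val]."""
    return [v if min_val <= v <= max_val else NAN for v in data]


def clean_flat(data, flat_period=5):
    """Assumed rule: drop runs of at least flat_period identical consecutive values."""
    cleaned = data.copy()
    run_start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the data or when the value changes.
        if i == len(data) or data[i] != data[run_start]:
            if i - run_start >= flat_period:
                for j in range(run_start, i):
                    cleaned[j] = NAN
            run_start = i
    return cleaned


# Made-up example data, only for illustration.
data1 = [1.0, 2.0, 60.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0]
data_original = data1.copy()
data = clean_spikes(data1, max_jump=10)
data = clean_outofrange(data, min_val=0, max_val=50)
data = clean_flat(data, flat_period=5)
print(data)
```

Each function returns a new list instead of changing its input, which makes it easy to compare `data_original` with the cleaned result afterwards.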

45 changes: 28 additions & 17 deletions projects/data_cleaning/Project_module_02.md
@@ -1,30 +1,41 @@
## Module 2: Modules and classes

After the last module, your script now uses the functions `clean_spikes`, `clean_outofrange`, `clean_flat` and `plot_timeseries` and has a nice home on GitHub. In this module, you will improve the functions, move them to separate modules and then refactor your code to use classes. Finally, you will check that it all works by running a notebook.

- Create new branch "modules-classes" (Make sure changes from last module have been merged, and that you start from the main branch)
- 2.1 Function arguments
- Add default arguments to the functions. Commit.
- Make sure that you only use positional arguments where there is only one argument. Use keyword arguments everywhere else. Commit.
- Consider modifying the cleaning functions if they modify the input (remember that inputs are passed by reference, not as a copy), e.g.
```python
data_cleaned = data.copy()
...
return data_cleaned
```
- 2.2 Modules
- Move cleaner functions into a separate module "cleaning.py". Commit.
- Move the plotting function into a separate module "plotting.py". Commit.
- Rename the script `main.py` and execute the cleaning and plotting.
- from cleaning import ...
- from plotting import ...
- Check that it runs!
- Move cleaner functions into a separate module `cleaning.py`. Commit.
- Move the plotting function into a separate module `plotting.py`. Commit.
- Rename `clean_project_data_v4_final2.py` to `main.py` and execute the cleaning and plotting.
- `from cleaning import ...`
- `from plotting import ...`
- Check that it runs! Commit.
- 2.3 Classes
- Organize the cleaning functions into classes that all have the same structure (an init method and a clean method)
- SpikeCleaner
- `def __init__(max_jump)`
- `def clean(data)`
- e.g. for SpikeCleaner
- create an init method: `def __init__(self, max_jump)`
- and a clean method: `def clean(self, data)`
- modify `main.py` and check that it runs
- cleaners = [
- SpikeCleaner(max_jump=10),
- OutOfRangeCleaner(min_val=0, max_val=50),
- FlatPeriodCleaner(flat_period=5),
- ]
- for cleaner in cleaners:
- data = cleaner.clean(data)
- Download notebook_A and csv file and make sure it runs. (remove any remaining print statements)
```python
cleaners = [
    SpikeCleaner(max_jump=10),
    OutOfRangeCleaner(min_val=0, max_val=50),
    FlatPeriodCleaner(flat_period=5),
]
for cleaner in cleaners:
    data = cleaner.clean(data)
```
- commit
- Download [`notebook_A.ipynb`](notebook_A.ipynb) and the csv file [`example_data1.csv`](example_data1.csv) and make sure the notebook runs (remove any remaining print statements)
- Create pull request in GitHub and "request review" from your reviewers
- Get feedback, adjust code until approval, then merge (and delete branch)
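
The point in 2.1 about inputs being passed by reference can be seen in a small experiment. This is only a sketch: a plain list stands in for the time series (a pandas Series modified in place behaves analogously), and the function name `clean_outofrange_inplace` is made up for the comparison. The bare `*` in the second signature is one way to force keyword arguments.

```python
NAN = float("nan")


def clean_outofrange_inplace(data, min_val=0, max_val=50):
    """Hypothetical variant that overwrites values in the caller's object."""
    for i, value in enumerate(data):
        if not (min_val <= value <= max_val):
            data[i] = NAN
    return data


def clean_outofrange(data, *, min_val=0, max_val=50):
    """Works on a copy; the bare * makes min_val and max_val keyword-only."""
    data_cleaned = data.copy()
    for i, value in enumerate(data_cleaned):
        if not (min_val <= value <= max_val):
            data_cleaned[i] = NAN
    return data_cleaned


raw = [10.0, 99.0, 20.0]
cleaned = clean_outofrange(raw, max_val=50)
print(raw)  # the input is untouched
clean_outofrange_inplace(raw)
print(raw)  # the input itself now contains NaN
```

Calling `clean_outofrange(raw, 0, 50)` raises a `TypeError`, because the limits can only be given by keyword.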

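
The class structure asked for in 2.3 might look like the sketch below. Two of the three cleaners are shown; `FlatPeriodCleaner` would follow the same pattern. The cleaning rules are assumptions (the original script is not reproduced here), and a plain list of floats again stands in for the time series.

```python
import math

NAN = float("nan")


class SpikeCleaner:
    """Assumed rule: a spike jumps more than max_jump from the last kept value."""

    def __init__(self, max_jump=10):
        self.max_jump = max_jump

    def clean(self, data):
        data_cleaned = data.copy()
        previous = None
        for i, value in enumerate(data):
            if math.isnan(value):
                continue
            if previous is not None and abs(value - previous) > self.max_jump:
                data_cleaned[i] = NAN
            else:
                previous = value
        return data_cleaned


class OutOfRangeCleaner:
    """Assumed rule: values outside [min_val, max_val] are removed."""

    def __init__(self, min_val=0, max_val=50):
        self.min_val = min_val
        self.max_val = max_val

    def clean(self, data):
        return [v if self.min_val <= v <= self.max_val else NAN for v in data]


# Made-up example data, only for illustration.
data = [1.0, 2.0, 30.0, 3.0, -5.0]
cleaners = [
    SpikeCleaner(max_jump=10),
    OutOfRangeCleaner(min_val=0, max_val=50),
]
for cleaner in cleaners:
    data = cleaner.clean(data)
print(data)
```

Because every cleaner has the same `clean(data)` method, the loop does not need to know which kind of cleaner it is running; the parameters are fixed once at construction.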
43 changes: 25 additions & 18 deletions projects/data_cleaning/Project_module_03.md
@@ -1,16 +1,20 @@
## Module 3: Installable package and pytest

In the last module, you introduced modules, classes and a new notebook in your repo. In this module, you will add tests to your code base. But first, you will make your package installable.

- Create new branch "package-test" (Make sure changes from last module have been merged, and that you start from the main branch)
- Make sure pytest and pytest-cov are installed
- 3.1 Installable package
- 3.1.1 Organize the files into folders and add setup.py. Call your package tscleaner.
- Organize the files into folders and add `setup.py`. Call your package `tscleaner`.
- subfolders: tscleaner, scripts, notebooks, tests
- make init-file in tscleaner with
- `from .cleaning import SpikeCleaner, FlatPeriodCleaner, OutOfRangeCleaner`
- `from .plotting import plot_timeseries`
- create a setup.py in the root with the following content (change with your data):
- from setuptools import setup, find_packages
- setup(
- make an init-file `__init__.py` in the tscleaner folder with the following content:
```python
from .cleaning import SpikeCleaner, FlatPeriodCleaner, OutOfRangeCleaner
from .plotting import plot_timeseries
```
- create a `setup.py` in the root with the following content (replace the values with your own):
```python
from setuptools import setup, find_packages
setup(
    name='MyPackageName',
    version='0.0.1',
    url='https://github.com/mypackage.git',
@@ -19,18 +23,21 @@
    description='Description of my package',
    packages=find_packages(),
    install_requires=['numpy', 'matplotlib'],
)
- 3.1.2 Install the package in editable mode.
- `>pip install -e .`
- 3.1.3 Modify import statements in notebook_A and script main.py and make sure they run.
- 3.1.4 Modify cleaner tools by raising exceptions for invalid inputs.
- 3.1.5 Move the csv file to `/tests/testdata` and update notebook with relative path to the file
)
```
- Install the package in editable mode, by running the below command from the project root.
- `> pip install -e .`
- Modify import statements in `notebook_A` and script `main.py` and make sure they run.
- Modify the cleaner tools by raising exceptions for invalid inputs.
- Move the csv file to `/tests/testdata` and update the notebook with the relative path to the file
- 3.2 Pytest
- 3.2.1 Write unit tests with pytest in the `/tests` folder. Create an empty init-py file in the folder. Create a file `test_cleaning.py` and create at least five tests that verify that the cleaning tools work as intended
- Make sure `pytest` and `pytest-cov` are installed
- Write unit tests with pytest in the `/tests` folder. Create an empty `__init__.py` file in the folder. Create a file `test_cleaning.py` and create at least three tests that verify that the cleaning tools work as intended
- If all your tests are failing, consider if you have given the right requirements in the `setup.py`...
- [Optional] Consider making a fixture that reads the csv file, so that the data can be reused in all tests
- 3.2.2 Run the tests from the command line by writing `>pytest` in the project root (can you also run the tests from VSCode?)
- 3.2.3 Assess the test coverage with `>pytest --cov=tscleaner tests`
- Optional: Get coverage as html with `>pytest --cov=tscleaner --cov-report html` (check the index.html in the htmlcov subfolder afterwards)
- Run the tests from the command line by writing `>pytest` in the project root (can you also run the tests from VSCode?)
- Assess the test coverage with `>pytest --cov=tscleaner tests`
- [Optional] Get coverage as html with `>pytest --cov=tscleaner --cov-report html` (check the index.html in the htmlcov subfolder afterwards)
- Create pull request in GitHub and "request review" from your reviewers
- Get feedback, adjust code until approval, then merge (and delete branch)
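
The two steps "raise exceptions for invalid inputs" and "write unit tests" fit together as sketched below. The validation rule (`min_val` must not exceed `max_val`) and the test names are assumptions chosen for illustration. In the real project the test functions would live in `tests/test_cleaning.py` and start with `from tscleaner import OutOfRangeCleaner`; here everything sits in one block so it can run on its own, with a plain list standing in for the time series.

```python
import math

NAN = float("nan")


class OutOfRangeCleaner:
    def __init__(self, min_val=0, max_val=50):
        # Fail early with a clear message instead of silently cleaning everything.
        if min_val > max_val:
            raise ValueError(f"min_val ({min_val}) must not exceed max_val ({max_val})")
        self.min_val = min_val
        self.max_val = max_val

    def clean(self, data):
        return [v if self.min_val <= v <= self.max_val else NAN for v in data]


# pytest collects every function whose name starts with "test_".
def test_values_inside_range_are_kept():
    cleaner = OutOfRangeCleaner(min_val=0, max_val=50)
    assert cleaner.clean([0.0, 25.0, 50.0]) == [0.0, 25.0, 50.0]


def test_values_outside_range_become_nan():
    cleaner = OutOfRangeCleaner(min_val=0, max_val=50)
    cleaned = cleaner.clean([-1.0, 25.0, 51.0])
    assert math.isnan(cleaned[0])
    assert cleaned[1] == 25.0
    assert math.isnan(cleaned[2])


def test_input_is_not_modified():
    data = [-1.0, 25.0]
    OutOfRangeCleaner().clean(data)
    assert data == [-1.0, 25.0]


def test_invalid_range_raises():
    try:
        OutOfRangeCleaner(min_val=10, max_val=0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

With pytest installed, the last test is usually written more compactly with `with pytest.raises(ValueError):`; the `try`/`except` form is used here only so that the sketch runs without pytest.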

38 changes: 22 additions & 16 deletions projects/data_cleaning/Project_module_04.md
@@ -1,28 +1,34 @@
## Module 4: GitHub actions and auto-formatting

Your package is now installable and testable. In this module, you will make the tests automatic in GitHub and make sure that your code style adheres to the standards in PEP-8.

In this module, we will use some files from the python library template. When you need to create a new package after this course, you can simply start from this template when creating the repo.

- Create new branch "action-formatting" (Make sure changes from last module have been merged, and that you start from the main branch)
- 4.1 GitHub Action
- 4.1.1 Copy the GitHub action "python-app.yml" from the python template https://github.com/DHI/template-python-library to your own library (make sure it sits in the same folder).
- 4.1.2 Change all occurrences of "my_library" in the yml file to your package name "tscleaner"
- 4.1.3 Comment out the line with "ruff-action" with "#"
- 4.1.4 Commit, push and create a pull request; the tests should now run, verify that they all run before you move on
- Copy the GitHub action file `python-app.yml` (in the `.github` folder) from the python template https://github.com/DHI/template-python-library to your own library (make sure it sits in the same folder).
- Change all occurrences of "my_library" in the yml file to your package name "tscleaner"
- Comment out the line with "ruff-action" with "#"
- Commit, push and create a pull request; the tests should now run, verify that they all run before you move on
- 4.2 Ruff
- 4.2.1 Enable the "ruff-action" by removing the "#" you added above
- 4.2.2 Commit and push, your actions will probably fail now - inspect the problems by clicking the red cross (did you also get an email?)
- 4.2.3 Install "ruff" on your local machine with mamba/conda/pip
- 4.2.4 Navigate to your project root folder and run ruff with "ruff ."
- 4.2.5 Add `__all__ = ["SpikeCleaner", "FlatPeriodCleaner", "OutOfRangeCleaner", "plot_timeseries"]` to your `__init__.py` file and fix remaining issues until ruff passes
- 4.2.6 Commit, push and verify that your action now succeeds
- Enable the "ruff-action" by removing the "#" you added above
- Commit and push; your actions will probably fail now - inspect the problems by clicking the red cross (did you also get an email?)
- Install "ruff" on your local machine with mamba/conda/pip
- Navigate to your project root folder and run ruff with "ruff ."
- Add the following line to your `__init__.py` file
`__all__ = ["SpikeCleaner", "FlatPeriodCleaner", "OutOfRangeCleaner", "plot_timeseries"]`
- fix remaining issues until ruff passes
- Commit, push and verify that your action now succeeds
- 4.3 Black
- 4.3.1 Install "black" on your local machine with mamba/conda/pip
- 4.3.2 Run black from your project root folder; inspect the differences; commit
- Install "black" on your local machine with mamba/conda/pip
- Run black from your project root folder; inspect the differences; commit
- 4.4 pyproject.toml
- Copy the pyproject.toml from the python template https://github.com/DHI/template-python-library (this file will replace your setup.py)
- Modify to fit your package
- Remove the setup.py
- Copy the `pyproject.toml` from the python template https://github.com/DHI/template-python-library (this file should replace your `setup.py`)
- Modify the file contents to fit your package
- Remove the `setup.py` file
- Commit, push and verify that the GitHub action runs
- If it fails, you probably forgot some dependencies - go back and fix
- [Optional] You should also re-install your local package with ">pip install --upgrade -e ."
- [Optional] You should also re-install your local package with `>pip install --upgrade -e .`
- 4.5 [Optional] Enable black and ruff extensions in VSCode; set black to run on save
- Create pull request in GitHub and "request review" from your reviewers
- Get feedback, adjust code until approval, then merge (and delete branch)
50 changes: 37 additions & 13 deletions projects/data_cleaning/Project_module_05.md
@@ -1,17 +1,41 @@
## Module 5: Documentation
## Module 5: Object-oriented design

- Create new branch "docs" (Make sure changes from last module have been merged, and that you start from the main branch)
- 6.1 README
- Write a README file with basic information about the project.
- 6.2 Docstrings
- Write NumPy style docstrings for all functions and classes.
- [Optional] Install the autodocstrings extension in VSCode (set the style to NumPy)
- 6.3 mkdocs
- Install mkdocs, mkdocstrings and material design `mamba/pip install mkdocstrings-python mkdocs-material`
- Create a `mkdocs.yml` file (copy from https://github.com/DHI/template-python-library and adapt).
- Create a docs folder and create a markdown file `index.md` inside.
- Create API documentation locally using `>mkdocs serve`.
- Check the generated HTML documentation.
In this module you will benefit from the automatic testing that you have added in the last module. Let's explore some other object-oriented designs for our code base...

- Create new branch "oop-dataclasses" (Make sure changes from last module have been merged, and that you start from the main branch)
- 5.1 Type Hints
- Add type hints to all functions and methods. Commit
- 5.2 Data class
- Make all the cleaner classes dataclasses. e.g.:
```python
from dataclasses import dataclass
...
@dataclass
class SpikeCleaner:
```
- remove the `__init__` method (not needed anymore)
- Check that the notebook still runs and that the classes indeed work as data classes (e.g. have a string representation and support equality testing etc)
- Commit
- 5.3 Module level function
- Make a private module function `_print_stats()` that prints the number of cleaned values
- call the function from each of the clean methods (note: inheritance is not required to obtain common functionality)
- 5.4 Composition or inheritance
- Create a new cleaner class called CleanerWorkflow that takes a list of cleaners when constructed and has a clean method that runs all the cleaners' clean methods.
```python
class CleanerWorkflow:

    def __init__(self, cleaners) -> None:
        self.cleaners = cleaners

    def clean(self, data: pd.Series) -> pd.Series:
        data_cleaned = data.copy()
        for cleaner in self.cleaners:
            ...
```
- Modify the notebook to use the CleanerWorkflow instead of looping over the cleaners
- Consider what type of validation you would want CleanerWorkflow to have. Is it better to check validity up front or to just go ahead and handle problems afterwards?
- Consider whether it would be better to create a base class BaseCleaner - write down your considerations as a comment in the pull request, refer to specific lines of code
- e.g. how would you handle common plotting functionality in the cleaner classes?
- Create pull request in GitHub and "request review" from your reviewers
- Get feedback, adjust code until approval, then merge (and delete branch)

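
One possible way to complete the `CleanerWorkflow` from 5.4 is sketched below, using composition rather than a base class. The up-front check in `__init__` is just one answer to the validation question, not the required one. `MissingMarkerCleaner` and the `Cleaner` protocol are hypothetical names introduced for the example, and a plain list stands in for the pandas Series.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

NAN = float("nan")


class Cleaner(Protocol):
    """Anything with a clean(data) method counts as a cleaner."""

    def clean(self, data: list[float]) -> list[float]: ...


@dataclass
class OutOfRangeCleaner:
    min_val: float = 0.0
    max_val: float = 50.0

    def clean(self, data: list[float]) -> list[float]:
        return [v if self.min_val <= v <= self.max_val else NAN for v in data]


@dataclass
class MissingMarkerCleaner:
    """Hypothetical cleaner that removes one specific error value."""

    marker: float = -999.0

    def clean(self, data: list[float]) -> list[float]:
        return [NAN if v == self.marker else v for v in data]


class CleanerWorkflow:
    def __init__(self, cleaners: list[Cleaner]) -> None:
        # Up-front validation: reject anything that cannot clean.
        for cleaner in cleaners:
            if not callable(getattr(cleaner, "clean", None)):
                raise TypeError(f"{cleaner!r} has no clean() method")
        self.cleaners = list(cleaners)

    def clean(self, data: list[float]) -> list[float]:
        data_cleaned = data.copy()
        for cleaner in self.cleaners:
            data_cleaned = cleaner.clean(data_cleaned)
        return data_cleaned


raw = [10.0, -999.0, 70.0, 20.0]
workflow = CleanerWorkflow([MissingMarkerCleaner(), OutOfRangeCleaner(min_val=0, max_val=50)])
result = workflow.clean(raw)
print(result)
```

Because `CleanerWorkflow` itself has a `clean` method, a workflow can be placed inside another workflow without any extra code.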