CU-86948uv4g docstring signature consistency #413

mart-r · 2024-04-11T10:08:55Z

Many people work on this project, and while we do our best to review PRs, not everything gets noticed, especially when it comes to documentation.

When a PR changes a method, the method's signature sometimes changes too, but that change isn't always reflected in the documentation (doc string).

This PR attempts to automate the validation of doc strings so that they match the method signature they describe. It does so as follows:

- Update to flake8==7.0.0 for linting
  - We were on 4.0.1 before
  - This newer version has built-in checks for method signature validation in doc strings (if they exist)

While looking for a solution, I also looked at a few other tools, but it felt easier and better to stick with what we have and update it.
That said, because of the many new checks the new flake8 version provides, there were many minor changes in many modules, mostly to doc strings and/or method signature type hints.

If we merge this PR, flake8 will keep running as part of the GitHub Actions workflow, as it has so far. The difference is that it will now also catch discrepancies between doc strings and method bodies.
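
To illustrate the kind of mismatch such a check is meant to catch, here is a minimal sketch that compares the names under `Args:` in a Google-style doc string with the actual signature. It is not how flake8 or any of its plugins work internally; `documented_args`, `mismatches`, and `link_name` are hypothetical names made up for this example.

```python
import inspect
import re
from typing import Callable, List, Tuple


def documented_args(func: Callable) -> List[str]:
    """Return the argument names listed under 'Args:' in a Google-style doc string."""
    doc = inspect.getdoc(func) or ""  # getdoc() also normalises the indentation
    names: List[str] = []
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line and not line.startswith(" "):
            break  # the next unindented line (e.g. 'Returns:') ends the section
        elif in_args:
            # Entries sit at exactly 4 spaces; deeper lines are continuations.
            match = re.match(r" {4}(\w+)\s*(?:\([^)]*\))?:", line)
            if match:
                names.append(match.group(1))
    return names


def mismatches(func: Callable) -> Tuple[List[str], List[str]]:
    """Return (arguments missing from the doc string, documented names not in the signature)."""
    actual = [p for p in inspect.signature(func).parameters if p not in ("self", "cls")]
    documented = documented_args(func)
    missing = [name for name in actual if name not in documented]
    extra = [name for name in documented if name not in actual]
    return missing, extra


# Hypothetical function whose doc string was not updated with its signature.
def link_name(cui: str, names: dict, full_build: bool = False) -> None:
    """Link names to a concept.

    Args:
        cui (str): The concept ID.
        name (str): The name to link.
    """


missing, extra = mismatches(link_name)
print("undocumented:", missing)
print("stale in doc string:", extra)
```

A real linter also has to handle other doc string styles, `*args`/`**kwargs`, and return or raise sections, which this sketch ignores.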

tomolopolis · 2024-04-11T10:08:58Z

Task linked: CU-86948uv4g Ensure docstrings match signature

tomolopolis · 2024-04-17T09:05:01Z

medcat/cat.py

@@ -604,7 +616,7 @@ def unlink_concept_name(self, cui: str, name: str, preprocessed_name: bool = Fal

        cuis = [cui]
        if preprocessed_name:
-            names = {name: 'nothing'}
+            names = {name: {'nothing': 'nothing'}}


interesting - so this was a bug before?

mart-r · 2024-04-17 (edited)

The issue had to do with later calling CDB.remove_names with the names argument.
The names could have either the type Dict[str, str] (if preprocessed_name) or otherwise Dict[str, Dict], as returned by medcat.preprocessing.cleaners.prepare_name (although that method's return type hasn't been explicitly type-hinted).

Before I started this work, the type hint for the names argument of CDB.remove_names was simply Dict, so it worked with either of the aforementioned inputs.
However, the doc string said Dict[str, Dict]. Since flake8 was now checking doc strings, I assumed the more restrictive type would be appropriate and changed the type hint for the method.
Once all the doc string changes were made, mypy started reporting issues with the argument type, which is why I made this change.

That said, the CDB.remove_names method only acts on the keys of this dict, so it doesn't really matter what the values are. It's just that in most cases the results from medcat.preprocessing.cleaners.prepare_name would be expected to be provided.

So all in all, this is not necessarily a bug. Behaviour isn't changed, after all. It is rather a clarification of the types of data generally expected.

EDIT:
Though in retrospect, we may want to just accept any Iterable[str] for CDB.remove_names and just iterate over it. Clearly it doesn't use the values so we don't really need to restrict the input to dicts.
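
A minimal sketch of that idea, assuming a heavily simplified stand-in for the concept database: `TinyCDB`, its `cui2names` attribute, and `add_names` are hypothetical and are not MedCAT's actual CDB. Since iterating over a dict yields its keys, both dict shapes from the diff above keep working alongside plain lists or sets.

```python
from typing import Dict, Iterable, Set


class TinyCDB:
    """Hypothetical, simplified stand-in for a concept database (not MedCAT's CDB)."""

    def __init__(self) -> None:
        self.cui2names: Dict[str, Set[str]] = {}

    def add_names(self, cui: str, names: Iterable[str]) -> None:
        self.cui2names.setdefault(cui, set()).update(names)

    def remove_names(self, cui: str, names: Iterable[str]) -> None:
        """Remove names from a concept; only the names themselves are used."""
        known = self.cui2names.get(cui, set())
        for name in names:  # iterating a dict yields its keys
            known.discard(name)


cdb = TinyCDB()
cdb.add_names("C1", ["a", "b", "c", "d"])
cdb.remove_names("C1", {"a": "nothing"})                # old preprocessed-name shape
cdb.remove_names("C1", {"b": {"nothing": "nothing"}})   # shape after this PR
cdb.remove_names("C1", ["c"])                           # any other iterable of names
print(cdb.cui2names["C1"])
```

With `Iterable[str]` as the annotation, a type checker accepts all three calls, so neither call site needs a placeholder value just to satisfy the type hint.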

tomolopolis · 2024-04-17T10:34:24Z

Yeah, that makes more sense. Slightly less confusing.

tomolopolis

LGTM - good stuff - was long to even review!

mart-r added 19 commits April 9, 2024 09:01

CU-86948uv4g: Add pydoctest to dev requirements

575e206

CU-86948uv4g: Run pydoctest during main workflow

fd1e2c8

CU-86948uv4g: Fix docstrings for CDB (make sure signatures match doc strings)

33cbdaf

CU-86948uv4g: Fix docstrings for Vocab (make sure signatures match doc strings)

7662a96

CU-86948uv4g: Fix docstrings for cdb_maker (make sure signatures match doc strings)

62eb79e

CU-86948uv4g: Fix docstrings for config (make sure signatures match doc strings)

d6543e1

CU-86948uv4g: Move to darglint

850ac82

CU-86948uv4g: Remove unnecessary pydoctest fixes

b2fce68

CU-86948uv4g: Fix docstrings for CAT (make sure signatures match doc strings)

14f8ef7

CU-86948uv4g: Fix docstrings for stats (make sure signatures match doc strings)

70ed455

CU-86948uv4g: Fix docstrings for CDB aker (make sure signatures match doc strings)

87f6855

CU-86948uv4g: Fix docstrings for a few more modules (make sure signatures match doc strings)

6da354d

CU-86948uv4g: Fix docstrings for a few more modules (make sure signatures match doc strings)

53262a8

CU-86948uv4g: Update flake8 config to support later versions of flake8

5a8ae1c

CU-86948uv4g: Fix docstrings for a the rest of the modules (make sure signatures match doc strings)

7621dea

CU-86948uv4g: Move away from darglint

f7487e3

CU-86948uv4g: Bump flake8 to 7.0.0 for documentation checks

e275d0d

CU-86948uv4g: Fix typing issues

52612f6

Merge branch 'master' into CU-86948uv4g-docstring-signature-consitency

12530e2

tomolopolis reviewed Apr 17, 2024

tomolopolis approved these changes Apr 17, 2024

mart-r added 2 commits April 18, 2024 11:56

Merge branch 'master' into CU-86948uv4g-docstring-signature-consitency

37364c8

CU-86948uv4g: Fix additional doc string issues from new things in master

f672dec

mart-r merged commit 91e2bc8 into master Apr 18, 2024
5 checks passed
