Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-86948uv4g docstring signature consistency #413

Merged
merged 21 commits into from
Apr 18, 2024

Conversation

mart-r
Copy link
Collaborator

@mart-r mart-r commented Apr 11, 2024

Many people work on this project. And while we try our best to do reviews on PRs, not everything is always noticed. Especially with respect to documentation.

So when there's a change / PR for a method, sometimes the method's signature changes. But sometimes that change isn't reflected in the actual documentation (doc string).

So this PR attempts to automate the validation of doc strings to make sure they match the actual method signature they're trying to describe.
The way it does that is this:

  • Update to flake8==7.0.0 for linting
    • We were on 4.0.1 before
    • This newer version has built in checks for method signature validation in doc string (if they exist)

While looking for a solution for this, I also looked at a few other tools. But it felt easier / better to stick to what we have and update that.
With that said, due to the many-many new checks the new flake8 version provides, there were many minor changes - mostly to doc strings and/or method signature type hints - in many modules.

If we merge this PR, going forward, flake8 will always run as part of GitHub Actions workflow (as it has so far). With the caveat that it will now catch any discrepancies in doc strings and method bodies as well.

mart-r added 19 commits April 9, 2024 09:01
@tomolopolis
Copy link
Member

@@ -604,7 +616,7 @@ def unlink_concept_name(self, cui: str, name: str, preprocessed_name: bool = Fal

cuis = [cui]
if preprocessed_name:
names = {name: 'nothing'}
names = {name: {'nothing': 'nothing'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting - so this was a bug before?

Copy link
Collaborator Author

@mart-r mart-r Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue had to do with later calling CDB.remove_names with the names argument.
The names could either have the type Dict[str, Set[str]] (if preprocessed_name) or Dict[str, Dict] otherwise from medcat.preprocessing.cleaners.prepare_name (although the type hasn't been explicitly type-hinted by the method return type).

Before I started this procedure, the type hint for the CDB.remove_names method's names argument was simply Dict. So that worked with either of the aforementioned inputs.
However, the doc string said Dict[str, Dict]. So since flake8 was now checking doc strings, I assumed the more restrictive type would be appropriate and changed the type hint for the method.
After having made all the doc string changes, mypy was now having issues for the argument type. Which is why I made this change.

With that said, the CDB.remove_names method only acts on the keys of this dict. So it doesn't really matter what the values are. It's just that in most cases, the results from medcat.preprocessing.cleaners.prepare_name would be expected to be provided.

So all in all not necessarily a bug. Behaviour isn't changed, after all. But rather a clarification of the types of data generally expected.

EDIT:
Though in retrospect, we may want to just accept any Iterable[str] for CDB.remove_names and just iterate over it. Clearly it doesn't use the values so we don't really need to restrict the input to dicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that makes more sense . slightly less confusing

Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - good stuff - was long to even review!!

@@ -604,7 +616,7 @@ def unlink_concept_name(self, cui: str, name: str, preprocessed_name: bool = Fal

cuis = [cui]
if preprocessed_name:
names = {name: 'nothing'}
names = {name: {'nothing': 'nothing'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that makes more sense . slightly less confusing

@mart-r mart-r merged commit 91e2bc8 into master Apr 18, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants