Improve PATIENT/PERSOON processing and more #144

Open: mkorvas wants to merge 41 commits into base: main

mkorvas commented Jul 12, 2024

This is the second part (Deduce) of the result of my first encounter with this codebase (Docdeid and Deduce). My goal was to understand its inner workings and then make sure that capitalized street names are pseudonymized, whether all-caps or title-cased, including the special case of the Dutch "IJ" digraph. While at it, I noticed unexpected behaviour for patient names vs. other person names and improved that as well.
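
To illustrate why the "IJ" digraph needs special handling: Python's built-in `str.title()` turns "ijsselstraat" into "Ijsselstraat", whereas Dutch spelling capitalizes both letters ("IJsselstraat"). The following is only a minimal sketch of the idea, not the code in this PR; the helper names `dutch_titlecase` and `case_variants` are hypothetical and do not come from Deduce or Docdeid.

```python
from __future__ import annotations


def dutch_titlecase(word: str) -> str:
    """Title-case one word, capitalizing a leading Dutch "ij" as "IJ".

    Hypothetical helper for illustration; not part of Deduce or Docdeid.
    """
    lowered = word.lower()
    if lowered.startswith("ij"):
        # Both letters of the digraph are capitalized in Dutch.
        return "IJ" + lowered[2:]
    return lowered.capitalize()


def case_variants(street: str) -> set[str]:
    """Return the lower-case, title-cased and all-caps spellings of a name."""
    words = street.split()
    return {
        " ".join(w.lower() for w in words),
        " ".join(dutch_titlecase(w) for w in words),
        " ".join(w.upper() for w in words),
    }
```

A lookup list extended with `case_variants("ijsselstraat")` would then cover "ijsselstraat", "IJsselstraat" and "IJSSELSTRAAT". The sketch only splits on whitespace, so hyphenated names and the single-character ligature "ĳ" would need extra care.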

This depends on changes in Docdeid, filed as vmenger/docdeid#20.

To use that Docdeid version, I checked out the two repos side by side and added the following configuration in Deduce's pyproject.toml:

    [tool.poetry.dependencies]
    docdeid = {path = "../docdeid", develop = true}
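
After installing with that configuration, it can be worth checking that the side-by-side checkout is really the copy being imported rather than a released version. A small sketch using only the standard library follows; `module_origin` is a hypothetical helper name, and the path it returns depends on your local layout.

```python
from __future__ import annotations

from importlib.util import find_spec


def module_origin(name: str) -> str | None:
    """Return the file a top-level module would be imported from, if any."""
    spec = find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With the path dependency in place, this is expected to point into
    # the sibling ../docdeid checkout; None means docdeid is not installed.
    print(module_origin("docdeid"))
```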

FWIW, I also see a diff in my local (uncommitted) version of base_config.json affecting "initiaal_patient" mentions, but it has been 4 months since I worked intensively on this codebase, so I no longer remember whether it is useful or even necessary. If some tests fail for you without it, let me know; this may well be the reason.

Beware! `poetry.lock` is not up-to-date in this commit (and most recent commits wouldn't work with the current last released version of `docdeid`, anyway).

Leaving the test case commented out for now.

This won't be a frequent problem but it's something I noticed when first trying out this tool.

Otherwise, random names are labeled as "patient", which will be wrong in most cases.

...and use it to determine where patient name is to be merged with a neighbouring person mention and when not.

...as required by pylint.

This makes Pylint happier and the code simpler.

This is needed so as to reduce the number of arguments for the `_match_sequence` method and creates a cleaner inheritance hierarchy between annotators, too.