-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve PATIENT/PERSOON processing and more #144
Open
mkorvas
wants to merge
41
commits into
vmenger:main
Choose a base branch
from
mkorvas:main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Beware! `poetry.lock` is not up-to-date in this commit (and most recent commits wouldn't work with the current last released version of `docdeid`, anyway).
Leaving the test case commented out for now. This won't be a frequent problem but it's something I noticed when first trying out this tool.
Otherwise, random names are labeled as "patient", which will be wrong in most cases.
...and use it to determine where patient name is to be merged with a neighbouring person mention and when not.
...as required by pylint.
This makes Pylint happier and the code simpler.
This is needed so as to reduce the number of arguments for the `_match_sequence` method and creates a cleaner inheritance hierarchy between annotators, too.
This bug sent Deduce into an endless loop. I wonder whether that's still possible with carefully crafted inputs, and something to be fixed.
This is a change I had locally and surfaced during merge conflict resolution, which may be necessary for some tests to work (and by extension to the system at runtime, too).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is the result of my first encounter with this codebase (Docdeid and Deduce), the second part (Deduce). My goal was to understand the inner workings of it and then make sure that capitalized street names are pseudonymized (all-caps or titlecased, and covering also the special case of the "IJ" digraph in Dutch). When at it, I noticed unexpected behaviour for patient names v. other person names and improved that as well.
This depends on changes in Docdeid, filed as vmenger/docdeid#20.
To use that Docdeid version, I checked out the two repos side by side and added the following configuration in Deduce's
pyproject.toml
:FWIW, I also see a diff in my local (non-committed) version of
base_config.json
affecting "initiaal_patient" mentions but it's been 4 months since I intensively worked on this codebase so I don't remember anymore whether it's useful or even necessary anymore. But if some tests fail without it for you, let me know, this may well be the reason.