CU-8693t24ed: Add workaround for older DeID models in newer MedCAT #397
In `medcat==1.9.3` there was a change to the `transformers` dependency range (from `transformers>=4.19.2,<4.22.0` to `transformers>=4.34.0`). That change also removed the comment that the pin was for the DeID model to work.

I've looked into the issue and believe I've got a workaround for it.
The newer versions of `transformers` (starting from `4.22.0`) expect the tokenizer to have a specific attribute (`_in_target_context_manager`). However, as we load the tokenizer off disk, the attribute doesn't exist since the saved version is older.

The change reference:
huggingface/transformers#18325
So what this PR does is add the missing attribute to the tokenizer.
I've tested locally with the DeID model and after this change, it worked just fine.
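
For illustration, the idea can be sketched roughly as below. This is a minimal sketch, not the actual diff in this PR: the helper name `ensure_tokenizer_attr` and the stand-in class `LegacyTokenizer` are hypothetical, and the default value `False` is an assumption about what newer `transformers` versions set on a freshly created tokenizer.

```python
def ensure_tokenizer_attr(tokenizer, name="_in_target_context_manager", default=False):
    """Add `name` to `tokenizer` if it is missing (hypothetical helper).

    Tokenizers saved with an older transformers version may lack the
    attribute, so it is only set when absent; an existing value is kept.
    The default of False is an assumption.
    """
    if not hasattr(tokenizer, name):
        setattr(tokenizer, name, default)
    return tokenizer


class LegacyTokenizer:
    """Stand-in for a tokenizer loaded off disk from an older model pack."""


# Simulate loading an older tokenizer and patching it after load.
legacy = ensure_tokenizer_attr(LegacyTokenizer())
print(hasattr(legacy, "_in_target_context_manager"))
```

Checking with `hasattr` first means tokenizers saved with a newer `transformers` version are left untouched.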
PS:
This isn't really easy to test automatically. The only way we could do that would be to include (or potentially download during test time) an older DeID model. But I don't want to clutter the repo with that.