Relation extraction #173

vladd-bit · 2021-11-16T10:56:07Z

No description provided.

w-is-h · 2021-11-16T11:47:41Z

@vladd-bit this should be a spaCy component, so it needs to have __call__ and pipe, as well as save and load, with the same args and structure as in meta_cat.py

relation_extraction.py is a mix of everything: training pre-processing functions, losses, datasets, train/test code, tokenizers and so on. It would be good to organize this a bit.
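
A minimal sketch of the component shape described above. Everything in it is a hypothetical stand-in: the class name, the `labels` argument and the `config.json` file name are assumptions for illustration, a plain dict stands in for a spaCy Doc, and none of it is taken from meta_cat.py.

```python
import json
import os
from typing import Dict, Iterable, Iterator, List


class RelationComponentSketch:
    """Hypothetical stand-in for a pipeline component; not the MedCAT API."""

    name = "rel_sketch"

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels

    def __call__(self, doc: Dict) -> Dict:
        # A real component would receive and return a spaCy Doc and run the
        # model here; this stub only attaches an empty list of relations.
        doc.setdefault("relations", [])
        return doc

    def pipe(self, stream: Iterable[Dict]) -> Iterator[Dict]:
        # Process a stream of documents lazily, one at a time.
        for doc in stream:
            yield self(doc)

    def save(self, save_dir_path: str) -> None:
        # Persist whatever is needed to rebuild the component.
        os.makedirs(save_dir_path, exist_ok=True)
        with open(os.path.join(save_dir_path, "config.json"), "w") as f:
            json.dump({"labels": self.labels}, f)

    @classmethod
    def load(cls, save_dir_path: str) -> "RelationComponentSketch":
        # Rebuild the component from the directory written by save().
        with open(os.path.join(save_dir_path, "config.json")) as f:
            return cls(**json.load(f))
```

The point is only that the processing entry points (`__call__`, `pipe`) and the persistence entry points (`save`, `load`) live on one class, so the training, dataset and tokenizer code can move elsewhere.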

tomolopolis · 2021-11-18T13:03:39Z

@vladd-bit - I don't think this is ready to be merged. I think we're after a simple 'reference' or baseline implementation of a RelationExtraction model.

Is there a reference paper that you've built this implementation on? Is it this one? And is there some reference data / results that you've tested this on?

Is this PR essentially a conversion of that code to PyTorch plus an integration with spaCy Docs, so that once relations are found by the model they are attached to the Doc?

Why re-implement the BERT components, i.e. why is all of module https://github.com/CogStack/MedCAT/pull/173/files#diff-cc36c887710fb29a577dce375998afef4bfe9b54ba77ad0438b70688a8f81b51 necessary? It seems like you're re-implementing the entirety of the model. Are there any differences in the implementation that are worth noting? If so, please comment them.
From what I could tell, BERTMLMHead, BERTOnlyNSPHead aren't used anywhere?
The import of BERT_RelationExctracton is wrong and should be from medcat.utils.relation_extraction.models import ...
Are you sure this PR works and produces results?
This implementation should work with relation annotation data as output by the trainer, similar to how the CAT and MetaCAT classes accept MedCATtrainer_export.json exports to train instances.

medcat/relation_extraction.py

medcat/utils/relation_extraction/models.py

tomolopolis

This isn't ready to be merged in. Can you answer Zeljko's and my previous comments (i.e. respond, then resolve), and remove any code that doesn't need to be reviewed, i.e. all of medcat.utils.relation_extraction.models (??)

We're looking for an API similar to MetaCAT, i.e. CAT(cdb, meta_cats=[MetaCAT(...)], rel_cats=[RelCAT(...)])

Your RelationExtraction class can't be used this way; I think it is a general-purpose RelationExtraction model? This is fine, but it should be documented as such.

Alternatively - if all these changes are simply an experiment that uses MedCAT annotations and classifies a tuple of annotations (or spans of text) with a relation, maybe all this code should just live in its own repo?

It would also be super helpful to create an example notebook (or a saved Colab notebook) demonstrating this being used with a dataset. You previously mentioned you tested this on the i2b2 dataset and others?

medcat/relation_extraction.py

medcat/config_re.py

tomolopolis · 2022-01-05T12:10:08Z

medcat/relation_extraction.py

+from medcat.utils.relation_extraction.rel_dataset import RelData
+from seqeval.metrics import precision_score, recall_score, f1_score
+
+class RelationExtraction(object):


Re Zeljko's prior comment: this module should be similar to MetaCAT, i.e. it should subclass PipeRunner so that pipe, __call__ etc. are available. Ultimately we want an API like:

CAT(cdb, meta_cats=[ MetaCAT .. ] , rel_cats=[ RelCAT(... ), ... ])

partially done, probably need to add this to the pipe file as well now

medcat/relation_extraction.py

medcat/utils/model_utils.py

tomolopolis · 2024-04-12T10:47:23Z

@vladd-bit - still working on those tests?

mart-r · 2024-04-15T09:37:43Z

@vladd-bit Do you mind fixing the typing issues?

mart-r

Overall looking good.

For docstrings and/or type hints: if it's not supposed to be a public method, we can prefix its name with _.

medcat/cat.py

medcat/rel_cat.py

mart-r · 2024-04-15T15:38:23Z

medcat/rel_cat.py

+
+    log = logging.getLogger(__package__)
+
+    def __init__(self, cdb: CDB, tokenizer: TokenizerWrapperBERT, config: ConfigRelCAT = ConfigRelCAT(), task="train", init_model=False):


The mutable default value is created only once, when the function is defined. So if we create one RelCAT with the default config and then change its config, any subsequent RelCAT instance created with the default will share the same, already changed config instance.
So we probably want to do what's done in CAT, i.e. have the default be None, the type be Optional[ConfigRelCAT], and then check for None and set the value if appropriate.
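
A minimal sketch of the problem and the suggested fix. `ConfigSketch`, `SharedDefault` and `FreshDefault` are hypothetical stand-ins rather than the real ConfigRelCAT or RelCAT; only the default-argument pattern is the point.

```python
from typing import Optional


class ConfigSketch:
    """Hypothetical stand-in for ConfigRelCAT."""

    def __init__(self) -> None:
        self.lr = 1e-4


class SharedDefault:
    # Problem: ConfigSketch() is evaluated once, when the def statement runs,
    # so every instance built without an explicit config gets the same object.
    def __init__(self, config: ConfigSketch = ConfigSketch()) -> None:
        self.config = config


class FreshDefault:
    # Fix: default to None and build a new config per instance.
    def __init__(self, config: Optional[ConfigSketch] = None) -> None:
        if config is None:
            config = ConfigSketch()
        self.config = config


a, b = SharedDefault(), SharedDefault()
a.config.lr = 0.5
print(b.config.lr)  # 0.5, because b shares a's config object

c, d = FreshDefault(), FreshDefault()
c.config.lr = 0.5
print(d.config.lr)  # 0.0001, because d has its own config object
```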

mart-r · 2024-04-15T15:43:15Z

medcat/rel_cat.py

+
+        self.log.setLevel(self.config.general.log_level)
+
+        self.learning_rate = config.train.lr


Maybe make these properties? Otherwise later changes to the config won't be reflected in these attributes.
Alternatively, document this behaviour in the config.
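
A minimal sketch of the difference, using hypothetical stand-in classes (`ConfigSketch`, `TrainConfigSketch`, `CopiedValue`, `PropertyValue`) rather than the real RelCAT code: an attribute copied in `__init__` is a snapshot, while a property re-reads the config on every access.

```python
class TrainConfigSketch:
    """Hypothetical stand-in for the train section of the config."""

    def __init__(self) -> None:
        self.lr = 1e-4


class ConfigSketch:
    """Hypothetical stand-in for ConfigRelCAT."""

    def __init__(self) -> None:
        self.train = TrainConfigSketch()


class CopiedValue:
    def __init__(self, config: ConfigSketch) -> None:
        self.config = config
        # Snapshot: later changes to config.train.lr are not seen here.
        self.learning_rate = config.train.lr


class PropertyValue:
    def __init__(self, config: ConfigSketch) -> None:
        self.config = config

    @property
    def learning_rate(self) -> float:
        # Read through to the config on every access.
        return self.config.train.lr


copied = CopiedValue(ConfigSketch())
copied.config.train.lr = 0.01
print(copied.learning_rate)  # 0.0001, the stale snapshot

live = PropertyValue(ConfigSketch())
live.config.train.lr = 0.01
print(live.learning_rate)  # 0.01, follows the config
```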

mart-r · 2024-04-15T16:17:53Z