How can I use BioBERT for Relation extraction and further finding the effective entities in a sentence? #17

Open
Meghna-Goyal opened this issue Mar 17, 2021 · 3 comments

Comments

@Meghna-Goyal

Meghna-Goyal commented Mar 17, 2021

Hi team,

Below is an example of the problem I am trying to solve.

Original Sentence:
PD98059, a specific inhibitor of MEK, had little effect on the TNF-alpha-induced phosphorylation of Akt

Input sentence (Masking the entities):
DRUG, a specific inhibitor of GENE, had little effect on the GENE-induced phosphorylation of GENE.

Expected output:

Relation: Inhibitor
Effective Entities: DRUG and GENE (first occurrence)

Could you please let me know whether I can use BioBERT for multi-label classification for relation extraction, and then for finding the effective entities in the sentence (when the sentence has multiple occurrences of the same entity type)?

Thanks and regards,
Meghna Goyal

@SRL94

SRL94 commented Aug 18, 2022

Hi Meghna,

I have the same question. Have you figured it out?

Best regards
Sirui

@wonjininfo
Member

wonjininfo commented Aug 18, 2022

Hi all,
Apologies for the delay in responding, Meghna Goyal.

Last year, I worked on the multi-label RE task (DrugProt) using LMs and made our code available at https://github.com/dmis-lab/BioRE-drugprot-kuaz

You will need to write some code to pre-process your input data, as the preprocessing code is not available yet. (I hope to release it soon, but I have a list of things to do for my graduation these days. Apologies for this.)

To predict relation classes for plain text, you need to:

  1. Run NER tools to recognize named entities.
  2. Pre-process your input data (post-NER) to match the expected format (check the pre-processed datasets in the BioRE repo).
  3. Use a trained model to predict relation classes for your input data.
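
As a rough illustration of the last step in a multi-label setting, the sketch below turns per-class scores into a set of predicted relations. It assumes the model emits one independent logit per relation class and that a class is predicted when its sigmoid probability reaches a threshold; whether the BioRE repo decodes predictions this way is not confirmed here. The label names, the threshold, and the example logits are made up for illustration.

```python
import math

# Hypothetical label set; replace with the classes your model was trained on.
LABELS = ["INHIBITOR", "ACTIVATOR", "SUBSTRATE"]


def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose independent probability reaches the threshold.

    An empty list means no relation was predicted for this sample.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Made-up logits for a single (drug, gene) sample.
example_logits = [2.1, -1.3, 0.2]
print(decode_multilabel(example_logits))
```

Because each class is thresholded on its own, one sample can receive zero, one, or several relation labels, which is what separates this from a softmax-style single-label decision.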

Also, please note that in our participation in the BioCreative VII challenge (DrugProt), we wrapped entities with markers, which performed better than masking entities (i.e., replacing entities with masks). A short description of our participation is available here. Figure 3 may be informative for your question.
When a sentence contains multiple entities, we created multiple datapoints (samples) from it, one per candidate entity pair.

For example,

DRUG, a specific inhibitor of GENE, had little effect on the TNF-alpha-induced phosphorylation of Akt.
DRUG, a specific inhibitor of MEK, had little effect on the GENE-induced phosphorylation of Akt.
DRUG, a specific inhibitor of MEK, had little effect on the TNF-alpha-induced phosphorylation of GENE.
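
A minimal sketch of how such samples could be generated, using the sentence from the original post. It assumes NER output is available as non-overlapping character spans with a type; here a small `span` helper stands in for a real NER tool. The `make_samples` function, the `style` parameter, and the `<DRUG> ... </DRUG>` marker strings are hypothetical, and the exact marker tokens and file format used in the BioRE repo should be taken from its pre-processed datasets.

```python
from itertools import product

TEXT = ("PD98059, a specific inhibitor of MEK, had little effect on "
        "the TNF-alpha-induced phosphorylation of Akt.")


def span(text, surface, etype):
    """Locate the first occurrence of `surface`; stands in for real NER output."""
    start = text.index(surface)
    return (start, start + len(surface), etype)


ENTITIES = [
    span(TEXT, "PD98059", "DRUG"),
    span(TEXT, "MEK", "GENE"),
    span(TEXT, "TNF-alpha", "GENE"),
    span(TEXT, "Akt", "GENE"),
]


def make_samples(text, entities, style="mask"):
    """Build one sample per (DRUG, GENE) pair.

    style="mask"   replaces the two entities with their type names.
    style="marker" wraps them in illustrative (hypothetical) marker strings.
    Returns a list of (drug_surface, gene_surface, sentence) tuples.
    """
    if style not in ("mask", "marker"):
        raise ValueError("style must be 'mask' or 'marker'")
    drugs = [e for e in entities if e[2] == "DRUG"]
    genes = [e for e in entities if e[2] == "GENE"]
    samples = []
    for drug, gene in product(drugs, genes):
        out = text
        # Rewrite right-to-left so the offsets of the earlier span stay valid.
        for start, end, etype in sorted((drug, gene), reverse=True):
            surface = text[start:end]
            if style == "mask":
                repl = etype
            else:
                repl = f"<{etype}> {surface} </{etype}>"
            out = out[:start] + repl + out[end:]
        samples.append((text[drug[0]:drug[1]], text[gene[0]:gene[1]], out))
    return samples


for drug, gene, sentence in make_samples(TEXT, ENTITIES, style="mask"):
    print(f"{drug} / {gene}: {sentence}")
```

Keeping the entity pair alongside each sentence means that whichever sample receives a relation label also identifies which two mentions take part in it, even when the same entity type occurs several times in the sentence.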

Thank you for your interest in our work!
Best,
Wonjin

@anuragpande1977

Hi, I am looking for help with RE on biomedical text. I have used scispaCy for NER on PubMed abstracts, with the bc5cdr model for CHEMICAL and DISEASE entities. The NER works quite well, but RE is outside scispaCy's scope. I created semantics-based graphs but need RE for the actual work. Any suggestions?
