Detecting New Entities In Model #13

Evangel-coder · 2023-10-25T05:13:40Z

Hi,

I would like to ask how I can train/fine-tune the model to detect new entities with the "ID" tag, e.g. Medicare numbers or phone numbers with a +65 prefix. I'd really appreciate any insights on that!

prajwal967 · 2023-11-14T15:39:45Z

Hi,

Sorry for the delayed response.
Do you have a dataset with these new entities that you want to train the model on?

If you do, then you would need to get the data in this form: notes.jsonl
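As a rough illustration of what one line of such a JSON Lines file could look like, here is a minimal sketch that builds a record with character-offset spans. The exact schema is defined by the repo and is not reproduced in this thread, so the field names (`text`, `meta`, `note_id`, `spans`, `start`, `end`, `label`), the helper `make_record`, and the sample values are all assumptions; check them against the repo's own notes.jsonl before use.

```python
import json


def make_record(text, entities, note_id):
    """Build one note record with character-offset spans.

    `entities` is a list of (substring, label) pairs in reading order;
    each substring is located in `text` with str.find.
    Field names here are hypothetical, not the repo's confirmed schema.
    """
    spans, cursor = [], 0
    for value, label in entities:
        start = text.find(value, cursor)
        if start == -1:
            raise ValueError(f"{value!r} not found in note {note_id}")
        end = start + len(value)
        spans.append({"start": start, "end": end, "label": label})
        cursor = end  # keep searching after the previous match
    return {"text": text, "meta": {"note_id": note_id}, "spans": spans}


# Made-up note text and identifiers, labelled with the "ID" tag from the question.
note = "Medicare no. 1EG4-TE5-MK73, contact +65 6123 4567."
record = make_record(
    note,
    [("1EG4-TE5-MK73", "ID"), ("+65 6123 4567", "ID")],
    note_id="note_1",
)
line = json.dumps(record)  # one line of the .jsonl file
```

Writing one such `line` per note, separated by newlines, would give a JSON Lines file.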

Then you can follow the steps given in this notebook: Train.ipynb and replace the files accordingly.

Let us know if that doesn't work!

Evangel-coder · 2023-11-15T00:05:49Z

Hi,
That’s alright! Really appreciate the quick comment. I'm still annotating the dataset manually.
I have some questions to clarify:

1. Do we have to use Prodigy, or can we just annotate the dataset manually?
2. To train the model to detect Medicare numbers, can I use a dataset that contains only Medicare numbers? If so, what would be the optimal number of data points?
3. I also saw that there is a model available on Hugging Face. Would it be possible to just use the AutoTokenizer and AutoModel classes to load and train the model, since that skips the tedious process of setting up the libraries the model needs?

Really appreciate the insights given!

prajwal967 · 2023-12-20T09:54:20Z

Hi, sorry for the delay.

1. We used Prodigy for annotation. While you can do it manually, it is more efficient to do it using Prodigy.
2. Yes, if you want to train a model only to detect Medicare numbers, you can train it on a dataset with only Medicare numbers. However, this model won't be able to predict other attributes (e.g. name, date, etc.).
3. Yes, if you are training a new model from scratch you can use the AutoClasses. If you have the dataset you can follow the steps given here: https://github.com/huggingface/transformers/blob/main/examples/pytorch/token-classification/run_ner.py
   - Our code mostly follows their approach, but their code is more up to date and might be a better starting point if you're training something new.
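Token-classification training generally works on token-level tags rather than character spans, so span annotations have to be converted first. The sketch below shows one way to do that with BIO tags. It uses plain whitespace tokenization purely for illustration (a real pipeline would align to the model tokenizer's subwords), and the span field names and the helper `spans_to_bio` are assumptions, not part of the repo or of run_ner.py.

```python
import re


def spans_to_bio(text, spans):
    """Return (tokens, tags) using whitespace tokens and BIO tags.

    `spans` is a list of dicts with hypothetical keys
    "start", "end" (character offsets) and "label".
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for span in spans:
            # A token belongs to a span if the two character ranges overlap.
            if tok_start < span["end"] and tok_end > span["start"]:
                # First token of the span gets B-, later tokens get I-.
                prefix = "B" if tok_start <= span["start"] else "I"
                tag = f"{prefix}-{span['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "Contact +65 6123 4567 today"
spans = [{"start": 8, "end": 21, "label": "ID"}]
tokens, tags = spans_to_bio(text, spans)
# tokens: ['Contact', '+65', '6123', '4567', 'today']
# tags:   ['O', 'B-ID', 'I-ID', 'I-ID', 'O']
```

A model trained only on such data would have a label set of just `O`, `B-ID` and `I-ID`, which is why it could not predict names or dates.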

Let us know if you have any other questions, thanks!

Evangel-coder · 2024-01-02T02:15:16Z

Hi,

Appreciate the informative response. I wanted to clarify: to fine-tune the model so that it can also detect Medicare numbers, I would just need the I2B2 data with a variety of Medicare numbers included in it, and that data has to be in the data format described in the repo. Is that right? Hope to hear from you soon!

prajwal967 · 2024-01-02T05:06:21Z

Yes, that sounds about right! Let us know if there are any issues, thanks!

Evangel-coder · 2024-01-03T01:08:26Z

Alright, thanks for the clarification. I was under the impression that if I fine-tuned the model on US Medicare numbers alone, it would simply gain the ability to detect 'Medicare number' while continuing to detect other attributes like 'name' and 'date' at the same accuracy. Is that not the case?

prajwal967 · 2024-01-04T08:17:32Z

Yes, that could work, but there is a possibility that the model might forget what it has learnt previously (the accuracy of detecting other types of PHI might decrease).

Also, do you have a dataset with just Medicare numbers? If so, what does that dataset look like?
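One common way to reduce this kind of forgetting is to fine-tune on a mix of the new examples and examples carrying the previously learned labels, which matches the earlier suggestion of combining the I2B2 data with Medicare number examples. The sketch below is a generic illustration of that mixing step, not the maintainers' procedure; the function name `mix_datasets`, the `old_fraction` parameter and the placeholder records are all hypothetical.

```python
import random


def mix_datasets(new_records, old_records, old_fraction=0.5, seed=0):
    """Combine new-entity records with a random sample of earlier records.

    `old_fraction` is the target share of the mixed set drawn from
    `old_records`, capped by how many old records are available.
    """
    if not 0 <= old_fraction < 1:
        raise ValueError("old_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_old / (n_new + n_old) = old_fraction for n_old.
    n_old = round(len(new_records) * old_fraction / (1 - old_fraction))
    n_old = min(n_old, len(old_records))
    mixed = list(new_records) + rng.sample(list(old_records), n_old)
    rng.shuffle(mixed)
    return mixed


# Placeholder records standing in for annotated notes.
medicare_notes = [{"note_id": f"new_{i}"} for i in range(10)]
earlier_notes = [{"note_id": f"old_{i}"} for i in range(100)]
train_set = mix_datasets(medicare_notes, earlier_notes, old_fraction=0.5)
```

Comparing per-label accuracy on a held-out set before and after fine-tuning would show whether detection of the other PHI types has actually degraded.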

Evangel-coder · 2024-01-10T00:15:51Z

Hi, I've got a dataset; it's in the JSON format described in the instructions provided. I'll try it out first and will let you know the results. Is there an email address I can use to contact you (if possible)?
