Finetuning coreference model for custom spacy model #12021
Replies: 4 comments 48 replies
-
I see you linked to sample annotations in #11585, so let me link that post here: #11585 (comment) To clarify one thing, there is no such thing as "LitBank format" - LitBank distributes coref annotations in BRAT, CoNLL, and TSV formats. We actually use LitBank CoNLL data in our tests. It looks like what you have is BRAT data. I haven't used it before, but it looks like there's enough information in the BRAT files to convert it to spaCy coref annotations on Docs, or to the CoNLL format (though that is kind of complex). It looks like the T lines are mentions and the R lines are references that connect mentions. Can you clarify how you used coreferee? What you refer to in this part:
Note that …
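For what it's worth, BRAT standoff `.ann` files mark text-bound mentions on T lines (id, label, character offsets, surface text) and relations on R lines. A minimal sketch of parsing those two line types into mention spans and coreference pairs — the `MENTION`/`COREF` labels and the sample offsets here are assumptions for illustration, not your actual annotation scheme:

```python
# Sketch: parse BRAT .ann lines into mentions and coref pairs.
# Assumes T lines look like "T1\tMENTION 4 26\tsection ABC of XYZ Act"
# and R lines like "R1\tCOREF Arg1:T1 Arg2:T2" (label names are assumptions).

def parse_brat(ann_lines):
    mentions = {}   # id -> (start_char, end_char, surface_text)
    pairs = []      # (Arg1 mention id, Arg2 mention id)
    for line in ann_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("T"):
            # fields[1] is "LABEL start end"; fields[2] is the surface text
            label_and_offsets = fields[1].split()
            start, end = int(label_and_offsets[1]), int(label_and_offsets[2])
            mentions[fields[0]] = (start, end, fields[2])
        elif fields[0].startswith("R"):
            # fields[1] is "LABEL Arg1:Tx Arg2:Ty"
            args = dict(part.split(":") for part in fields[1].split()[1:])
            pairs.append((args["Arg1"], args["Arg2"]))
    return mentions, pairs

lines = [
    "T1\tMENTION 4 26\tsection ABC of XYZ Act",
    "T2\tMENTION 50 72\tsection ABC of the act",
    "R1\tCOREF Arg1:T1 Arg2:T2",
]
mentions, pairs = parse_brat(lines)
```

From there the character offsets can be aligned to token spans on a spaCy `Doc` (e.g. via `doc.char_span`) to build training annotations.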
-
Hi @polm, thanks for replying. Let me try to lay out everything I have done so far toward fine-tuning the coreference model.
How should I proceed with training? I am not sure whether converting the training data to TSV, CoNLL, or the spaCy coref annotation format would help.
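As a sketch of what the target format involves: the antecedent/anaphor pairs from the annotations need to be merged into clusters, since the experimental coref component stores each cluster as a span group under keys like `doc.spans["coref_clusters_1"]`. The mention ids and sample pairs below are placeholders, not real data:

```python
# Sketch: merge antecedent/anaphor pairs into clusters, keyed the way
# spacy-experimental's coref component stores them in doc.spans
# ("coref_clusters_1", "coref_clusters_2", ...). Mention ids here are
# BRAT-style T-ids; offsets would come from the parsed .ann file.

def pairs_to_clusters(pairs):
    parent = {}

    def find(x):
        # Union-find with path compression
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in parent:
        clusters.setdefault(find(m), []).append(m)

    return {
        f"coref_clusters_{i}": sorted(members)
        for i, members in enumerate(clusters.values(), start=1)
    }

print(pairs_to_clusters([("T1", "T2"), ("T2", "T5"), ("T3", "T4")]))
```

Each cluster's mention ids would then be resolved to token spans when building the training `Doc` objects.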
-
Hi @polm, this is really unfortunate: after upgrading the Colab GPU I have around 90 GB of RAM, but with PyTorch 1.12, spaCy throws a compatibility error for the high-RAM GPU that gets assigned.
With PyTorch 1.13 the GPU seems compatible, but spacy-experimental won't work with it. Any workarounds or solutions?
-
Hi @polm, I curated 192 small documents out of the big chunk, but I am still getting terrible results, and training for both models ends after 1 epoch. Can you look at the repo? Some interesting observations:
Will the output even get better, given that with 200 sentences it is the same as with 10? I am not sure whether I am doing this correctly. It also takes a lot of time and effort to find raw sentences and annotate them, so doing it for 500 sentences seems like a bad idea to me.
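One thing worth checking when training stops after a single epoch is spaCy's early-stopping behavior, which is controlled by the `[training]` block of the config rather than by the corpus size. A sketch of the relevant keys (the values shown are spaCy's usual defaults, given for illustration, not tuned recommendations):

```ini
[training]
# 0 means no epoch limit; stopping is then driven by patience/max_steps
max_epochs = 0
# Number of evaluation steps with no score improvement before stopping early;
# with a tiny corpus and frequent evaluations this can trigger very quickly
patience = 1600
# Hard cap on optimizer steps
max_steps = 20000
# How often (in steps) the dev set is evaluated
eval_frequency = 200
```

On a very small corpus, one "epoch" may amount to only a handful of steps, so raising `patience` or lowering `eval_frequency` can change when training stops.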
-
Hello everyone,
I trained my spaCy pipeline just for NER and sourced the other pipeline components from the pretrained en_core_web_sm model.
I wanted to implement coreference resolution just for a single entity type detected by my custom spaCy model.
E.g., "The section ABC of XYZ Act states that... Section ABC of the act also proves..."
Here, I wanted the model to tag ["section ABC of XYZ Act", "section ABC of the act"].
I thought of implementing it as follows:
To implement the coreference part, I first came across the coreferee model and annotated a dataset following the LitBank format, but during training the rules written were not tagging anything as true (maybe because of the span.root value).
For example, when training the coreferee model I saw that my span.root value is not what I want:
In LitBank data: "Her father" -> "father"
For my data: "The Copyright Act" -> "act" (but I want "copyright act")
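That behavior is expected: a span's root is the single token whose dependency head falls outside the span, so "The Copyright Act" roots to "Act", never to the multi-token phrase. A toy illustration of that rule, with hand-written head indices standing in for a real parse (the heads below are assumptions, not parser output):

```python
# Toy illustration of how a span "root" is found: the token in the span
# whose dependency head lies outside the span. The head indices mimic a
# parse of "The Copyright Act states that ..."; they are hand-written
# assumptions, not real parser output.

tokens = ["The", "Copyright", "Act", "states", "that"]
heads  = [2, 2, 3, 3, 3]   # "The"->"Act", "Copyright"->"Act", "Act"->"states"

def span_root(start, end, heads):
    """Index of the token in [start, end) whose head is outside the span
    (or which is its own head), i.e. the span's syntactic root."""
    for i in range(start, end):
        if not (start <= heads[i] < end) or heads[i] == i:
            return i
    return start

root = span_root(0, 3, heads)   # span "The Copyright Act"
print(tokens[root])             # -> "Act"
```

Because rule-based systems like coreferee key their logic on this single root token, matching a whole phrase like "copyright act" generally needs span-level matching rather than the root alone.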
Is there a simpler way of fine-tuning the coreference model in spacy-experimental? Or could someone help with training the coreferee model, since I have already annotated data in the LitBank format?
Thanks in advance !