What format to use for training data and NER-model #959
Comments
Hi,
Now my question: somehow the training is not performing well, which I suppose is due to the small amount of training data. Every entry in the test corpus gets tagged as FINANCE, or every entry gets tagged as BANKING. How big does my training data need to be to get better performance? I guess I will have to annotate a bigger corpus for my training data. Can this be done in a different way? What algorithm is behind the spaCy Named Entity Recognizer? Thanks |
The new version 1.8.0 comes with bug fixes to the NER training procedure and a new
I hope this helps! |
Thanks Ines, yes, this helps a lot. One last thing: I am also interested in finding relations between the entities, for example by using a regex to find specific words between two entities. Is there an approachable way to do this with spaCy once I have a trained model with new entities? |
Well the document class returned by If you want to analyse more complex relationships you could walk up the syntactic tree to deduce the distance between the words. |
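For the simpler case mentioned above (specific words between two entities), a minimal sketch in plain Python might look like the following. It assumes the entities are already available as `(start, end, label)` character offsets; with a trained model these could be read from `doc.ents` via `ent.start_char`, `ent.end_char` and `ent.label_`. The function name `related_by`, the sample sentence and the pattern are hypothetical and only for illustration.

```python
import re

def related_by(text, entities, pattern):
    """Return (left_entity, right_entity, matched_words) for each pair of
    neighbouring entities whose in-between text matches `pattern`."""
    ordered = sorted(entities, key=lambda e: e[0])  # sort by start offset
    regex = re.compile(pattern, re.IGNORECASE)
    results = []
    for left, right in zip(ordered, ordered[1:]):
        gap = text[left[1]:right[0]]  # text between the two entity spans
        match = regex.search(gap)
        if match:
            results.append((text[left[0]:left[1]],
                            text[right[0]:right[1]],
                            match.group(0)))
    return results

# Hypothetical example data, using the labels from this thread.
text = "Acme Bank acquired Nova Capital last year."
entities = [(0, 9, "BANKING"), (19, 31, "FINANCE")]
pairs = related_by(text, entities, r"\b(acquired|merged with)\b")
print(pairs)
```

This only looks at directly neighbouring entities and at surface text, so it will miss relations expressed across longer distances; for those, the syntactic tree is likely the better tool.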
Hi everyone, I have similar questions to ramonrod's about how to generate the training data. I was successful in training the model on one new entity type, with the training data tagged manually as shown in the docs. However, this would be tedious if I were to train it on 1000 or more examples. I wanted to know if there is a better way of doing this, or other formats in which I can annotate the examples. I am new to NLP and spaCy and would appreciate the help. |
Have you figured out any way to automate the annotation of training data? I would like to train my model to recognize finance/technology related words. I have some job names like 'capex', 'DefCalc' etc. I want my model to recognize these entities. Appreciate the help! Thanks, |
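One possible way to bootstrap annotations when a list of known terms already exists is to match the terms against raw sentences and emit entity offsets automatically. The sketch below is plain Python, not a spaCy feature; the helper name `annotate`, the `FINANCE` label for these terms and the sample sentences are assumptions made for illustration. The output follows the `(text, [(start, end, label)])` shape of the entity offsets discussed in this thread.

```python
import re

def annotate(text, terms, label):
    """Return (text, [(start, end, label), ...]) for every whole-word match
    of a known term. Matching is case-insensitive."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    offsets = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    return (text, offsets)

# Hypothetical term list and sentences.
terms = ["capex", "DefCalc"]
sentences = [
    "The capex forecast was revised.",
    "Run DefCalc before closing the books.",
]
train_data = [annotate(s, terms, "FINANCE") for s in sentences]
print(train_data)
```

Data produced this way only contains terms that are already on the list, so a model trained on it may mostly memorize them. Reviewing a sample by hand and adding sentences with unseen terms is probably still necessary.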
Hi all,
Hope this can help you. |
Hi, could you please help with creating the training data? It was mentioned above that someone managed to import the data from a text file and train on top of it. Will this work? |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Hello,
I have been trying to train a model with the same method that #887 uses, just as a test case.
I have a question: what would be the best format for a training corpus to import into spaCy? I have a text file with a list of entities that require new entity types for tagging.
Let me explain my case, I follow the update.training script like this:
I add my training data as entity_offsets:
This works fine for the one example and the new entity tag. Obviously I want to be able to add more than one example. The idea is to create a text file with tagged sentences. The question is: what format does spaCy need for training data? Should I stick with the entity_offset format from the examples (this will be a very tedious task for thousands of sentences), or is there another way to prepare the file, like:
And how can I pass the corpus to spaCy using the mentioned method? Do I have to use the newly created model, or can I add the new entities to the old model? How can this be achieved?
Thanks
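To make the "text file with tagged sentences" idea concrete, here is a minimal sketch of converting inline-tagged sentences into entity offsets. The XML-like tag syntax (`<LABEL>...</LABEL>`) and the function name `to_offsets` are assumptions chosen for illustration, not a format that spaCy itself reads; the result has the `(text, [(start, end, label)])` shape of the entity offsets described above.

```python
import re

# Matches <LABEL>entity text</LABEL>; labels are assumed to be upper-case.
TAG = re.compile(r"<(?P<label>[A-Z_]+)>(?P<text>.*?)</(?P=label)>")

def to_offsets(tagged):
    """Strip inline tags and return (plain_text, [(start, end, label), ...])
    with offsets counted in the plain text."""
    parts = []
    offsets = []
    cursor = 0   # current position in the tagged input
    length = 0   # length of the plain text built so far
    for m in TAG.finditer(tagged):
        before = tagged[cursor:m.start()]
        parts.append(before)
        length += len(before)
        start = length
        parts.append(m.group("text"))
        length += len(m.group("text"))
        offsets.append((start, length, m.group("label")))
        cursor = m.end()
    parts.append(tagged[cursor:])
    return ("".join(parts), offsets)

# Hypothetical tagged sentence, as one line of such a text file might look.
example = "<BANKING>Acme Bank</BANKING> raised its <FINANCE>capex</FINANCE> budget."
text, offsets = to_offsets(example)
print(text, offsets)
```

With one tagged sentence per line in a file, applying `to_offsets` to each line would yield a list of training examples without anyone having to count characters by hand.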