NER Training does not work when using BILOU tagging #665
Comments
Same here. I get this error:
when executing this:
I've traced the value being passed to
Now this returns an integer, not a string which
Fixed in v1.8.0 🎉
>>> import spacy
>>> nlp = spacy.load('en')
>>> from spacy.gold import GoldParse
>>> doc = nlp.make_doc(u'Facebook is a company')
>>> nlp.tagger(doc)
>>> gold = GoldParse(doc, entities=['U-ORG', 'O', 'O', 'O'])
>>> [t for t in gold.ner]
['U-ORG', 'O', 'O', 'O']
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
1.0
>>> nlp.entity.update(doc, gold)
0.0
>>> nlp.entity(doc)
>>> for ent in doc.ents:
...     print(ent.text, ent.label_)
Facebook ORG
...
I am using spaCy 1.9.0 to update the 'en' model with my own tag. Here is a code snippet:
The raw text looks like: "view of the fact that the suit "
What's wrong here? Any suggestions?
Try
Yes, that helped. Thanks! So add_label takes the bare NER label, but the tags in the training data must start with a prefix such as "U-"; otherwise there is an error about not finding U- or B- at the start of the tags. I ran a few sample sentences with IOB tags, converted them with iob_to_biluo, and that seems to work OK. But when I ran another training set, I hit a different problem (sorry to keep bugging you).
and the error is:
I checked that the number of words in doc equals the number of tags in entity_tags, so there is a one-to-one correspondence.
Still, the "index out of range" error persisted. Any idea why?
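For anyone hitting the same thing, here is a minimal sketch of the two steps described above: converting IOB tags to BILUO and checking that words and tags line up before training. It is plain Python and does not call spaCy; `iob_to_biluo_sketch` and `check_alignment` are hypothetical helper names chosen for illustration, not spaCy functions, and the check assumes you compare against the tokens the pipeline actually produced (for example `[t.text for t in doc]`), not against a whitespace split of the raw text.

```python
def _label(tag):
    """Return the entity label of a tag, or None for 'O'."""
    return None if tag == "O" else tag.partition("-")[2]


def iob_to_biluo_sketch(tags):
    """Convert IOB tags (B-X, I-X, O) to BILUO tags (B-, I-, L-, U-, O).

    Hypothetical helper: an I-X tag that does not continue an entity with
    the same label is treated as the start of a new entity.
    """
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append("O")
            continue
        prefix, _, label = tag.partition("-")
        if prefix not in ("B", "I") or not label:
            raise ValueError("not an IOB tag: %r" % (tag,))
        prev_tag = tags[i - 1] if i > 0 else "O"
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        starts = prefix == "B" or _label(prev_tag) != label
        ends = next_tag != "I-" + label
        if starts and ends:
            biluo.append("U-" + label)  # single-token entity
        elif starts:
            biluo.append("B-" + label)  # first token of a longer entity
        elif ends:
            biluo.append("L-" + label)  # last token of a longer entity
        else:
            biluo.append("I-" + label)  # inside a longer entity
    return biluo


def check_alignment(words, tags):
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    if len(words) != len(tags):
        problems.append("%d words but %d tags" % (len(words), len(tags)))
    for i, tag in enumerate(tags):
        if tag != "O" and tag[:2] not in ("B-", "I-", "L-", "U-"):
            problems.append("tag %d has no BILUO prefix: %r" % (i, tag))
    return problems


words = ["Facebook", "is", "a", "company"]
tags = iob_to_biluo_sketch(["B-ORG", "O", "O", "O"])
print(tags)
print(check_alignment(words, tags))
```

Collecting problems in a list, instead of raising, makes it possible to log and skip bad sentences explicitly. Whether a length mismatch is what causes the "index out of range" error in this thread is not confirmed here; the check only rules that explanation in or out.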
As an experiment, I caught the index exception in a try block so that sentences that raise no error are still passed on for training.
This obviously does not look right: some extra tokens appear towards the end. Am I missing something, such as hyperparameters? Is a unidirectional or a bidirectional RNN/LSTM being used? Or is the low volume of training data causing it?
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
NER training is not working as described in the documentation/tutorials.
Specifically, offsets appear to work, but entity labels do not. The documentation also contradicts itself, which adds to the confusion.
Using the entity-label NER training example
e.g.
As far as I know my syntax is correct. This doesn't work either:
I get the following error:
TypeError Traceback (most recent call last)
in ()
----> 1 ner.update(doc, gold)
/usr/local/lib/python3.5/dist-packages/spacy/syntax/parser.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.parser.Parser.update (spacy/syntax/parser.cpp:7788)()
/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.preprocess_gold (spacy/syntax/ner.cpp:4782)()
/usr/local/lib/python3.5/dist-packages/spacy/syntax/ner.cpython-35m-x86_64-linux-gnu.so in spacy.syntax.ner.BiluoPushDown.lookup_transition (spacy/syntax/ner.cpp:5145)()
TypeError: argument of type 'NoneType' is not iterable
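As background on the message itself: Python raises this kind of TypeError when the `in` operator is applied to `None`. The sketch below reproduces that in isolation. `has_prefix` is a hypothetical stand-in, not spaCy code, and the idea that the parser receives `None` where it expects a tag name is an assumption based only on the traceback.

```python
def has_prefix(tag_name):
    # Hypothetical stand-in for a lookup that searches inside a tag name.
    return "-" in tag_name


def describe(tag_name):
    """Return the lookup result, or the TypeError message if it fails."""
    try:
        return has_prefix(tag_name)
    except TypeError as err:
        return str(err)


print(describe("U-ORG"))  # a string supports the `in` test
print(describe(None))     # None does not, so the error message is returned
```

The exact wording of the message can differ between Python versions.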
I think this is a bug in GoldParse, since offsets appear to work.
e.g.
Also, the documentation is very inconsistent and confusing right now.
Conflicting examples:
Example 1 does not work.
Example 2 works for token offsets but does not work for token-level entity annotation.
Example 3 is linked from example 1 (as the 'full example'), yet the two are totally different examples.
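To make the difference between the two annotation styles concrete, here is a rough sketch that turns offset annotations into token-level BILUO tags. It is plain Python, assumes whitespace tokenization and (start, end, label) character offsets, and uses hypothetical helper names; spaCy's own tokenizer and conversion may behave differently.

```python
def tokenize_with_offsets(text):
    """Split on whitespace and record each token's character span."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word, start, end))
        pos = end
    return tokens


def offsets_to_biluo(text, entities):
    """Map (start, end, label) character offsets to one BILUO tag per token.

    Assumption: entity boundaries coincide with token boundaries. Entities
    that cover no whole token are skipped in this sketch.
    """
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        covered = [i for i, (_, start, end) in enumerate(tokens)
                   if start >= ent_start and end <= ent_end]
        if not covered:
            continue
        if len(covered) == 1:
            tags[covered[0]] = "U-" + label
        else:
            tags[covered[0]] = "B-" + label
            for i in covered[1:-1]:
                tags[i] = "I-" + label
            tags[covered[-1]] = "L-" + label
    return tags


print(offsets_to_biluo("Facebook is a company", [(0, 8, "ORG")]))
```

Either form carries the same information; the token-level form additionally requires exactly one tag per token.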
Your Environment
Ubuntu
Python 3.5.2
spaCy 1.2, latest pip