SpaCy NER training example from version 1.5.0 doesn't work in 1.6.0 #773

Closed
ejschoen opened this issue Jan 24, 2017 · 7 comments

@ejschoen

I tried to use the training example here:

https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py

with SpaCy 1.6.0. I get results like this:

Who is Shaka Khan?
Who 1228 554 WP  2
is 474 474 VBZ PERSON 3
Shaka 57550 129921 NNP PERSON 1
Khan 12535 48600 NNP LOC 3
? 482 482 . LOC 3

I like London and Berlin
I 467 570 PRP LOC 3
like 502 502 VBP LOC 1
London 4003 24340 NNP LOC 3
and 470 470 CC PERSON 3
Berlin 11964 60816 NNP PERSON 1

The tagging is odd, and Khan is recognized as a LOC and Berlin as a PERSON. If I go back to version 1.5.0, the result is as expected:

Who is Shaka Khan?
Who 1228 554 WP  2
is 474 474 VBZ  2
Shaka 57550 129921 NNP PERSON 3
Khan 12535 48600 NNP PERSON 1
? 482 482 .  2

I like London and Berlin
I 467 570 PRP  2
like 502 502 VBP  2
London 4003 24340 NNP LOC 3
and 470 470 CC  2
Berlin 11964 60816 NNP LOC 3

Could this be an issue with the off-the-shelf English model that spacy.en.download fetched for 1.6.0?

@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Jan 27, 2017
@honnibal
Member

TL;DR

I made a bug fix to thinc for 1.6 that's messed up the example, as it's written.

The best fix is to not call .end_training() after updating the model. I'm working on making this less confusing.
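
For reference, here's a minimal sketch of the resumed-training loop with that change, adapted loosely from the example script (the helper name and loop details here are illustrative, not the exact code in train_ner.py):

import random
from spacy.gold import GoldParse

def update_ner(nlp, train_data, n_iter=5):
    # train_data: list of (raw_text, [(start_char, end_char, label), ...]) pairs
    for itn in range(n_iter):
        random.shuffle(train_data)
        for raw_text, entity_offsets in train_data:
            doc = nlp.make_doc(raw_text)
            gold = GoldParse(doc, entities=entity_offsets)
            nlp.tagger(doc)               # run the tagger first, as in the example
            nlp.entity.update(doc, gold)
    # Deliberately no nlp.end_training() here: calling it after resuming
    # training swaps in the averaged weights, which is what skews these
    # small-data runs.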

What's going on

spaCy 1.x uses the Averaged Perceptron algorithm for all its machine learning. You can read about the algorithm in the POS tagger blog post, where you can also find a straightforward Python implementation: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python

AP uses the Averaged Parameter Trick for SGD. There are two copies of the weights:

  1. The current weights,
  2. The averaged weights

During training, predictions are made with the current weights, and the averaged weights are updated in the background. At the end of training, we swap the current weights for the averages. This makes a huge difference in most training scenarios.
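
To make the trick concrete, here's a toy sketch of the idea in plain Python (illustrative only; thinc's real implementation is in Cython and does the averaging lazily):

class AveragedWeights(object):
    def __init__(self):
        self.current = {}    # weights used for predictions during training
        self.totals = {}     # running sums used to compute the averages
        self.n_updates = 0

    def update(self, feature, delta):
        self.n_updates += 1
        self.current[feature] = self.current.get(feature, 0.0) + delta
        # Accumulate the new current value, so the final average reflects the
        # weight's whole history rather than just its last value.
        self.totals[feature] = self.totals.get(feature, 0.0) + self.current[feature]

    def end_training(self):
        # Swap the current weights for the averaged ones.
        for feature, total in self.totals.items():
            self.current[feature] = total / float(self.n_updates)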

However, when I wrote the code, I didn't pay much attention to the use-case here: resuming training in order to add another class. I recently fixed a long-standing error in the averaged perceptron code:

After loading a model, Thinc was not initialising the averages to the newly loaded weights. Skipping that initialisation saves memory, because the averages require another copy of the weights, plus additional book-keeping. The consequence of this bug was that when you updated a feature after resuming training, you wiped the weights previously associated with it. This is really bad --- it means that as you train on new examples, you're deleting all the information previously associated with them.
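
In terms of the toy sketch above (not thinc's actual code), the bug and the fix look roughly like this:

def resume_buggy(model, loaded_weights):
    model.current = dict(loaded_weights)
    # BUG: the averages aren't seeded with the loaded weights, so for any
    # feature that gets updated, the final average is built only from the few
    # post-resume values; the loaded weight is effectively wiped.
    model.totals = {}
    model.n_updates = 0

def resume_fixed(model, loaded_weights):
    model.current = dict(loaded_weights)
    # Fix: seed the averages with the loaded weights, so further updates
    # adjust them instead of replacing them.
    model.totals = dict(loaded_weights)
    model.n_updates = 1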

I finally fixed this bug in this commit: explosion/thinc@09b030b

The consequence is that the corrected code behaves differently on these small-data example cases.

What's still unclear is how we should compute an average between the old weights and the new ones. The old weights were trained with about 20 passes over about 80,000 sentences of annotation, so the new 5 passes over 5 examples would hardly change the weights at all if we took an unbiased average. That seems undesirable.

If you have so little data, it's probably not a good idea to average.
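
The back-of-the-envelope arithmetic behind that, treating each sentence pass as one update for illustration:

old_updates = 20 * 80000   # ~1.6 million updates from the original training
new_updates = 5 * 5        # 25 updates from the resumed training
print(float(new_updates) / (old_updates + new_updates))
# ~1.6e-05: the new data gets about 0.0016% of an unbiased average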

About NER and training more generally (making this the megathread)

#762 , #612 , #701, #665 . Attn: @savvopoulos, @viksit

People are having a lot of pain with training the NER system. Some of the problems are easy to fix --- the current workflow around saving and loading data is pretty bad, and it's made worse by some Python 2/3 unicode save/load bugs in the example scripts.

What's hard to solve is that people seem to want to train the NER system on like, 5 examples. The current algorithm expects more like 5,000. I realise I never wrote this anywhere, and the examples all show five examples. I guess I've been doing this stuff too long, and it's no longer obvious to me what is and isn't obvious. I think this has been the root cause of a lot of confusion.

Things will improve a little with spaCy 2.0. You might be able to get a useful model with as little as 500 or 1,000 sentences annotated with a new NER class. Maybe.

We're working on ways to make all of this more efficient. We're working on making annotation projects less expensive and more consistent, and we're working on algorithms that require fewer annotated examples. But there will always be limits.

The thing is... I think most teams should be annotating literally 10,000x as much data as they're currently trying to get away with. You should have at least 1,000 sentences just of evaluation data that your machine learning model never sees. Otherwise, how will you know that your system is working? By typing stuff into it, manually? You wouldn't test your other code like that, would you? :)
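
As a rough illustration of what a held-out evaluation can look like (a minimal sketch; the data format and exact-span matching criterion are assumptions, not spaCy's built-in scorer):

from __future__ import division

def evaluate_ner(nlp, eval_data):
    # eval_data: list of (text, set of (start_char, end_char, label)) pairs
    # that the model was never trained on.
    tp = fp = fn = 0
    for text, gold_spans in eval_data:
        doc = nlp(text)
        pred = set((ent.start_char, ent.end_char, ent.label_) for ent in doc.ents)
        tp += len(pred & gold_spans)
        fp += len(pred - gold_spans)
        fn += len(gold_spans - pred)
    precision = tp / max(1, tp + fp)
    recall = tp / max(1, tp + fn)
    return precision, recall

With a thousand or so held-out sentences, that gives you precision/recall figures you can compare between runs, instead of eyeballing a handful of outputs.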

@etchen99

Are there alternative models that are more robust on smaller datasets? From playing with luis.ai and wit.ai, their NER seems to handle smaller datasets, but I'm not sure what they're using behind the scenes. Their models retrain pretty quickly, so they're probably not complex.

@honnibal
Member

honnibal commented Feb 15, 2017

@etchen99 : Neural network models will do better at this, because we'll be able to use transfer learning --- we can import knowledge from other tasks, about the language in general. That helps a lot when you don't have much data.

But, again: "not much data" here means "a few thousand sentences". I get that people want to train on a few dozen sentences. I think people shouldn't want that.

Annotated data will never not be a part of this type of machine learning, no matter what algorithm you're using --- because you're always going to need evaluation data. That won't ever change. If you're making a few thousand sentences of evaluation data, you may as well make a few thousand more for training.

@badbye
Contributor

badbye commented Mar 8, 2017

@honnibal
Thanks for your explanation.

Currently, the example code for training and updating the NER in the documentation only uses 2 sentences, which is obviously not enough (I realized this after reading your comment).

I think it would be better if you put this explanation in the documentation. Everyone reads the docs first to learn something; they go to the issues only if they can't find what they want in the docs.

More problems with the example code

  1. How to use the updated NER model?
    Update: found an example here: https://spacy.io/docs/usage/training#train-entity

  2. It seems the example retrains an NER model rather than updating the original one?

>>> # after running the example code, it does not work
>>> nlp(u'Who is Chaka Khan?').ents
()

@badbye
Contributor

badbye commented Mar 8, 2017

According to this repo, I did find a way to update the original NER model. However, it does not support training new entities.

Example of training to extract degrees:

import spacy
from spacy.gold import GoldParse

nlp = spacy.load('en')
ner = nlp.entity
text, tags = (u'B.S. in Mathematics', [(0, 4, 'DEGREE')])  # (0, 4) covers "B.S."
doc = nlp.make_doc(text)
gold = GoldParse(doc, entities=tags)
ner.update(doc, gold)
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "spacy/syntax/parser.pyx", line 247, in spacy.syntax.parser.Parser.update (spacy/syntax/parser.cpp:7892)
  File "spacy/syntax/ner.pyx", line 93, in spacy.syntax.ner.BiluoPushDown.preprocess_gold (spacy/syntax/ner.cpp:4783)
  File "spacy/syntax/ner.pyx", line 123, in spacy.syntax.ner.BiluoPushDown.lookup_transition (spacy/syntax/ner.cpp:5379)
KeyError: u'U-DEGREE'

@honnibal
Member

The bugs around this should now be resolved, as of 1.7.3. See further discussion in #910.
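
As context for the KeyError above: the entity recognizer can be told about a new label before you update it. A rough sketch, based on the training docs linked earlier (details may vary by version):

import spacy
from spacy.gold import GoldParse

nlp = spacy.load('en')
nlp.entity.add_label('DEGREE')       # register the new entity type first
doc = nlp.make_doc(u'B.S. in Mathematics')
gold = GoldParse(doc, entities=[(0, 4, 'DEGREE')])
nlp.tagger(doc)
nlp.entity.update(doc, gold)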

Usability around the retrained NER still isn't great, but the situation is improving. This will be fully resolved once:

  1. The docs are improved
  2. The examples are updated
  3. The training CLI interface is finished and documented
  4. The save/load process is made easier, once models can be pickled.

All of these things are underway in other threads, so I'll close this one.
