Different performance of displacy and spacy #933

Closed
ghost opened this issue Mar 27, 2017 · 5 comments
Labels
lang / en English language data and models models Issues related to the statistical models

Comments

ghost commented Mar 27, 2017

I have noticed a peculiar thing with displaCy. When I give it the sentence 'a man has been driving a car on the road', the root of the sentence is the verb 'driving', which is correct. However, spaCy says that the root is 'been', and 'driving' is treated as an xcomp of 'been', just like 'walking' would be in 'the man loves walking down the street'. This is clearly wrong.

I noticed that displacy uses version 1.0.1 of the models, and my spacy uses 1.2.0. Could it be that 1.2.0 is making these mistakes while 1.0.1 is not?
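To make the two competing analyses concrete, here is a minimal sketch that writes them out by hand and reads off the root of each. The head indices and labels are illustrative assumptions, not output from any spaCy model: only the two roots and the `xcomp` label on 'driving' come from the report above, and the remaining arcs are filled in with plausible values. The root is encoded as a token that is its own head, which is how spaCy represents it (`token.head` is the token itself). The names `find_root` and `differing_tokens` are hypothetical helpers.

```python
# Illustrative only: hand-written analyses, not real parser output.
WORDS = ["a", "man", "has", "been", "driving", "a", "car", "on", "the", "road"]

# Each entry is (head_index, label) for the token at the same position.
# The root points at itself.
DRIVING_AS_ROOT = [  # the analysis displaCy is reported to show
    (1, "det"), (4, "nsubj"), (4, "aux"), (4, "aux"), (4, "ROOT"),
    (6, "det"), (4, "dobj"), (4, "prep"), (9, "det"), (7, "pobj"),
]
BEEN_AS_ROOT = [  # the analysis the local install is reported to give
    (1, "det"), (3, "nsubj"), (3, "aux"), (3, "ROOT"), (3, "xcomp"),
    (6, "det"), (4, "dobj"), (4, "prep"), (9, "det"), (7, "pobj"),
]


def find_root(words, arcs):
    """Return the word whose head index is its own position."""
    for i, (head, _label) in enumerate(arcs):
        if head == i:
            return words[i]
    raise ValueError("no token is its own head")


def differing_tokens(words, arcs_a, arcs_b):
    """Return the words whose (head, label) pair differs between two analyses."""
    return [words[i] for i, (a, b) in enumerate(zip(arcs_a, arcs_b)) if a != b]


print(find_root(WORDS, DRIVING_AS_ROOT))
print(find_root(WORDS, BEEN_AS_ROOT))
print(differing_tokens(WORDS, DRIVING_AS_ROOT, BEEN_AS_ROOT))
```

Under these assumed arcs, choosing 'been' as the root also moves the subject and the auxiliary 'has', so a single root decision changes four tokens. With a real `Doc`, the same check would read `token.head` and `token.dep_` instead of the hand-written tuples.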

@honnibal
Member

Two things:

  1. Are you using the sm or the md model? The sm model is 50 MB and scores around 89.5 UAS on the OntoNotes 5 benchmark when no gold-standard information is available. The md model scores 90.6.

     There was a 0.2% regression in UAS in the md model in version 1.7, I think from a recently fixed lemmatization issue. I doubt this small accuracy difference is the root cause here.

  2. Different models might make different mistakes. I think the mistake you're pointing to here does look a bit suspicious, but it's hard to say whether something's really wrong.
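For context on the UAS figures quoted above: unlabeled attachment score is the fraction of tokens whose predicted head matches the reference head, ignoring the dependency label. The sketch below computes it for the single sentence from this report. The head arrays are hand-written assumptions, and treating the 'driving'-as-root analysis as the reference is also an assumption made purely for illustration. Real benchmark scoring may differ in details such as punctuation handling. The function name `uas` is hypothetical.

```python
def uas(reference_heads, predicted_heads):
    """Unlabeled attachment score: share of tokens with the correct head."""
    if len(reference_heads) != len(predicted_heads):
        raise ValueError("head sequences must have the same length")
    if not reference_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(r == p for r, p in zip(reference_heads, predicted_heads))
    return correct / len(reference_heads)


# Tokens: a man has been driving a car on the road
# Head index per token; the root is its own head. Illustrative values only.
REFERENCE = [1, 4, 4, 4, 4, 6, 4, 4, 9, 7]  # 'driving' as root
PREDICTED = [1, 3, 3, 3, 3, 6, 4, 4, 9, 7]  # 'been' as root

score = uas(REFERENCE, PREDICTED)
print(f"{score:.1%}")  # 60.0%
```

Four of the ten heads differ, so this one sentence scores 60% even though only the choice of root changed. That is why a model with an aggregate UAS around 90 can still get a specific sentence visibly wrong, and why a 0.2% aggregate shift says little about any individual parse.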

Author

ghost commented Mar 27, 2017

I used the md model. I also tried the sm model, and they performed the same. What model does displaCy use? It returns the correct parse in this case, and it distinguishes it from the 'the man loves walking down the street' case.

@honnibal
Member

displaCy is using the 1.1.0 model, which is only compatible up to 1.6.0. I'd be interested to hear whether you're seeing a general degradation in performance between the 1.1.0 model and the 1.2.x series.

@ines added the docs, models, and lang / en labels and removed the docs and models labels May 13, 2017
Member

ines commented May 13, 2017

Closing this and making #1057 the master issue – work in progress for spaCy v2.0!

@ines ines closed this as completed May 13, 2017