Dependency parser/tagger misidentifies a verb as a noun #1021

Closed

anna-hope opened this issue Apr 26, 2017 · 4 comments
Labels: lang / en (English language data and models), models (Issues related to the statistical models)

Comments

anna-hope commented Apr 26, 2017

Here is an example input:

import spacy

nlp = spacy.load('en_depent_web_md')
doc = nlp("Does this phone work?")

for token in doc:
    print(token, token.pos_, token.tag_, token.dep_, token.head)
    print()

Here is the output:

Does VERB VBZ ROOT Does

this DET DT det work

phone NOUN NN compound work

work NOUN NN dobj Does

? PUNCT . punct Does

As you can see, spaCy incorrectly classifies work as a noun, which (I assume) leads to the dependency parser failing to label it as the root, and thus misidentifying the root as Does.

You can play with variations of the above input, such as "Will this phone work?" or "Would this phone work?" In all of the above cases, spaCy fails to pull out "work" as the root.
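A minimal sketch to compare these variations (assuming the same nlp object loaded above); it just prints whichever token the parser marks as ROOT:

for text in ("Does this phone work?", "Will this phone work?", "Would this phone work?"):
    doc = nlp(text)
    root = [token for token in doc if token.dep_ == 'ROOT'][0]
    print(text, '->', root.text, root.pos_, root.tag_)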

This would be only a minor annoyance, except that I rely on the dependency parse for a lot of my downstream tasks, and the "{does/will/would} this work" pattern is common in my data. (I can only think of one kind of example where labelling "work" as a noun would be correct, such as "I did some work on this phone", but that strikes me as rarer than the case I have encountered, and it doesn't explain the sentences starting with {would/will}.)

I don't know whether the problem lies with the part-of-speech tagger, which assigns an erroneous tag to "work" and thereby throws off the dependency parser, or whether it's something else.
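One way to narrow this down is to run only the tagger on a pre-tokenized Doc; a minimal sketch, assuming the spaCy 1.x behaviour where pipeline components such as nlp.tagger can be applied to a Doc directly:

doc = nlp.tokenizer("Does this phone work?")
nlp.tagger(doc)  # apply only the part-of-speech tagger, skip the parser
print([(token.text, token.tag_) for token in doc])

If "work" already comes out as NN here, the tagger alone is mis-tagging it, independent of the parser.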

Would you have any idea about what is causing this? If so, is there a way to fix it without re-training the whole model?

Thanks!

Info about spaCy

  • spaCy version: 1.8.2
  • Platform: Linux-4.4.0-43-Microsoft-x86_64-with-Ubuntu-16.04-xenial
  • Python version: 3.6.1
  • Installed models: cache, en, en-1.1.0, en_core_web_md, en_default, en_depent_web_md
anna-hope changed the title from "Dependency parser/tagger misidentifies a verb as part of a compound noun" to "Dependency parser/tagger misidentifies a verb as a noun" on Apr 26, 2017
anna-hope (Author) commented:

Could be related to #1015.

How many examples would one need to correctly update the pre-trained model?

anna-hope commented Apr 27, 2017

I tried the following code, based on the one from #1015, but even after 100,000 iterations I had no luck making it recognise work as a verb:

import spacy
import spacy.gold

# assumes nlp = spacy.load('en_depent_web_md') as above
training_data = [
    ('Will this phone work?', 'MD DT NN VB .'),
    ('Would this phone work?', 'MD DT NN VB .'),
    ('Does this car work?', 'VBZ DT NN VB .'),
    ('This does work', 'DT VBZ VB'),
    ('Can this work?', 'MD DT VB .'),
    ('work', 'VB')
]

def update_tagger(tagger, example):
    orth_text, label_text = example
    doc = nlp.tokenizer(orth_text)
    tags = label_text.split()
    assert len(doc) == len(tags), 'Tokenisation does not match tags for {}'.format(orth_text)
    gold = spacy.gold.GoldParse(doc, tags=tags)
    tagger.update(doc, gold)

def train_tagger(tagger, examples):
    for i in range(100000):
        for example in examples:
            update_tagger(tagger, example)
    tagger.model.end_training()

train_tagger(nlp.tagger, training_data)
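A quick sanity check after updating (assuming the same nlp object) is to re-run the full pipeline on the problem sentence:

doc = nlp("Does this phone work?")
print([(token.text, token.tag_, token.dep_) for token in doc])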

At this point, I would prefer it to err on the side of "work" always being a verb rather than a noun (I understand that such behaviour might not be desirable in the general case, but it would work for my data).

In the meantime, I've found that if I replace "work" with some other verb that I'm not likely to see in my data set, like "hasten", I get the correct dependency parse. But that feels like a very silly workaround.
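A rough sketch of that workaround (parse_with_substitute is just an illustrative helper, not part of spaCy; it swaps the problem verb for one the model tags correctly, parses, then maps the placeholder back):

def parse_with_substitute(text, verb='work', placeholder='hasten'):
    doc = nlp(text.replace(verb, placeholder))
    # report the original verb in place of the placeholder in the output
    return [(verb if token.text == placeholder else token.text, token.tag_, token.dep_)
            for token in doc]

print(parse_with_substitute('Does this phone work?'))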

ines added the models (Issues related to the statistical models) and lang / en (English language data and models) labels and removed the models label on May 13, 2017
ines (Member) commented May 13, 2017

Closing this and making #1057 the master issue – work in progress for spaCy v2.0!

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators on May 8, 2018