
Error from adding arbitrary fixup rules to pipeline #600

Closed
cchu613 opened this issue Nov 2, 2016 · 3 comments
Labels
bug (Bugs and behaviour differing from documentation), docs (Documentation and website)

Comments


cchu613 commented Nov 2, 2016

Hello! I'm a newbie to natural language processing and am trying to use spaCy for an information extraction project. So far everything has been great, except that in sentences like "One killed in Bucks County shooting", shooting gets tagged as a verb instead of a noun.

Here is my code (only slightly modified from the tutorial titled Customizing the Pipeline):

import spacy

def arbitrary_fixup_rules(doc):
    # Retag any occurrence of "shooting" as a noun.
    for token in doc:
        if token.lower_ == u'shooting':
            token.tag_ = u'NN'

def custom_pipeline(nlp):
    # Run the fixup rules after the tagger, before the parser and entity recognizer.
    return (nlp.tagger, arbitrary_fixup_rules, nlp.parser, nlp.entity)

nlp = spacy.load('en', create_pipeline=custom_pipeline)

However, running

doc = nlp(u'One dead in Bucks County shooting.')

resulted in
AttributeError: attribute 'tag_' of 'spacy.tokens.token.Token' objects is not writable

Python 2.7, spaCy 1.1.2

honnibal added the docs and bug labels Nov 2, 2016
honnibal (Member) commented Nov 2, 2016

Hm! There's a gap in the API there — a missing attribute setter. Thanks.

honnibal (Member) commented Nov 2, 2016

This should be fixed in master. We also noticed a page missing from the docs, which we've just put up.

The missing page describes the API for the tokenizer. It's relevant here because it offers another way to do what you want: the tokenizer.add_special_case() method lets you add a rule specifying how some string should be segmented into component tokens. You can then attach custom attributes to those tokens.

For instance, you can do something like this:

nlp.tokenizer.add_special_case('shooting', [{"F": "shooting", "pos": "NN"}])

The attribute keys are currently a bit idiosyncratic. It recognises:

  • F: The string of the subtoken.
  • pos: The part-of-speech to assign to the subtoken.
  • L: The lemma (base form) to assign to the subtoken.

Soon this will be fixed, and it'll support the same token attributes as the rest of the library.
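
For reference, here's a minimal end-to-end sketch against the spaCy 1.x API (assuming the 'en' model is installed; the F/pos/L keys are the ones listed above):

import spacy

nlp = spacy.load('en')

# Special-case rule: tokenize the exact string "shooting" as a single
# subtoken, pre-assigned a noun tag and the lemma "shooting".
nlp.tokenizer.add_special_case('shooting', [
    {"F": "shooting", "pos": "NN", "L": "shooting"},
])

doc = nlp(u'One dead in Bucks County shooting.')
for token in doc:
    # %-formatting keeps this print compatible with Python 2.7.
    print('%s\t%s\t%s' % (token.orth_, token.tag_, token.lemma_))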

@honnibal honnibal closed this as completed Nov 2, 2016
honnibal added a commit that referenced this issue Nov 2, 2016

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018