Discrepancy between self.pipeline and self.pipe_names in language.py #1911

Closed
ahalterman opened this issue Jan 30, 2018 · 3 comments
Labels
usage General spaCy usage

Comments

@ahalterman

I've run into two issues that seem to be caused by differences in a language's pipeline and pipe_names attributes.

The first issue I'm running into is that nlp.pipe_names and nlp.pipeline give different answers for a blank en model.

When I load a new blank en model, getting pipe_names behaves as expected:

>>> nlp.pipe_names
['str']

But when I call nlp.pipeline, it gives me

>>> nlp.pipeline
[('str', 'ner')]

Just to verify, nlp.get_pipe('ner') behaves as expected:

KeyError: "No component 'ner' found in pipeline. Available names: ['str']"

The second issue is when I try to add NER to the pipeline:

>>> nlp.add_pipe("ner")
ValueError: 'str' already exists in pipeline.

The relevant code in language.py seems to be:

  1. line 138, where self.pipeline is initialized as an empty list
  2. line 159 where self.pipe_names is first called.
  3. line 193 where pipe_names is a listification of self.pipeline
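
The relationship described in this list can be sketched with a toy class. This is a minimal illustration, not spaCy's actual code: `MiniLanguage` and `nlp_like` are hypothetical names, and the only assumption carried over from the issue is that the names are derived from the pipeline list rather than stored separately.

```python
class MiniLanguage:
    """Toy stand-in for a pipeline container (hypothetical, not spaCy's Language)."""

    def __init__(self):
        # Components are stored in order as (name, component) tuples.
        self.pipeline = []

    @property
    def pipe_names(self):
        # Names are recomputed from the pipeline on every access.
        return [name for name, _ in self.pipeline]


nlp_like = MiniLanguage()
nlp_like.pipeline.append(("str", "ner"))  # the state reported in this issue
print(nlp_like.pipeline)    # [('str', 'ner')]
print(nlp_like.pipe_names)  # ['str']
```

Under that assumption the two attributes cannot disagree: they are two views of the same list, so the apparent mismatch comes from what was stored in the list.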

Your Environment

  • Operating System: Darwin-16.7.0-x86_64-i386-64bit
  • Python Version Used: 3.6.3
  • spaCy Version Used: 2.0.5
@ines
Member

ines commented Jan 30, 2018

Ahhh, this took me a while to figure out, but I think I know what's going on here.

nlp.pipe_names returns a list of the component names as strings. nlp.pipeline returns a list of (name, func) tuples: each component's name and the function itself. If you add your own function via nlp.add_pipe and don't set the name keyword argument, spaCy generates a human-readable name from the component (its function name, method name, class name, etc.). So a function my_custom_pipe will receive the name 'my_custom_pipe'.

In your example, you've called nlp.add_pipe('ner') – but I think what you were trying to do was:

ner = nlp.create_pipe('ner')  # create built-in NER pipeline component
nlp.add_pipe(ner) # add it to the pipeline

Instead, you've accidentally added a string "ner" as the component. Because no custom name was set and str.__name__ is 'str', spaCy chose to call your custom component 'str'. This is why you've ended up with one component ('str', 'ner') (name 'str', component "function" 'ner') in your pipeline.
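
To make the naming behaviour concrete, here is a rough sketch of that kind of fallback. `infer_component_name` is a hypothetical helper and the exact lookup order is an assumption; it approximates the logic described in this comment and is not copied from language.py.

```python
def infer_component_name(component):
    """Hypothetical helper: pick a readable name for a pipeline component."""
    # Prefer an explicit `name` attribute, if the component defines one.
    if hasattr(component, "name"):
        return component.name
    # Functions and classes carry their own __name__.
    if hasattr(component, "__name__"):
        return component.__name__
    # Otherwise fall back to the name of the object's class.
    return type(component).__name__


def my_custom_pipe(doc):
    # Placeholder component that returns the document unchanged.
    return doc


print(infer_component_name(my_custom_pipe))  # my_custom_pipe
print(infer_component_name("ner"))           # str
```

A plain string has neither a `name` attribute nor a `__name__`, so the class-name fallback is what produces 'str'.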

Anyway, the bottom line is, this shouldn't be possible and spaCy should raise an error if the component you're trying to add is not callable. It's an interesting edge case we didn't consider.
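
A check along those lines might look like the sketch below. It is only an illustration of the idea: `add_component` is a hypothetical function operating on a plain list, the error messages are made up, and it says nothing about how the actual fix was implemented.

```python
def add_component(pipeline, component, name=None):
    """Hypothetical helper: validate a component, then append (name, component)."""
    # Reject anything that cannot be called on a document, e.g. the string "ner".
    if not callable(component):
        raise ValueError(
            "Not a valid pipeline component: expected a callable, "
            "got {!r}".format(component)
        )
    if name is None:
        # Simplified naming fallback, for illustration only.
        name = getattr(component, "__name__", type(component).__name__)
    if name in [existing for existing, _ in pipeline]:
        raise ValueError("{!r} already exists in pipeline.".format(name))
    pipeline.append((name, component))


def my_custom_pipe(doc):
    # Placeholder component that returns the document unchanged.
    return doc


pipeline = []
add_component(pipeline, my_custom_pipe)  # accepted: functions are callable
try:
    add_component(pipeline, "ner")       # rejected: a str is not callable
except ValueError as err:
    print(err)
```

With a check like this, passing the string "ner" would fail immediately with a clear message instead of silently registering a component named 'str'.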

ines added the usage (General spaCy usage) label Jan 30, 2018
ines closed this as completed in 8901814 Jan 30, 2018
@ahalterman
Author

Thanks! Definitely a usage problem on my part.

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018