New Entity Recognition #937

Closed
ghost opened this issue Mar 29, 2017 · 2 comments
Labels
usage General spaCy usage

Comments

ghost commented Mar 29, 2017

Hi, I tried to add and use new entities.
Here is my code.

```python
import spacy

nlp = spacy.load('en')

def merge_phrases(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    if i != len(matches)-1:
        return None
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        span.merge('NNP' if label else span.root.tag_, span.text, nlp.vocab.strings[label])

matcher = spacy.matcher.Matcher(nlp.vocab)
matcher.add(entity_key='company-transocean', label='company', attrs={}, specs=[[{spacy.attrs.ORTH: 'Transocean Ltd'}]], on_match=merge_phrases)
matcher.add(entity_key='company-transocean-ltd', label='company', attrs={}, specs=[[{spacy.attrs.ORTH: 'Transocean'}]], on_match=merge_phrases)
doc = nlp(u"""Tell me about Macys Inc in Japan and about Transocean Ltd.""")
matcher(doc)
print(['%s|%s' % (t.orth_, t.ent_type_) for t in doc])
```

Output:

['Tell|', 'me|', 'about|', 'Macys|ORG', 'Inc|ORG', 'in|', 'Japan|GPE', 'and|', 'about|', 'Transocean|company', 'Ltd.|ORG']

It is starting to work, but not the way I expect.

I have two questions:

  1. I want to register two names for the same company, "Transocean Ltd" and "Transocean". The system recognizes only "Transocean" and treats "Ltd." as a separate entity. I want a single result: Transocean Ltd|company.
  2. How can I save this so that spaCy can use all the newly added entities the next time the script starts? I don't want to add the new entities again every time the script runs.
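
The first question can be illustrated without spaCy. In the output above, "Transocean" and "Ltd." are separate tokens, so a pattern that describes a single token with the text 'Transocean Ltd' has nothing to match; a two-word name needs one pattern entry per token. When both the long and the short name match at the same position, the longer match can be preferred. The sketch below is plain Python, not spaCy's Matcher API; the `patterns` table and the helper names `find_matches` and `keep_longest` are hypothetical, and the token list is copied from the output posted above.

```python
# Plain-Python illustration; this is NOT spaCy's Matcher API.
# Tokens are copied from the output posted above (note "Ltd." is one token).
tokens = ["Tell", "me", "about", "Macys", "Inc", "in", "Japan",
          "and", "about", "Transocean", "Ltd."]

# Hypothetical pattern table: each pattern is a list with one entry per token.
patterns = {
    "company": [["Transocean", "Ltd."], ["Transocean"]],
}


def find_matches(tokens, patterns):
    """Return (label, start, end) for every place a token pattern occurs."""
    matches = []
    for label, specs in patterns.items():
        for spec in specs:
            for start in range(len(tokens) - len(spec) + 1):
                if tokens[start:start + len(spec)] == spec:
                    matches.append((label, start, start + len(spec)))
    return matches


def keep_longest(matches):
    """Drop any match that overlaps a longer match that was already kept."""
    kept = []
    # Longest first; ties broken by earliest start.
    for label, start, end in sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1])):
        if all(end <= s or start >= e for _, s, e in kept):
            kept.append((label, start, end))
    return sorted(kept, key=lambda m: m[1])


all_matches = find_matches(tokens, patterns)
best = keep_longest(all_matches)
print(best)
print([" ".join(tokens[s:e]) for _, s, e in best])
```

Both names match at token 9, and the filter keeps only the two-token span, which is the behaviour asked for in question 1. In the spaCy code above, the equivalent change would presumably be a two-token spec for the long name, merged before the shorter match is applied.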

honnibal added the usage label (General spaCy usage) on Apr 16, 2017
ines (Member) commented Apr 16, 2017

The new version 1.8.0 comes with bug fixes to the NER training procedure and a new save_to_directory() method. We've also updated the docs with more information on training and NER training in particular:

I hope this helps!
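
For the second question, if only the pattern definitions (rather than a retrained model) need to survive between runs, one option that does not depend on any spaCy version is to keep them in a JSON file and read them back at startup. This is a plain-Python sketch of that alternative, not the save_to_directory() method mentioned above; the file name `entity_patterns.json` and the helpers `save_patterns` and `load_patterns` are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical pattern table: label -> list of token sequences.
patterns = {"company": [["Transocean", "Ltd."], ["Transocean"]]}


def save_patterns(patterns, path):
    """Write the pattern table to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(patterns, f, indent=2)


def load_patterns(path):
    """Read the pattern table back; call this once at startup."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical file location; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "entity_patterns.json")
save_patterns(patterns, path)
restored = load_patterns(path)
print(restored == patterns)
```

The loaded table would still have to be registered with the matcher on each run, so this only moves the definitions out of the script; it does not replace saving a trained model.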

ines closed this as completed on Apr 16, 2017
lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked this thread as resolved and limited conversation to collaborators on May 8, 2018