How to relate the entity to its place in the text? #12

Open
ali3assi opened this issue Mar 17, 2022 · 1 comment
Labels
documentation Improvements or additions to documentation

@ali3assi

Once I have the entity annotations, how can I get the start and end position of each entity in the text? I want to relate each entity to the span of text it corresponds to.

I do the following:

for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)

But I get the following exception:

Traceback (most recent call last):
  File "C:\Users\Admin\miniconda3\envs\projet1\lib\tkinter\__init__.py", line 1892, in __call__
    return self.func(*args)
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 71, in <lambda>
    display_annotate = Button(root, height = 2, width = 20, text ="Annotate text", command = lambda:take_input()) 
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 15, in take_input
    logger.warning(annotate(text_to_annotate))
  File "C:\Users\Admin\Documents\codePython\dbpedia\index.py", line 57, in annotate
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
  File "spacy\tokens\span.pyx", line 429, in spacy.tokens.span.Span.sent.__get__
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: `nlp.add_pipe('sentencizer')`. Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting `doc[i].is_sent_start`.
@MartinoMensio
Owner

Hi @ali3assi,
This error occurs because blank pipelines do not include a sentencizer by default, so sentence boundaries are never set and accessing ent.sent fails.
You can add the sentencizer before the dbpedia_spotlight component:

import spacy
nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')
nlp.add_pipe('dbpedia_spotlight')
doc = nlp("This is an example text. Let's mention Natural Language Processing")
for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
# Natural Language Processing 14 41 DBPEDIA_ENT

Alternatively, load one of the pretrained models, which already set sentence boundaries (through the dependency parser):

import spacy
# the model must be installed first (e.g. python -m spacy download en_core_web_sm),
# see https://spacy.io/models/en#en_core_web_sm
nlp = spacy.load('en_core_web_sm')

# the rest is the same as in the previous example
nlp.add_pipe('dbpedia_spotlight')
doc = nlp("This is an example text. Let's mention Natural Language Processing")
for ent in doc.ents:
    print(ent.text, ent.start_char-ent.sent.start_char, ent.end_char-ent.sent.start_char, ent.label_)
# Natural Language Processing 14 41 DBPEDIA_ENT
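
As a side note, ent.start_char and ent.end_char are already character offsets into the whole document text, and reading them does not need sentence boundaries; only ent.sent does. The subtraction in the snippets above just shifts those offsets so they are relative to the start of the sentence. Here is a plain-Python sketch of that arithmetic, with no spaCy involved. The helper to_sentence_offsets is a hypothetical name made up for illustration, and the sentence string is written out by hand rather than produced by a sentencizer:

```python
# Plain-Python sketch (no spaCy needed) of the offset arithmetic used above.
# `to_sentence_offsets` is a hypothetical helper, not part of any library.

def to_sentence_offsets(ent_start, ent_end, sent_start):
    """Convert document-level character offsets to sentence-relative ones."""
    return ent_start - sent_start, ent_end - sent_start


text = "This is an example text. Let's mention Natural Language Processing"
entity = "Natural Language Processing"
# Assumption: the second sentence, written out by hand for this sketch.
sentence = "Let's mention Natural Language Processing"

# Document-level offsets, the analogue of ent.start_char / ent.end_char.
ent_start = text.index(entity)
ent_end = ent_start + len(entity)

# Offset where the sentence begins, the analogue of ent.sent.start_char.
sent_start = text.index(sentence)

rel_start, rel_end = to_sentence_offsets(ent_start, ent_end, sent_start)

print(ent_start, ent_end)  # 39 66  (positions in the whole text)
print(rel_start, rel_end)  # 14 41  (positions inside the sentence)

# Slicing with either pair of offsets gives back the entity text.
print(text[ent_start:ent_end] == entity)      # True
print(sentence[rel_start:rel_end] == entity)  # True
```

So if you only need positions relative to the full text, printing ent.start_char and ent.end_char directly should be enough, and the sentencizer is only required when you want sentence-relative positions.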

MartinoMensio added the documentation (Improvements or additions to documentation) label Feb 7, 2023