Add indexes to grouped entity NER pipeline #5676

prithvikannan · 2020-07-11T03:51:52Z

🚀 Feature request

The output of the grouped entity NER pipeline should include token indexes.

The standard NER pipeline from transformers outputs entities that contain the word, score, entity type, and index. The following snippet demonstrates the normal behavior of the NER pipeline with the default grouped_entities=False option.

from transformers import pipeline
nlp_without_grouping = pipeline("ner")
sequence = "Hugging Face Inc. is a company based in New York City."
print(nlp_without_grouping(sequence))

[
    {'word': 'Hu', 'score': 0.9992662668228149, 'entity': 'I-ORG', 'index': 1},
    {'word': '##gging', 'score': 0.9808881878852844, 'entity': 'I-ORG', 'index': 2},
    {'word': 'Face', 'score': 0.9953625202178955, 'entity': 'I-ORG', 'index': 3},
    {'word': 'Inc', 'score': 0.9993382096290588, 'entity': 'I-ORG', 'index': 4},
    {'word': 'New', 'score': 0.9990268349647522, 'entity': 'I-LOC', 'index': 11},
    {'word': 'York', 'score': 0.9988483190536499, 'entity': 'I-LOC', 'index': 12},
    {'word': 'City', 'score': 0.9991773366928101, 'entity': 'I-LOC', 'index': 13}
]

However, the NER pipeline with grouped_entities=True outputs only the word, score, and entity type, as the following snippet and its output show. 'New York City' is also duplicated in the output, but I will address that in a separate issue.

from transformers import pipeline
nlp_with_grouping = pipeline("ner", grouped_entities=True)
sequence = "Hugging Face Inc. is a company based in New York City."
print(nlp_with_grouping(sequence))

[
    {'entity_group': 'I-ORG', 'score': 0.9937137961387634, 'word': 'Hugging Face Inc'},
    {'entity_group': 'I-LOC', 'score': 0.9990174969037374, 'word': 'New York City'},
    {'entity_group': 'I-LOC', 'score': 0.9990174969037374, 'word': 'New York City'}
]

I believe that the grouped entities returned should also include the token indexes of the entities. Sample output would look like this:

[
    {'entity_group': 'I-ORG', 'score': 0.9930560886859894, 'word': 'Hugging Face Inc', 'indexes': [1, 2, 3, 4]},
    {'entity_group': 'I-LOC', 'score': 0.998809814453125, 'word': 'New York City', 'indexes': [11, 12, 13]},
    {'entity_group': 'I-LOC', 'score': 0.998809814453125, 'word': 'New York City', 'indexes': [11, 12, 13]}
]
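As a rough illustration of the requested behavior, the sketch below groups the ungrouped output shown earlier and keeps the token indexes of each group. It is not the pipeline's implementation: `group_with_indexes` is a hypothetical helper, the grouping rule (same entity type and consecutive index) is an assumption, and the "##" join is a naive stand-in for the tokenizer's own detokenization.

```python
from statistics import mean
from typing import Dict, List


def group_with_indexes(entities: List[Dict]) -> List[Dict]:
    """Hypothetical helper: group adjacent same-type tokens and keep their indexes."""
    groups: List[List[Dict]] = []
    for ent in entities:
        last = groups[-1][-1] if groups else None
        # Assumption: tokens belong together when the type matches and the
        # indexes are consecutive.
        if (
            last is not None
            and last["entity"] == ent["entity"]
            and last["index"] + 1 == ent["index"]
        ):
            groups[-1].append(ent)
        else:
            groups.append([ent])

    grouped = []
    for group in groups:
        # Naive WordPiece join: "##" pieces attach to the previous token.
        word = ""
        for ent in group:
            piece = ent["word"]
            if piece.startswith("##"):
                word += piece[2:]
            else:
                word += (" " if word else "") + piece
        grouped.append(
            {
                "entity_group": group[0]["entity"],
                "score": mean(e["score"] for e in group),
                "word": word,
                "indexes": [e["index"] for e in group],
            }
        )
    return grouped


# Ungrouped pipeline output copied from the example above.
ungrouped = [
    {"word": "Hu", "score": 0.9992662668228149, "entity": "I-ORG", "index": 1},
    {"word": "##gging", "score": 0.9808881878852844, "entity": "I-ORG", "index": 2},
    {"word": "Face", "score": 0.9953625202178955, "entity": "I-ORG", "index": 3},
    {"word": "Inc", "score": 0.9993382096290588, "entity": "I-ORG", "index": 4},
    {"word": "New", "score": 0.9990268349647522, "entity": "I-LOC", "index": 11},
    {"word": "York", "score": 0.9988483190536499, "entity": "I-LOC", "index": 12},
    {"word": "City", "score": 0.9991773366928101, "entity": "I-LOC", "index": 13},
]

result = group_with_indexes(ungrouped)
for group in result:
    print(group["entity_group"], group["word"], group["indexes"])
```

This can serve as a workaround on the ungrouped output while the grouped pipeline does not return indexes.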

Motivation

Any application that needs to locate grouped named entities in the input requires some sort of index. This feature is present in the standard NER pipeline and should exist in the grouped entity NER pipeline as well.

In my case, I am trying to append the type to the text right after the named entity ("Apple" would become "Apple <I-ORG>"), so I need to be able to locate the named entity within my phrase.
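To make the use case concrete, here is a minimal sketch of how the proposed 'indexes' field could be used to place a type tag after each entity. `tag_entities` is a hypothetical helper, and the token list is written by hand on the assumption that index 0 is the [CLS] token, which is consistent with the indexes shown in this issue; in practice the tokens would come from the pipeline's tokenizer.

```python
from typing import Dict, List


def tag_entities(tokens: List[str], groups: List[Dict]) -> List[str]:
    """Hypothetical helper: insert '<TYPE>' after the last token of each group."""
    # Map the last token index of every group to its entity type.
    insert_after = {max(g["indexes"]): g["entity_group"] for g in groups}
    tagged: List[str] = []
    for i, tok in enumerate(tokens):
        tagged.append(tok)
        if i in insert_after:
            tagged.append(f"<{insert_after[i]}>")
    return tagged


# Assumed tokenization of the example sentence, special tokens included.
tokens = [
    "[CLS]", "Hu", "##gging", "Face", "Inc", ".", "is", "a", "company",
    "based", "in", "New", "York", "City", ".", "[SEP]",
]

# Grouped entities in the proposed output format (scores omitted for brevity).
groups = [
    {"entity_group": "I-ORG", "word": "Hugging Face Inc", "indexes": [1, 2, 3, 4]},
    {"entity_group": "I-LOC", "word": "New York City", "indexes": [11, 12, 13]},
]

tagged = tag_entities(tokens, groups)
print(tagged)
```

Without the indexes, the same task would require searching for the 'word' string in the text, which is ambiguous when an entity occurs more than once.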

Your contribution

I have been able to fix this by adding two lines to the group_sub_entities function:

transformers/src/transformers/pipelines.py

Line 1042 in 7fad617

    def group_sub_entities(self, entities: List[dict]) -> dict:
        """
        Returns grouped sub entities
        """
        # Get the first entity in the entity group
        entity = entities[0]["entity"]
        scores = np.mean([entity["score"] for entity in entities])
        tokens = [entity["word"] for entity in entities]
        indexes = [entity["index"] for entity in entities]    # my added line

        entity_group = {
            "entity_group": entity,
            "score": np.mean(scores),
            "word": self.tokenizer.convert_tokens_to_string(tokens),
            "indexes": indexes    # my added line
        }
        return entity_group

stale · 2020-09-11T02:02:19Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

sasi143 · 2020-09-15T08:14:07Z

I am facing the same issue. Has this been fixed?

stale · 2020-11-14T09:26:26Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Narsil · 2020-12-08T13:05:32Z

Fixed by #8781

prithvikannan mentioned this issue Jul 11, 2020

[WIP] Added indexes in grouped entity NER #5677

Closed

stale bot added the wontfix label Sep 11, 2020

stale bot removed the wontfix label Sep 15, 2020

stale bot added the wontfix label Nov 14, 2020

stale bot closed this as completed Nov 22, 2020
