Skip to content

Latest commit

 

History

History
136 lines (98 loc) · 6.3 KB

README.md

File metadata and controls

136 lines (98 loc) · 6.3 KB

concepCy

PyPI version github actions docs demo status

concepCy is a spaCy wrapper for ConceptNet, a freely-available semantic network designed to help computers understand the meaning of words.

concepCy allows you to query ConceptNet.io to extract word meanings directly from the resource itself.

Install

You can install concepCy via pip:

pip install concepcy

Alternatively you can directly clone the repository and install it using poetry by running the following:

git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install

Getting Started

To get started you need to install of one the pre-trained spaCy model available here.

In ConceptNet words are represented as Node and relations between words as Edge.
The Node object contains the following attributes:

  • id: where you can look up all the information about that word
  • label: which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI.
  • language: code for what language the label is in
  • term: a link to the most general version of this term. In many cases this is just the same URI.

The Edge object features the following attributes:

  • start: starting Node
  • end: ending Node
  • relation: name of the relation for those two nodes
  • text: some of ConceptNet's data is extracted from text, text shows you what this text was
  • weight: how believable the information is

Simple start

In this case we will simply be interested in the RelatedTo relations between words.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")

doc = nlp("WHO is a lovely company")

# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")

# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")
--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]

Word: 'is'
[]

Word: 'a'
[]

Word: 'lovely'
[]

Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

Custom configuration

One can customize the concepcy wrapper by changing the default value of the config. The two parameters of interest are:

  • relations_of_interest: List[str]: ConceptNet currently support 34 word-relations. Some of them might not be needed for your use case. To only keep the ones needed pass a list of all the relations you want to keep (see all relations available here). Each relation then becomes an extension.
  • filter_edge_fct: Callable[Edge]: Conceptnet is a crowd-sourced resource, meaning that some information might be more relevant than others. To only keep reliable relations you can pass a function that will take an Edge as input and will return a boolean indicating whether to filter that edge or not.
import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)

Documentation 📚

📄 The whole documentation along with design decisions and examples can be found here

🎮 A simple demo on how to use concepCy can be found here

References