concepCy is a spaCy wrapper for ConceptNet, a freely available semantic network designed to
help computers understand the meaning of words.
concepCy allows you to query ConceptNet.io to extract word meanings directly from the
resource itself.
You can install concepCy via pip:
pip install concepcy
Alternatively, you can clone the repository and install it with poetry by running the following:
git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install
To get started, you need to install one of the pre-trained spaCy models available here.
In ConceptNet, words are represented as Node and relations between words as Edge.
The Node object contains the following attributes:
- id: the URI where you can look up all the information about that word
- label: the word's label, which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI
- language: code for the language the label is in
- term: a link to the most general version of this term. In many cases this is just the same URI.
The Edge object features the following attributes:
- start: starting Node
- end: ending Node
- relation: name of the relation between those two nodes
- text: some of ConceptNet's data is extracted from text; text shows you what this text was
- weight: how believable the information is
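To make the two structures concrete, here is a minimal sketch that mirrors the attributes listed above with plain dataclasses. The class names Node and Edge follow the text, but these definitions are illustrative assumptions, not concepCy's actual classes. The sample values are taken from the ConceptNet output for the word "company" shown further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative stand-in for a ConceptNet node (not concepCy's own class)."""
    id: str        # URI where the word can be looked up, e.g. "/c/en/company"
    label: str     # human-readable label, possibly a fuller phrase
    language: str  # language code of the label, e.g. "en"
    term: str      # URI of the most general version of the term


@dataclass(frozen=True)
class Edge:
    """Illustrative stand-in for a ConceptNet edge (not concepCy's own class)."""
    start: Node     # node the relation starts from
    end: Node       # node the relation points to
    relation: str   # relation name, e.g. "RelatedTo"
    text: str       # source text the relation was extracted from
    weight: float   # how believable the information is


company = Node(id="/c/en/company", label="company", language="en", term="/c/en/company")
business = Node(id="/c/en/business", label="business", language="en", term="/c/en/business")
edge = Edge(
    start=company,
    end=business,
    relation="RelatedTo",
    text="[[company]] is related to [[business]]",
    weight=6.424017434596516,
)

print(f"{edge.start.label} --{edge.relation}--> {edge.end.label} (weight {edge.weight:.2f})")
```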
In this case we will simply be interested in the RelatedTo relations between words.
import spacy
import concepcy
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")
doc = nlp("WHO is a lovely company")
# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")
# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")
--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]
Word: 'is'
[]
Word: 'a'
[]
Word: 'lovely'
[]
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
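Because each relation is returned as a plain dictionary, the result can be post-processed with ordinary Python. The sketch below assumes the dictionary layout shown in the output above and ranks related words by weight. The helper name related_labels is hypothetical and not part of concepCy, and the sample edges are a trimmed copy of the ones printed for "company".

```python
from typing import Dict, List, Tuple


def related_labels(edges: List[Dict]) -> List[Tuple[str, float]]:
    """Return (end label, weight) pairs sorted from highest to lowest weight."""
    pairs = [(edge["end"]["label"], edge["weight"]) for edge in edges]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the edges printed above for the word "company"
# (only the keys this helper reads are kept).
edges = [
    {"end": {"label": "corporation"}, "relation": "RelatedTo", "weight": 4.432155231938521},
    {"end": {"label": "business"}, "relation": "RelatedTo", "weight": 6.424017434596516},
    {"end": {"label": "organization"}, "relation": "RelatedTo", "weight": 4.259107887809371},
]

for label, weight in related_labels(edges):
    print(f"{label}: {weight:.2f}")
```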
One can customize the concepcy wrapper by changing the default values of the config. The two parameters of interest are:
- relations_of_interest: List[str]: ConceptNet currently supports 34 word relations, some of which might not be needed for your use case. To keep only the ones you need, pass a list of the relations you want to keep (see all relations available here). Each relation then becomes an extension.
- filter_edge_fct: Callable[Edge]: ConceptNet is a crowd-sourced resource, meaning that some information might be more relevant than other information. To keep only reliable relations, you can pass a function that takes an Edge as input and returns a boolean indicating whether to filter that edge or not.
import spacy
import concepcy
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)
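The idea behind edge filtering can be sketched in plain Python. The example below is not concepCy's implementation. It assumes that edges are dictionaries shaped like the output shown earlier, and that a weight threshold and a missing-text check (loosely mirroring the filter_edge_weight and filter_missing_text values used above) decide whether an edge is kept. The function name keep_edge and the exact comparison (weight at or above the threshold) are assumptions made for illustration.

```python
from typing import Dict, List


def keep_edge(edge: Dict, min_weight: float = 3.0, require_text: bool = True) -> bool:
    """Hypothetical predicate: True when the edge looks reliable enough to keep."""
    if edge.get("weight", 0.0) < min_weight:
        return False  # weight below the threshold
    if require_text and not edge.get("text"):
        return False  # no source text recorded for this edge
    return True


edges: List[Dict] = [
    {"relation": "RelatedTo", "text": "[[company]] is related to [[business]]", "weight": 6.42},
    {"relation": "RelatedTo", "text": None, "weight": 5.0},  # missing text
    {"relation": "RelatedTo", "text": "[[a]] is related to [[b]]", "weight": 1.0},  # low weight
]

kept = [edge for edge in edges if keep_edge(edge)]
print(f"kept {len(kept)} of {len(edges)} edges")
```

A predicate like this keeps the filtering rule in one place, so the threshold can be tuned without touching the code that consumes the edges.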
📄 The whole documentation along with design decisions and examples can be found here
🎮 A simple demo on how to use concepCy can be found here