class DataContainer {
    +UUID uid
    +str external_uid
    +str label
    +set[LinguisticRealisation] linguistic_realisations
}
class Concept
class Relation {
    +Concept source_concept
    +Concept destination_concept
}
class MetaRelation
class KnowledgeRepresentation {
    +set[Concept] concepts
    +set[Relation] relations
    +set[MetaRelation] metarelations
}
class LinguisticRealisation {
    +str label
    +Enrichment enrichment
    +set[Any] corpus_occurrences
    +get_docs() set[Spacy.doc]
}
class ConceptLR {
    +set[Spacy.span] corpus_occurrences
}
class RelationLR {
    +set[tuple[Spacy.span, Spacy.span, Spacy.span]] corpus_occurrences
    +add_corpus_occurrences(set[tuple[Spacy.span, Spacy.span, Spacy.span]])
}
class MetaRelationLR {
    +set[tuple[Spacy.span, Spacy.span]] corpus_occurrences
    +add_corpus_occurrences(set[tuple[Spacy.span, Spacy.span]])
}
class Enrichment {
    +set[str] synonyms
}
class CandidateTerm {
    +str label
    +set[Spacy.span] corpus_occurrences
    +Enrichment enrichment
}
class Pipeline {
    +list[PipelineComponent] pipeline_components
    +list[DataPreprocessing] preprocessing_components
    +Spacy.lang spacy_model
    +CorpusLoader corpus_loader
    +list[Spacy.doc] corpus
    +KnowledgeRepresentation kr
    +set[CandidateTerm] candidate_terms
}
class CorpusLoader {
    +str corpus_path
    -read_corpus() list[str]
}
class JsonCorpusLoader {
    +str json_field
}
class CsvCorpusLoader
class PipelineComponent {
    +get_performance_report() dict[str, Any]
}
class DataPreprocessing {
    +dict[str, Any] parameters
    +dict[str, Any] options
}
class ConceptRelationExtraction
class CandidateTermExtraction
class MetarelationExtraction
class CandidateTermEnrichment
class AxiomExtraction
class KnowledgeSource {
    +dict[str, Any] parameters
    +match_external_concepts(term: CandidateTerm) set[str]
    +enrich_term(term: CandidateTerm)
}
ConceptRelationExtraction --> KnowledgeSource : uses
CandidateTermEnrichment --> KnowledgeSource : uses
PipelineComponent <|--ConceptRelationExtraction
PipelineComponent <|--CandidateTermExtraction
PipelineComponent <|--MetarelationExtraction
PipelineComponent <|--CandidateTermEnrichment
PipelineComponent <|--AxiomExtraction
Pipeline "1" *-- "0..n" DataPreprocessing
Pipeline "1" *-- "3..n" PipelineComponent
Pipeline "1" o-- "0..n" CandidateTerm
Pipeline "1" *-- "1" KnowledgeRepresentation
Pipeline "1" o-- "0..1" CorpusLoader
CorpusLoader <|-- JsonCorpusLoader
CorpusLoader <|-- CsvCorpusLoader
KnowledgeRepresentation "1" *-- "0..n" Relation
KnowledgeRepresentation "1" *-- "0..n" Concept
KnowledgeRepresentation "1" *-- "0..n" MetaRelation
DataContainer <|-- Concept
DataContainer <|-- Relation
Relation <|-- MetaRelation
DataContainer "1" o-- "0..n" LinguisticRealisation
LinguisticRealisation "1" o-- "0..1" Enrichment
CandidateTerm "1" o-- "0..1" Enrichment
Relation "1" o-- "2" Concept
LinguisticRealisation <|-- ConceptLR
LinguisticRealisation <|-- RelationLR
LinguisticRealisation <|-- MetaRelationLR
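To make the data-container part of the diagram more concrete, here is a minimal Python sketch of `DataContainer`, `Concept`, `Relation`, `MetaRelation` and `KnowledgeRepresentation`. It is an illustration, not the framework's actual code. The attribute names come from the diagram; the default values, the optional fields, the use of dataclasses and the use of plain strings in place of `LinguisticRealisation` objects are assumptions made to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so instances fit in sets
class DataContainer:
    label: str
    external_uid: Optional[str] = None
    uid: UUID = field(default_factory=uuid4)
    # Assumption: plain strings stand in for LinguisticRealisation objects.
    linguistic_realisations: set[str] = field(default_factory=set)


@dataclass(eq=False)
class Concept(DataContainer):
    pass


@dataclass(eq=False)
class Relation(DataContainer):
    # Assumption: optional here only so that the dataclass field order stays valid.
    source_concept: Optional[Concept] = None
    destination_concept: Optional[Concept] = None


@dataclass(eq=False)
class MetaRelation(Relation):
    pass


@dataclass
class KnowledgeRepresentation:
    concepts: set[Concept] = field(default_factory=set)
    relations: set[Relation] = field(default_factory=set)
    metarelations: set[MetaRelation] = field(default_factory=set)


# Usage: build a tiny knowledge representation with two concepts and one relation.
kr = KnowledgeRepresentation()
bike = Concept(label="bike")
wheel = Concept(label="wheel")
kr.concepts.update({bike, wheel})
kr.relations.add(
    Relation(label="has part", source_concept=bike, destination_concept=wheel)
)
```

The sketch mirrors the inheritance arrows of the diagram (`Concept` and `Relation` extend `DataContainer`, `MetaRelation` extends `Relation`) and the composition of a `KnowledgeRepresentation` out of three sets.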
Here is how the architecture is organised:
The folder
contains all of the functional code.
- The
folder contains all the tools used by the different modules.
- The
folder contains all the data structures needed for the framework.
- The
folder contains the algorithms and pipeline structure to build and run an ontology learning model.

The folder
contains demonstrators for specific pipeline components or different ontology learning processes.
About the code structure:
- Generic data structures are identified with
at the end of the filename.
- Each new class must be tested, and its test file is identified with
at the beginning of the filename.
Every development must be tested.
The tests are launched with pytest.
As we aim to be an open source library, the test coverage should stay close to 100%. The following command (which requires the pytest-cov plugin) reports the coverage together with the lines that are not covered:
pytest --cov-report term-missing --cov=. test
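As an illustration of what a test for a new class could look like, here is a minimal sketch in pytest style. The `Enrichment` class with its `synonyms` set follows the class diagram, but the `add_synonym` method is a hypothetical helper introduced only for this example. pytest collects functions whose names start with `test_` by default.

```python
class Enrichment:
    """Simplified stand-in for the framework's Enrichment class."""

    def __init__(self) -> None:
        self.synonyms: set[str] = set()

    def add_synonym(self, synonym: str) -> None:
        # Hypothetical helper, not taken from the framework.
        self.synonyms.add(synonym)


def test_add_synonym_ignores_duplicates() -> None:
    enrichment = Enrichment()
    enrichment.add_synonym("bicycle")
    enrichment.add_synonym("bicycle")
    # A set keeps a single copy of each synonym.
    assert enrichment.synonyms == {"bicycle"}
```

In a real test file the class under test would be imported from the package rather than defined next to the test.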
- Main branch must always be functional
- When developing a new feature, a new branch named "feat/{feature_name}" must be created
- When fixing an issue, a new branch named "fix/{feature_name}" must be created and linked to the issue
- Create a merge/pull request to submit the code
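The branch naming rules above could be checked automatically, for instance in a local hook. The following is only a sketch: the function name and the set of characters allowed in a feature name (lowercase letters, digits, `-` and `_`) are assumptions, since the rules above only fix the `feat/` and `fix/` prefixes.

```python
import re

# Assumption: feature names are lowercase letters, digits, "-" and "_".
BRANCH_PATTERN = re.compile(r"(feat|fix)/[a-z0-9][a-z0-9_-]*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the branch name follows the feat/... or fix/... convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feat/corpus-loader", "fix/csv_encoding", "feature/x", "fix/"):
        print(candidate, is_valid_branch_name(candidate))
```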
Start new developments
git pull
git checkout -b {new_branch}
Submit new developments
git add {files}
git commit -m "{comment}"
git push {origin} {new_branch}
Amend commit and push after a review
git checkout {branch_to_be_merged}
git add {modified_files}
git commit --amend
git push --force-with-lease {origin} {branch_to_be_merged}
Update the code after an amend commit
git checkout {branch_to_be_merged}
git fetch
git reset --hard {origin}/{branch_to_be_merged}
Setting up the virtual environment:
- go to the project root directory:
- create the virtual environment by running
virtualenv -p python3.10 {env/path}
(virtualenv needs to be installed) or
python3 -m venv {env/path}
As a result, you should have a new folder at your project root.
- activate the virtual environment by running
source {env/path}/bin/activate
on Linux, or
{env/path}\Scripts\activate
on Windows
- check that the environment is properly installed by running the tests from the project root directory with the command line
pytest test
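Before running the tests, it can help to confirm that the interpreter in use really is the one from the virtual environment. The check below is a small sketch relying on the standard `sys.prefix` and `sys.base_prefix` attributes, which differ inside an environment created by `venv` or by recent versions of virtualenv. The fallback on `sys.real_prefix` covers older virtualenv releases. The function name is an assumption.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix instead of sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("python version:", ".".join(map(str, sys.version_info[:3])))
```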
Setting up the workspace:
- add the
folder to the Python path by adding the full path
{path/to/the/project/}ontology-learning/src
to the following file:
  - on Linux, add it to the file
  ontology-learning/venv/lib/python3.10/site-packages/_virtualenv.pth
  - on Windows, add it to the file
  {path/to/the/project/}ontology-learning\venv\Lib\site-packages\_virtualenv.pth
- alternatively, the following command can be run each time the project is used:
export PYTHONPATH="${PYTHONPATH}:{path/to/the/project/}ontology-learning/src"
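Both options above end up adding an entry to `sys.path`: lines of a `.pth` file located in `site-packages` are appended at interpreter start-up, and `PYTHONPATH` entries are inserted as well. The sketch below shows how to verify the result from Python, or how to add a folder for the current process only. The function name is an assumption, and the example call uses the current directory as a placeholder for the real source folder.

```python
import sys
from pathlib import Path


def add_to_python_path(folder: str) -> str:
    """Put the absolute form of `folder` on sys.path unless it is already there."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


if __name__ == "__main__":
    # Placeholder: replace "." with the path to the project's source folder.
    source_folder = add_to_python_path(".")
    print(source_folder in sys.path)
```

Unlike the `.pth` file or the exported variable, this change only lasts for the running process.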
Setting up project dependencies:
- install the project dependencies by running
pip install -r requirements.txt
(from within the virtual environment)
- update the requirements after installing new packages by running
pip freeze > requirements.txt
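To see whether the active environment still matches `requirements.txt`, the pinned versions can be compared with what is installed. This is a rough sketch built on the standard `importlib.metadata` module. It only understands plain `name==version` lines as written by `pip freeze` and skips everything else (editable installs, URLs, environment markers). The function names are assumptions.

```python
from importlib import metadata
from typing import Iterable, Optional


def parse_pinned(line: str) -> Optional[tuple[str, str]]:
    """Return (name, version) for a 'name==version' line, otherwise None."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if "==" not in line or line.startswith("-"):
        return None
    name, version = line.split("==", 1)
    return name.strip(), version.strip()


def find_mismatches(lines: Iterable[str]) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package to (expected, installed) when the two versions differ."""
    mismatches = {}
    for line in lines:
        pinned = parse_pinned(line)
        if pinned is None:
            continue
        name, expected = pinned
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        for package, (expected, installed) in find_mismatches(handle).items():
            print(f"{package}: expected {expected}, installed {installed}")
```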
Generate documentation :
- move to gh-pages
git checkout gh-pages
- install sphinx via pip
pip install sphinx
- move to docs folder
cd docs
- initialize docs
- go back to the root folder
cd ..
- generate the Sphinx source files (reStructuredText) for the package
sphinx-apidoc -o docs olaf/
- and then generate html pages
cd docs && make html
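The files written by `sphinx-apidoc` contain `automodule` directives, which the `sphinx.ext.autodoc` extension resolves by importing the package. The Sphinx configuration therefore needs that extension enabled and the package importable. Below is a minimal sketch of the relevant part of `docs/conf.py`. The project name is a placeholder, and the assumption that the package folder sits one level above `docs/` should be adapted to the real layout.

```python
# docs/conf.py (sketch, to be merged with the generated configuration)
import os
import sys

# Assumption: conf.py lives in docs/ and the package folder is one level up.
sys.path.insert(0, os.path.abspath(".."))

project = "ontology-learning"  # placeholder project name

# autodoc imports the modules referenced by the automodule directives.
extensions = ["sphinx.ext.autodoc"]

# "alabaster" is the default theme shipped with Sphinx.
html_theme = "alabaster"
```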
To host the documentation with GitHub Pages:
- create a branch named
- enable GitHub Pages and set the working branch as
- create an access token
- add an environment secret named
with the generated access token as value
- add an environment variable called
with your email address as value
- add an environment variable called
with your GitHub username as value