Hello!

The tech behind parts of ZincBase was acquired. This repo is still here for reference, but it is deprecated.

Fortunately, work still goes on. Apart from a couple of fringe bits, the active repo lives here.

The new owner of ZincBase as it is today is ComplexDB.

Alright, you still want to continue

ZincBase is a state-of-the-art knowledge base. It does the following:

Extract facts (aka triples and rules) from unstructured data/text
Store and retrieve those facts efficiently
Build them into a graph
Provide ways to query the graph, including via bleeding-edge graph neural networks

ZincBase exists to answer questions like "what is the probability that Tom likes LARPing", "who likes LARPing", or "classify people into LARPers vs normies".

It combines the latest in neural networks with symbolic logic (think expert systems and Prolog) and graph search.
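
To make the symbolic side concrete, here is a minimal sketch of storing triples and querying them by pattern, where arguments starting with an uppercase letter are treated as variables, as in Prolog. It is a toy written for illustration, not ZincBase's implementation; `TinyKB` and its methods are hypothetical names.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TinyKB:
    """Toy in-memory triple store (hypothetical, not the ZincBase KB class)."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def store(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))

    def query(self, subject: str, predicate: str, obj: str) -> Iterator[Dict[str, str]]:
        """Yield variable bindings; terms starting with an uppercase letter are variables."""
        pattern = (subject, predicate, obj)
        for triple in self.triples:
            bindings: Dict[str, str] = {}
            for want, have in zip(pattern, triple):
                if want[:1].isupper():
                    # Variable: bind it, or check it against an earlier binding.
                    if bindings.setdefault(want, have) != have:
                        break
                elif want != have:
                    # Constant that does not match this triple.
                    break
            else:
                yield bindings


kb = TinyKB()
kb.store("tom", "likes", "larping")
kb.store("ann", "likes", "larping")
kb.store("tom", "eats", "rice")

who = [b["Who"] for b in kb.query("Who", "likes", "larping")]
print(who)  # ['tom', 'ann']
```

A real knowledge base adds rules, indexing and learned embeddings on top of this kind of lookup; the sketch only shows the "who likes LARPing" style of question.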

View full documentation here.

Quickstart

from zincbase import KB
kb = KB()
kb.store('eats(tom, rice)')
for ans in kb.query('eats(tom, Food)'):
    print(ans['Food']) # prints 'rice'

...
# The included assets/countries_s1_train.csv contains triples like:
# (namibia, locatedin, africa)
# (lithuania, neighbor, poland)

kb = KB()
kb.from_csv('./assets/countries.csv')
kb.build_kg_model(cuda=False, embedding_size=40)
kb.train_kg_model(steps=2000, batch_size=1, verbose=False)
prob = kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia')
print(prob) # e.g. 0.8467; the exact value may vary between training runs
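
The probability above comes from a learned embedding model. Since the Validation section compares ZincBase against RotatE, the sketch below shows how a RotatE-style model scores a triple: the relation acts as an elementwise rotation of the head embedding in the complex plane, and the score is a margin minus the distance to the tail embedding. The embeddings, the margin `gamma`, and the sigmoid mapping from score to probability are illustrative assumptions, not values or code taken from ZincBase.

```python
import cmath
import math
from typing import List


def rotate_score(head: List[complex], phases: List[float],
                 tail: List[complex], gamma: float = 6.0) -> float:
    """RotatE-style score: gamma - sum_i |h_i * e^(i * theta_i) - t_i|."""
    distance = 0.0
    for h, theta, t in zip(head, phases, tail):
        rotated = h * cmath.exp(1j * theta)  # rotate h_i by the angle theta_i
        distance += abs(rotated - t)
    return gamma - distance


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Made-up 2-dimensional embeddings, for illustration only.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]   # relation: rotate by 90 and 180 degrees
good_tail = [0 + 1j, 0 - 1j]      # exactly the rotated head
bad_tail = [-1 + 0j, 1 + 0j]      # far from the rotated head

print(sigmoid(rotate_score(head, phases, good_tail)))  # higher
print(sigmoid(rotate_score(head, phases, bad_tail)))   # lower
```

Training adjusts the embeddings and phases so that true triples land close to their rotated heads and corrupted triples do not.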

Requirements

Python 3
Libraries from requirements.txt
GPU preferable for large graphs but not required

Installation

pip install -r requirements.txt

Note: Requirements might differ for PyTorch depending on your system.

Testing

python test/test_main.py
python test/test_graph.py
python test/test_lists.py
python test/test_nn_basic.py
python test/test_nn.py
python test/test_neg_examples.py
python test/test_truthiness.py
python -m doctest zincbase/zincbase.py

Validation

"Countries" and "FB15k" datasets are included in this repo.

A script checks that ZincBase performs at least as well on the Countries dataset as the original (2019) RotatE paper. From the repo's root directory:

python examples/eval_countries_s3.py

It runs the hardest Countries task (S3) and prints the AUC ROC, which should be about 0.95 to match the paper. A run takes about 30 minutes on a modern GPU.
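
For readers unfamiliar with the metric, AUC ROC is the probability that a randomly chosen true triple receives a higher score than a randomly chosen false one. A minimal sketch of that pairwise definition follows; the scores and labels are made up, and the evaluation script may well compute the metric differently (for example with a library routine).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.4, 0.3]  # model scores for five candidate triples
labels = [1, 1, 0, 1, 0]            # 1 = true triple, 0 = false triple
print(roc_auc(scores, labels))      # 5 of 6 pairs ranked correctly, about 0.833
```

The pairwise loop is quadratic, which is fine for a small illustration; large evaluations usually sort the scores once instead.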

There is also a script to evaluate performance on FB15k: python examples/fb15k_mrr.py.
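
Judging by its file name, that script reports mean reciprocal rank (MRR), a common link-prediction metric: for each test triple the model ranks all candidate entities, and the metric averages 1/rank of the correct one. The helper below is a hypothetical sketch of the metric itself, not code from the script.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """ranks[i] is the 1-based rank of the correct entity for test triple i."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Correct entity ranked 1st, 2nd and 4th for three made-up test triples.
print(mean_reciprocal_rank([1, 2, 4]))  # (1 + 1/2 + 1/4) / 3
```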

Building documentation

From the docs/ directory, run make html. If the code structure has changed a lot, first run sphinx-apidoc -o . .. to regenerate the API stubs.

TODO

Add documentation
to_csv method
Use Postgres as a backend triple store
The to_csv/from_csv methods do not yet support node attributes.
Add relation extraction from arbitrary unstructured text
Add context to triples: text that is interpreted by BERT/ULM/GPT-2 or similar and turned into an embedding that is concatenated to the KG embedding.
Reinforcement learning for graph traversal.

References & Acknowledgements

Théo Trouillon. Complex-Valued Embedding Models for Knowledge Graphs. Machine Learning [cs.LG]. Université Grenoble Alpes, 2017. English. NNT: 2017GREAM048

L334: Computational Syntax and Semantics -- Introduction to Prolog, Steve Harlow

Open Book Project: Prolog in Python, Chris Meyers

Prolog Interpreter in Javascript

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang, International Conference on Learning Representations, 2019

Citing

If you use this software, please consider citing:

@software{zincbase,
  author = {{Tom Grek}},
  title = {ZincBase: A state of the art knowledge base},
  url = {https://github.com/tomgrek/zincbase},
  version = {0.1.1},
  date = {2019-05-12}
}

Contributing

See CONTRIBUTING. And please do!
