
pyjanitor visualization using RDF logs - proof of concept


asmirnov69/pyjviz-poc


pyjviz-poc

This repo describes some new features proposed for pyjanitor:

  • Generate diagrams of the runtime behavior of method chains. This is done by introducing a mechanism that collects data during pyjanitor chained method calls. The collected data is stored as an RDF/Turtle file. The provided visualisation scripts can later present the collected trace as a diagram - code example --> diagram

  • Introduce CMP (Chained Methods Pipe). CMP is a class whose purpose is to provide additional information to pyjanitor chained method calls. The pyjanitor visualization proposed above puts CMP to immediate use - code example --> diagram
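The two ideas above can be illustrated with a minimal, standard-library-only sketch. It is not the repo's implementation: the `Rows` class is a toy stand-in for a DataFrame, the `traced` decorator and `pipe` context manager are hypothetical names, and the `ex:` vocabulary is invented for the example. The real CMP class and RDF logging code may look quite different.

```python
import contextlib
import functools
import itertools

_ids = itertools.count()
triples = []          # collected (subject, predicate, object) strings
_current_pipe = None  # label of the enclosing pipe, if any


def new_node(prefix):
    """Return a fresh identifier in the made-up ex: namespace."""
    return f"ex:{prefix}{next(_ids)}"


@contextlib.contextmanager
def pipe(name):
    """Hypothetical stand-in for the CMP idea: label every call made inside the block."""
    global _current_pipe
    previous, _current_pipe = _current_pipe, name
    try:
        yield
    finally:
        _current_pipe = previous


def traced(method):
    """Record each call: method name, input object, output object, enclosing pipe."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        call = new_node("call")
        triples.append((call, "rdf:type", "ex:MethodCall"))
        triples.append((call, "rdfs:label", f'"{method.__name__}"'))
        triples.append((call, "ex:input", self.node))
        triples.append((call, "ex:output", result.node))
        if _current_pipe is not None:
            triples.append((call, "ex:pipe", f'"{_current_pipe}"'))
        return result
    return wrapper


class Rows:
    """Toy stand-in for a DataFrame: a list of dicts with chainable methods."""

    def __init__(self, rows):
        self.rows = rows
        self.node = new_node("obj")

    @traced
    def drop_empty(self):
        return Rows([r for r in self.rows if any(v is not None for v in r.values())])

    @traced
    def rename(self, old, new):
        return Rows([{(new if k == old else k): v for k, v in r.items()} for r in self.rows])


def to_turtle(collected):
    """Serialize the collected triples as Turtle text."""
    header = (
        "@prefix ex: <http://example.org/trace#> .\n"
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
    )
    return header + "\n".join(f"{s} {p} {o} ." for s, p, o in collected) + "\n"


with pipe("cleanup"):
    result = Rows([{"a": 1}, {"a": None}]).drop_empty().rename("a", "b")

turtle_text = to_turtle(triples)
print(turtle_text)
```

Each call in the chain contributes one `ex:MethodCall` node linked to its input and output objects, which is the shape a diagram script needs in order to draw the chain as a graph.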

The implementation uses pyjanitor code copied intact into the pyjviz-poc/janitor directory, so a few chosen pyjanitor examples work. register.py from pandas_flavor was also copied and modified to add the required RDF logging functionality. pyjviz-poc/janitor/pyj*.py and pyjviz-poc/janitor/functions/pandas_builtin_overrides.py are new files that implement the proposed functionality.

The immediate focus is to get PNG diagram generation working using rdflib with the provided SPARQL and graphviz. The bigger idea is to store RDF logs and similarly collected data in a graph database. This would be a database of research and production activity that uses SPARQL and/or openCypher to connect the collected data to other knowledge graph systems (e.g. Obsidian).

How to run examples

For venv users it is possible to use this sequence:

python3 -m venv ~/venv/pyjviz
source ~/venv/pyjviz/bin/activate

cd pyjviz-poc
pip install -r requirements.txt

After that one can run the scripts from the pyjviz-poc/scripts directory:

cd scripts # pwd should show 'pyjviz-poc/scripts'

python a0.py
python dirty-clean.py
python dirty-clean-w-cmp.py
python conditional_join.py
python conditional_join_w_cmp.py

Results will be shown on the screen and written to the directory pyjviz-poc/scripts/rdflog
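To check what a run wrote, a small helper like the one below lists the files in the rdflog directory, newest first. `list_logs` is a hypothetical helper written for this example, and it makes no assumption about the names or extensions of the log files.

```python
from pathlib import Path


def list_logs(log_dir="rdflog"):
    """Return the files in log_dir, newest first; an empty list if it does not exist."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


# Run from pyjviz-poc/scripts after executing one of the example scripts.
for path in list_logs():
    print(path.name, path.stat().st_size, "bytes")
```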

You should be able to run the notebooks in pyjviz-poc/notebooks using the packages listed in pyjviz-poc/requirements.txt. The diagram is shown in the output of the last cell.

Install commands for jupyterlab that worked for me (after pip install -r requirements.txt):

cd pyjviz-poc
pip install jupyterlab
jupyter lab
