The PharminfoVienna Sandbox is a tool combining data gathering using KNIME and model building of classification models using Jupyter Notebook.
The KNIME workflow (WF) handles all steps of dataset generation automatically, the workflow only needs minimal input which can be configured by using a graphical user interface. The gathered datasets can be downloaded and used for further modeling with the PharminfoVienna Sandbox.
Knime is a free open-source data science software which allows the easy creation of workflows without the need of any programming skills.
- Set up a local ChEMBL instance
- Set up a Python Environment
A detailed description can be found here.
You can download KNIME here. Choose the correct version for your operating system and follow the installation instructions.
Go to File -> Import KNIME Workflow At the appearing Import window, choose Select file and browse to the *.knwf-File
The standardization of the compounds requires certain python libraries therefore KNIME needs to access the previously installed Environment (see Step 0 - Set up Python Environment).
conda create -n sandbox-env python=3.6
conda activate sandbox-env
conda install -c rdkit rdkit=2020.03.2.0
Alternative for Linux:
conda env create -f environment.yml
conda activate sandbox-env
A detailed description including illustrations of the different steps are given in the file: documentation_data_gathering_sandbox.docx