This tutorial was developed in connection with the ACS Fall 2022 Symposium "Open-Source Software for Kinetics, Chemical Networks, & Reactor Modeling". CatHub provides an interface to the Surface Reactions database on Catalysis-Hub.org. The module includes a command line interface (in your terminal) as well as a Python interface to access and upload data. In this tutorial you will learn how to access Catalysis-Hub.org data via the Python interface.
To install CatHub use pip:
pip3 install git+https://github.com/SUNCAT-Center/CatHub.git --upgrade --user
which will install CatHub and all its dependencies.
To test that the cathub cli is working, start by typing in your terminal:
$ cathub --help
and you should see a list of subcommands. If it is not working, you probably have to add the installation path to PATH in your ~/.bashrc. This would typically be
export PATH=~/.local/bin:${PATH}
for Linux, and
export PATH=~/Library/Python/X.Y/bin:${PATH}
for Mac, where X.Y is your Python version.
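If you prefer to check the setup from Python, the sketch below looks for the cathub executable on PATH and prints the directory where a pip --user install normally places scripts. It assumes a Linux or Mac layout where scripts live in the bin folder of the user base directory. The helper names find_cathub and user_bin_dir are hypothetical and not part of CatHub.

```python
import os
import shutil
import site


def find_cathub():
    """Return the full path of the cathub executable, or None if it is not on PATH."""
    return shutil.which("cathub")


def user_bin_dir():
    """Return the bin folder of the per-user install location (Linux/Mac layout assumed)."""
    return os.path.join(site.getuserbase(), "bin")


path = find_cathub()
if path is None:
    # Not on PATH: this is the directory you would most likely need to add.
    print("cathub not found on PATH; try adding", user_bin_dir())
else:
    print("cathub found at", path)
```

If the script reports that cathub is missing, add the printed directory to PATH and open a new terminal.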
Familiarize yourself with the datasets on the Catalysis-hub webpage http://www.catalysis-hub.org/publications
Data is divided into distinct datasets, each belonging to a particular publication, that can be queried based on the "pub_id", a unique dataset id constructed from the title, first author name and publication year.
Here you will learn how to fetch adsorption energies and plot scaling relations in Python. We will use the example of O vs OH scaling from our 2022 transition metal oxides dataset https://www.catalysis-hub.org/publications/ComerUnraveling2022 (publication link: https://pubs.acs.org/doi/10.1021/acs.jpcc.2c02381).
To connect to the catalysis-hub.org server in your script, start by importing the cathub SQL interface and creating a database connection to the catalysis-hub server:
from cathub.cathubsql import CathubSQL
db = CathubSQL()
Then use the get_dataframe() method to query adsorption energy data into a pandas dataframe (basic pandas syntax is covered in the pandas documentation):
dataframe = db.get_dataframe(pub_id='ComerUnraveling2022')
print(dataframe)
Inspect the dataframe by printing it to your terminal. The main columns are chemical_composition (chemical formula of the total slab), surface_composition (reduced chemical composition with surface specific tags), equation (equation for the reaction) and reaction_energy (which can also be an adsorption energy).
To continue the analysis, please save the dataframe into a pickle file on your local workspace:
dataframe.to_pickle('ComerUnraveling2022.pickle')
Now you can examine your local file without pulling from the server:
# db = CathubSQL()
# dataframe = db.get_dataframe(pub_id='ComerUnraveling2022')
# dataframe.to_pickle('ComerUnraveling2022.pickle')
import pandas
dataframe = pandas.read_pickle('ComerUnraveling2022.pickle')
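The save-then-reload pattern above can be wrapped in a small caching helper, so the server is only contacted when no local pickle exists. This is a minimal sketch: load_or_fetch is a hypothetical helper (not part of CatHub) that takes any zero-argument function returning a dataframe.

```python
import os

import pandas


def load_or_fetch(pickle_path, fetch):
    """Return the dataframe stored at pickle_path, calling fetch() only if the file is missing."""
    if os.path.exists(pickle_path):
        # Local copy found: no server round trip needed.
        return pandas.read_pickle(pickle_path)
    dataframe = fetch()
    dataframe.to_pickle(pickle_path)
    return dataframe


# Usage with CatHub (commented out because it needs network access):
# from cathub.cathubsql import CathubSQL
# dataframe = load_or_fetch(
#     'ComerUnraveling2022.pickle',
#     lambda: CathubSQL().get_dataframe(pub_id='ComerUnraveling2022'))
```

Delete the pickle file whenever you want to force a fresh download.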
Use your favorite Python plotting module to plot the OH vs. O scaling (i.e. OH adsorption energies against O adsorption energies). Start by examining the unique chemical reactions and facets for the dataset, for example:
print(dataframe["equation"].unique().tolist()) # Unique reactions
print(dataframe["facet"].unique().tolist()) # unique facets
Using matplotlib/pylab you can plot the scaling relation like this:
import pylab as p
# Select O and OH adsorption on the (110) facet
O_110 = dataframe[(dataframe['equation'] == 'H2O(g) - H2(g) + * -> O*') & (dataframe['facet'] == '110')]
OH_110 = dataframe[(dataframe['equation'] == 'H2O(g) - 0.5H2(g) + * -> HO*') & (dataframe['facet'] == '110')]
# Pair O and OH energies computed on the same surface
dataframe_together = O_110.merge(OH_110, on='surface_composition',
                                 suffixes=('_O', '_OH'))
x_data = dataframe_together['reaction_energy_O']
y_data = dataframe_together['reaction_energy_OH']
p.scatter(x_data, y_data)
# Label each point with its surface composition
for i, txt in enumerate(dataframe_together['surface_composition']):
    p.gca().annotate(txt, (x_data[i], y_data[i]))
p.xlabel('O adsorption energy')
p.ylabel('OH adsorption energy')
p.title('O-OH Scaling relation')
p.show()
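A scaling relation is usually summarized by a straight line through the points. The sketch below shows one way to obtain the slope and intercept with a least-squares fit using numpy.polyfit. The helper fit_scaling_relation is hypothetical, and the numbers in the toy dataframe are placeholders for illustration only, not values from the dataset; with real data you would pass the reaction_energy_O and reaction_energy_OH columns of the merged dataframe.

```python
import numpy as np
import pandas


def fit_scaling_relation(x_values, y_values):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(x_values, dtype=float),
                                  np.asarray(y_values, dtype=float), deg=1)
    return float(slope), float(intercept)


# Placeholder numbers for illustration only, NOT values from the dataset.
toy = pandas.DataFrame({
    'reaction_energy_O': [1.0, 2.0, 3.0, 4.0],
    'reaction_energy_OH': [0.6, 1.1, 1.6, 2.1],
})
slope, intercept = fit_scaling_relation(toy['reaction_energy_O'],
                                        toy['reaction_energy_OH'])
print(f"E_OH = {slope:.2f} * E_O + {intercept:.2f}")
```

The fitted line can be drawn on top of the scatter plot by evaluating slope * x + intercept over the range of the O adsorption energies.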
Now try to repeat the plot choosing another facet from the dataset.
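To avoid copying the filtering code for every facet, the selection and merge can be wrapped in a function. This is a sketch under the assumption that the dataframe has the facet, equation, surface_composition and reaction_energy columns used above. The name scaling_pairs is hypothetical, and the toy rows are placeholders, not values from the dataset.

```python
import pandas

O_EQUATION = 'H2O(g) - H2(g) + * -> O*'
OH_EQUATION = 'H2O(g) - 0.5H2(g) + * -> HO*'


def scaling_pairs(dataframe, facet, equation_x=O_EQUATION, equation_y=OH_EQUATION):
    """Pair the x and y reaction energies of surfaces sharing a surface_composition."""
    on_facet = dataframe[dataframe['facet'] == facet]
    x_rows = on_facet[on_facet['equation'] == equation_x]
    y_rows = on_facet[on_facet['equation'] == equation_y]
    # Inner merge: only surfaces with both reactions on this facet survive.
    return x_rows.merge(y_rows, on='surface_composition', suffixes=('_x', '_y'))


# Placeholder rows for illustration only, NOT values from the dataset.
toy = pandas.DataFrame({
    'surface_composition': ['A', 'A', 'B', 'B', 'A'],
    'facet': ['110', '110', '110', '110', '100'],
    'equation': [O_EQUATION, OH_EQUATION, O_EQUATION, OH_EQUATION, O_EQUATION],
    'reaction_energy': [1.0, 0.5, 2.0, 1.0, 3.0],
})
pairs = scaling_pairs(toy, facet='110')
print(pairs[['surface_composition', 'reaction_energy_x', 'reaction_energy_y']])
```

Calling scaling_pairs with a different facet string returns the pairs for that facet, or an empty dataframe if one of the two reactions is missing there.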
You can also query reactions by reactants, products, elements and facet instead of by pub_id. An example is shown below:
dataframe = db.get_dataframe(reactants={'COgas': 1},
                             products={'COstar': 1},
                             elements=['Cu', 'Al'],  # contains Cu and Al
                             # surface_composition='Cu',  # match specific composition
                             facet='100'
                             )
Use Python to query atomic structures for a dataset of your choice (please choose a smaller dataset, with Nreactions < 500, to save time). In the example below I have chosen https://www.catalysis-hub.org/publications/AraComputational2022:
from cathub.cathubsql import CathubSQL
from ase.db import connect
from ase.visualize import view
pub_id = 'AraComputational2022'
db = CathubSQL()
atoms_list = db.get_atoms_for_publication(pub_id=pub_id)
# Store the structures in a local ASE database
dblocal = connect(pub_id + '.db')
for atoms in atoms_list:
    dblocal.write(atoms)
view(atoms_list)
You should now see the ASE GUI open with several atomic structures. Also, inspect your db with ASE from the command line:
$ ase db <dbfile>
$ ase gui <dbfile>
Next, try to query structures into the pandas dataframe where atomic structures and reaction energies are connected.
db = CathubSQL()
dataframe = db.get_dataframe(pub_id=pub_id,
                             include_atoms=True)
dataframe.to_pickle(pub_id + '_with_atoms.pickle')
You can view atoms for a specific reaction row by choosing a reaction row in the script below. Atoms objects include empty surface, surface with adsorbate, gas phase molecules and (for some datasets) the bulk geometry.
dataframe = pandas.read_pickle(pub_id + '_with_atoms.pickle')
print(dataframe[['chemical_composition', 'equation', 'atoms_name']])
row_id = 4
view(dataframe['atoms'][row_id])
Notice that the "atoms_name" column contains the names of the geometries, so that specific types of structures can be fetched systematically. For example, to query only OH adsorption geometries, try this:
atoms_list_OH = []
for _, row in dataframe.iterrows():
    # Skip reactions without an OH adsorption geometry
    if 'HOstar' not in row['atoms_name']:
        continue
    index = row['atoms_name'].index('HOstar')
    atoms_list_OH.append(row['atoms'][index])
view(atoms_list_OH)
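The same loop works for any geometry name, so it can be written as a reusable function. The sketch below assumes, as in the script above, that each row holds a list of names in atoms_name and a parallel list of structures in atoms. The helper collect_atoms_by_name is hypothetical, and the toy rows use plain strings in place of ASE Atoms objects; the names 'star' and 'Ostar' are made up for the illustration.

```python
import pandas


def collect_atoms_by_name(dataframe, name):
    """Return the entries of the 'atoms' column whose 'atoms_name' entry equals name."""
    selected = []
    for names, atoms in zip(dataframe['atoms_name'], dataframe['atoms']):
        if name in names:
            # names and atoms are parallel lists, so the positions match.
            selected.append(atoms[names.index(name)])
    return selected


# Placeholder rows: strings stand in for ASE Atoms objects.
toy = pandas.DataFrame({
    'atoms_name': [['star', 'HOstar'], ['star', 'Ostar'], ['HOstar', 'star']],
    'atoms': [['slab_1', 'slab_1_OH'], ['slab_2', 'slab_2_O'], ['slab_3_OH', 'slab_3']],
})
print(collect_atoms_by_name(toy, 'HOstar'))
```

With a real dataframe queried using include_atoms=True, the returned list can be passed directly to view().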
Query structures for a specific reaction directly from the server based on the "reaction_id":
atoms_list = db.get_atoms_for_reaction(reaction_id=???)
Tip: the reaction ids can be found in the pandas dataframe.
In this exercise we will access experimental datasets on the catalysis-hub server (only available through the Python API so far).
For this exercise you will only need the cathub cli. Go to your terminal and type:
$ cathub exp
and you should get a pop-up opening in your browser. The table lists the experimental datasets. Select a pub_id to see more details:
$ cathub exp <pub_id>
Query the experimental database in Python, using the public access password:
from cathub.experimental.data_interface import ExpSQL
DB = ExpSQL(user='expvisitor', password='99Ny81eG') # read only access
dataframe = DB.get_dataframe(table=?, pub_id=?)
where the experimental tables include: material, sample, xps, xrd and echemical.
Follow the guidelines for data upload on the main GitHub page: https://github.com/SUNCAT-Center/CatHub/blob/master/README.md, and the full documentation: http://docs.catalysis-hub.org/en/latest/tutorials/upload.html.
This is particularly for those who have their own DFT-calculated adsorption energies. A test dataset can also be found here: https://github.com/SUNCAT-Center/CatHub/tree/master/cathub/tests/aayush/montoya_the_2015 (you can clone or download the entire GitHub repo to get the files locally).