Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite read-in of SAD guess #345

Open
robertodr opened this issue Nov 11, 2020 · 14 comments
Open

Rewrite read-in of SAD guess #345

robertodr opened this issue Nov 11, 2020 · 14 comments
Labels
enhancement good-first-pr A good way to get yourself into MRChem

Comments

@robertodr
Copy link
Contributor

robertodr commented Nov 11, 2020

Currently the SAD guess is read from 2 files:

  • The basis set file in DALTON format (I believe) in O.bas
  • The atomic density in O.dens
    The two can be unified into a single JSON file O.json that contains both the basis and the corresponding atomic density. The basis can be taken directly from the Basis Set Exchange (BSE), see here for 3-21G of oxygen. The density can be computed with any code. The JSON would have to keys:
{
  "basis": { ... },   # <--- copy-paste from the BSE repository,
  "density": [ ... ]  # <--- 1D array with the atomic density.
}

In the C++ code one needs to:

  • Add a method from_json to the AOBasis class to parse the basis as copy-pasted from the BSE.
  • Rework project_density such that it reads the density field of the JSON file.

The advantages of this approach are that:

  • Easier for developers and it's in a standard, human-readable format
  • We are sure that the basis set is correct, since it comes from the BSE
  • One can build its own SAD guesses on top of the ones shipped with MRChem, for example, to use a larger atomic basis. I am not sure how useful this is in general though.

On top of the C++ changes there should be machinery to generate the SAD guesses in JSON format. This can be nicely achieved with Psi4 and the BSE Python library. Note that I haven't tested the following script:

import json

import numpy as np

import basis_set_exchange as bse
import psi4


mol = psi4.geometry("""
    O       0.0  0.0  0.0
    symmetry c1
    noreorient
    no_com
""")

psi4.core.set_active_molecule(mol)

# set options
options = {
    'SCF_TYPE':'PK',
    'E_CONVERGENCE':1e-12,
    'D_CONVERGENCE':1e-12
}

# build custom basis from BSE
def define_custom_basis(mol, role):
    basstrings = {}
    mol.set_basis_all_atoms("3-21G", role=role)
    basstrings['3-21g'] = bse.get_basis("3-21G", elements=[8], fmt="psi4")
    return basstrings

psi4.qcdb.libmintsbasisset.basishorde['3-21G_BSE'] = define_custom_basis
psi4.core.set_global_option("BASIS", "3-21G_BSE")

psi4.set_options(options)

e, wfn = psi4.energy('SCF', return_wfn=True)

# density as NumPy array
density = wfn.Da().to_array()

output = Path("O.json")
with output.open("w") as f:
      json.dump({"basis": bse.get_basis("3-21G", elements=[8], fmt="json"), "density": density.tolist()} , f)
@robertodr robertodr added enhancement good-first-pr A good way to get yourself into MRChem labels Nov 11, 2020
@robertodr
Copy link
Contributor Author

This script will generate basis+density in JSON file for all the elements known to a given basis set:

import argparse
import json
from pathlib import Path

import basis_set_exchange as bse
import numpy as np
import psi4


# build custom basis
def define_custom_basis(name: str, basis: str, molecule, role: str):
    molecule.set_basis_all_atoms(name, role=role)
    return {
        name: basis,
    }


def generate_guess(basis_name):
    # loop over all elements in basis set and run an atomic calculation
    for el, bas in bse.get_basis(basis_name, elements=[8])["elements"].items():
        symbol = bse.lut.element_sym_from_Z(el, normalize=True)
        print(f"==> Running atomic calculation for {symbol}")
        psi4.core.set_output_file(f"sad_{symbol}.out")
        mol = psi4.geometry(
            f"""
            {symbol}       0.0  0.0  0.0
            symmetry c1
            noreorient
            no_com
        """
        )
        psi4.core.set_active_molecule(mol)

        # get basis in Psi4 format
        custom_name = f"{basis_name}_BSE"
        p4b = bse.get_basis(basis_name, elements=[el], fmt="psi4")
        psi4.qcdb.libmintsbasisset.basishorde[
            custom_name
        ] = lambda m, r: define_custom_basis(basis_name, p4b, m, r)

        psi4.set_options(
            {
                "SCF_TYPE": "PK",
                "D_CONVERGENCE": 1e-10,
                "REFERENCE": "UHF",
                "BASIS": custom_name,
            }
        )

        try:
            e, w = psi4.energy("SCF", return_wfn=True)
        except:
            print(f"   !!! Calculation for {symbol} did not converge!")
            continue

        # density as NumPy array
        D = w.Da().to_array() + w.Db().to_array()
        # set elements smaller than 1e-13 in absolute value to zero
        zeroes = np.abs(D) <= 1e-13
        D[zeroes] = 0.0

        output = Path(f"{symbol}.json")
        with output.open("w") as f:
            json.dump(
                {
                    "basis": bas,
                    "density": [x for _ in D.tolist() for x in _],
                },
                f,
                indent=4,
            )

        psi4.core.clean_options()
        psi4.core.clean()


if __name__ == "__main__":
    cli = argparse.ArgumentParser()
    cli.add_argument("basis", help="name of the basis set")

    args = cli.parse_args()

    print(f"Using basis set {args.basis}")
    generate_guess(args.basis)

can be used as:

$ python sad.py 3-21G

@susilehtola
Copy link

FWIW I'm planning a modular atomic solver library compatible with the BSE, which can be integrated in any QM codes for on-the-fly SAD, for instance.

@susilehtola
Copy link

btw what is the "hydrogenic basis" used with the SAD guess?

@stigrj
Copy link
Contributor

stigrj commented Dec 27, 2021

FWIW I'm planning a modular atomic solver library compatible with the BSE, which can be integrated in any QM codes for on-the-fly SAD, for instance.

That sounds great @susilehtola, we'll definitely be looking into that. We need to increase the level of sophistication of our initial guess solver (see next point).

btw what is the "hydrogenic basis" used with the SAD guess?

These are literally the eigenfunctions of the hydrogen atom. Our SAD procedure is the following:

  1. We have stored pre-computed atomic density matrices from LDA calculations with 3-21G
  2. Compute SAD densities from the pre-computed atomic densities on our numerical (multiwavelet) grid
  3. Construct V_coul and V_xc (SVWN5) potentials from this SAD density, also on the numerical grid
  4. Set up an AO basis of hydrogen eigenfunctions for the system of a given zeta quality (SZ, DZ, TZ, QZ), where the AOs are numerically represented on the grid
  5. Compute the Hamiltonian matrix using the SAD potentials in the chosen AO basis (again, everything numerically on the grid)
  6. Diagonalize the matrix and populate the lowest energy MOs
  7. Throw away all virtual orbitals and carry on with the integral equation SCF (only occupied orbitals, no virtuals needed)

@susilehtola
Copy link

OK. Hydrogenic functions are a very bad basis set; you might look into using e.g. my HGBS Gaussian basis sets instead which are available for the whole periodic table and are obtained from simple physical arguments.

I would not use a correlation functional for the SAD guess, since we looked into this question in J. Chem. Phys. 152, 144105 (2020) and found out that exchange-only calculations yield essentially the same - if not better - result.

It sounds like you might also be very interested in another key feature of the atomic solver: getting numerical AOs for the atom.

@susilehtola
Copy link

Another useful feature would be the ability to read in formatted checkpoint files, since they can be produced by a variety of quantum chemistry programs. What would it take to make this happen?

@stigrj
Copy link
Contributor

stigrj commented Dec 28, 2021

Yes, up to this point we have only been concerned with getting somewhere in the vicinity of the solution with our guess, so that our SCF is able to find the correct solution (not always the case, still). We have not bothered yet with optimizing the guess for performance in terms of reducing the number of SCF iterations.

Numerical AOs would be great, combined with SAD or SAP. Will these AOs come in the form of a 1D numerical radial function combined with standard spherical harmonics, or fully 3D numerical? Is the code to generate these available already, and how would the interface look? Can we directly get callable C++ objects that can be evaluated in arbitrary 3D grid points, or would we need some interpretation on our side, like interpolation on radial grid values etc?

@stigrj
Copy link
Contributor

stigrj commented Dec 28, 2021

Reading checkpoint files could be useful, I agree

@susilehtola
Copy link

Numerical AOs would be great, combined with SAD or SAP. Will these AOs come in the form of a 1D numerical radial function combined with standard spherical harmonics, or fully 3D numerical? Is the code to generate these available already, and how would the interface look? Can we directly get callable C++ objects that can be evaluated in arbitrary 3D grid points, or would we need some interpretation on our side, like interpolation on radial grid values etc?

The more I have thought about this, the more it looks like I should just use what I have in HelFEM already, since it appears the design decisions I made early on about the fully flexible LCAO approach is something that is of interest for future efforts as holes in the valence shell will also polarize the core orbitals. I need to refactor the code to make it more general by removing assumptions about the radial basis set; I'll also want Gaussians and Slater-type orbitals and support for ECPs.

For initial guess purposes, the AOs will indeed be of the form psi_{nlm} (r) = R_{nl) (r) Y_{lm} (r), where the radial functions are evaluated within the module. The code will also be able to evaluate the SAP potential Vsap(r) and the electron density n(r) without the need for extra interpolation.

I don't yet know what would be the best strategy to implement SAD; should it be in the library itself or be left outside. The Coulomb matrix is the same in both SAP and SAD and can be easily evaluated by summing the atomic radial potentials. The only difference between the two is the exchange potential, which is linear in SAP (sum of atomic radial potentials) whereas in SAD it's given by the potential of the atom-summed densities at each grid point. Maybe SAD does need to be left to downstream codes, since the grids will be different in each code... However, being able to evaluate the atomic orbitals will still be important, since you need those to construct the numerical minimal basis in many approaches.

HelFEM is in C++ but uses Armadillo; I recently started considering porting it to Eigen3. In order to facilitate the new modular code's inclusion in quantum chemistry packages, it needs to have minimal dependencies (Eigen3 and libxc and/or possibly xcfun), and be callable from C and Fortran at minimum; a Python interface would also be really useful.

@susilehtola
Copy link

Here's a script that uses spherically averaged spin-restricted Hartree-Fock in PySCF with the correct ground-state occupations from finite-element calculations I published a few years ago.

import argparse
import json
from pathlib import Path

import basis_set_exchange as bse
import numpy

import pyscf
from pyscf.scf import atom_hf

def generate_guess(basis_name):
    # loop over all elements in basis set and run an atomic calculation
    for el, bas in bse.get_basis(basis_name)["elements"].items():
        symbol = bse.lut.element_sym_from_Z(el, normalize=True)
        print(f"==> Running atomic calculation for {el=} {symbol}")

        # get basis in NWChem format
        my_basis = bse.get_basis(basis_name, elements=[el], fmt="nwchem")

        # Define "molecule"
        mol = pyscf.M(
            atom = symbol,
            basis = {symbol : pyscf.gto.load(my_basis, symbol)},
            symmetry = True,
            spin = int(el) % 2
        )

        # Run the atomic scf
        atmscf = atom_hf.get_atm_nrhf(mol)
        # Get the total energy and orbitals
        e, mo_e, mo_C, mo_occ = atmscf[symbol]
        # Form density matrix as NumPy array
        D = numpy.dot(mo_C, numpy.dot(numpy.diag(mo_occ), mo_C.T))

        # set elements smaller than 1e-13 in absolute value to zero
        zeroes = numpy.abs(D) <= 1e-13
        D[zeroes] = 0.0

        output = Path(f"{symbol}.json")
        with output.open("w") as f:
            json.dump(
                {
                    "basis": bas,
                    "density": [x for _ in D.tolist() for x in _],
                },
                f,
                indent=4,
            )

if __name__ == "__main__":
    cli = argparse.ArgumentParser()
    cli.add_argument("basis", help="name of the basis set")

    args = cli.parse_args()

    print(f"Using basis set {args.basis}")
    generate_guess(args.basis)

The issue here is that you need to make sure that the basis functions are normalized in the same way and ordered in the same way. This may not be the same in every program. However, the code above uses full spherical averaging, so the densities are correctly spherically symmetric.

I think PySCF needed some hacks to get the heaviest atoms to converge; I needed to make a multi-step algorithm to converge the NRSRHF calculations with the hydrogenic Gaussian basis sets I published a few years ago.

@susilehtola
Copy link

... but I assume the bigger problem at the moment may be the use of the hydrogenic atomic orbitals which are a very incomplete basis set...

@susilehtola
Copy link

Also, the discussion on read-in SCF guesses already turned into a larger question of interoperability between quantum chemistry codes, see the discussion I started on the computational chemistry list. MRChem could be a prime candidate for early adoption, no?

@stigrj
Copy link
Contributor

stigrj commented Jan 1, 2022

Absolutely, I would be very interested in this. An alternative approach for us could be to implement the SAD solver in Python using our MRCPP interface [VAMPyR]](https://github.com/MRChemSoft/vampyr), in combination with some GTO library to get proper AO functionality which is currently missing (or sloppily implemented, at best) in MRChem.

@susilehtola
Copy link

Absolutely, I would be very interested in this. An alternative approach for us could be to implement the SAD solver in Python using our MRCPP interface [VAMPyR]](https://github.com/MRChemSoft/vampyr), in combination with some GTO library to get proper AO functionality which is currently missing (or sloppily implemented, at best) in MRChem.

... but that would require extra development effort.

As an easier alternative, how about implementing SAP in MRChem? I have tabulated LDA potentials in https://github.com/susilehtola/HelFEM/blob/master/src/general/sap.cpp. You should be able to use these in the Helmholtz solver to get decent guess orbitals...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good-first-pr A good way to get yourself into MRChem
Projects
None yet
Development

No branches or pull requests

3 participants