Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Physics based priors #26

Open
peastman opened this issue Jun 23, 2021 · 72 comments
Open

Physics based priors #26

peastman opened this issue Jun 23, 2021 · 72 comments

Comments

@peastman
Copy link
Collaborator

I want to implement some physics based priors. An example would be a Coulomb interaction based on pre-computed partial charges that are stored in the dataset. BasePrior.forward() is supposed to return a list of per-atom energy contributions, but physics based interactions usually do not decompose in that way. It would be much easier if it could just return a total energy for each sample.

What do you recommend as the cleanest way of implementing this?

@PhilippThoelke
Copy link
Collaborator

BasePrior.forward() returns the updated atomwise energy predictions, not only the prior contribution (see the Atomref prior as an example). It would be possible to add the prior divided by the number of atoms to each atomwise prediction but I don't think this is very clean. It might be better to have an atomwise_prior flag in BasePrior which determines whether it should be applied before or after reducing the atomwise predictions. We would then apply the prior model either before (as it is done now) or after the following line, based on this flag: https://github.com/compsciencelab/torchmd-net/blob/7a92b3ef7739da14aa8cdd2d87c8e9462d86cf14/torchmdnet/models/output_modules.py#L73
The flag can then be set through the super().__init__(atomwise_prior=True/False) call of the deriving class.

Feel free to open a PR if you want to have a go at this, otherwise I can also make this change.

@peastman
Copy link
Collaborator Author

Sounds good, I'll add that.

@giadefa giadefa closed this as completed Dec 1, 2021
@peastman
Copy link
Collaborator Author

peastman commented Oct 3, 2022

Sorry for reopening an old issue - this never actually got implemented! I'm about to start on it now. I'll add the atomwise_prior flag as suggested above. I also have a few other questions about the best way of implementing particular potentials.

Physics based potentials will often require extra information. For example, one prior I want to add is the ZBL stopping potential for short range repulsion. It requires knowing the atomic number of every atom. forward() takes an argument z which often corresponds to the atomic number, but not always. For example, in some of my models I define atom types by the combination of element and formal charge, so z is an arbitrary atom type index. I need to separately provide both an atom type and an atomic number.

A more complicated example is Coulomb, which depends on the charge of each atom. That could be a fixed number (formal charge or precomputed partial charge), but in other cases it will itself be calculated as an output of the model.

What's the best way of providing extra information like this that's needed to compute a physics based potential?

@peastman
Copy link
Collaborator Author

peastman commented Oct 3, 2022

Another question: currently it only lets you specify a single prior model, but often you may want more than one. A ZBL potential for short range repulsion, plus Coulomb for long range interactions, plus D4 for dispersion. (SpookyNet includes all of those.) Any thoughts on the best way of handling this?

@peastman peastman reopened this Oct 3, 2022
@peastman
Copy link
Collaborator Author

peastman commented Oct 3, 2022

And another question: how do we want to handle units? Nothing in the current code worries about them, except sometimes for a particular dataset. The inputs and outputs to a model are just numbers that might have any units. But physical potentials involve hardcoded constants like eps_0 whose values depend on units. How should we handle this? Some possibilities include

  • Standardizing on a particular set of units for all models.
  • Standardize on a particular set of units for a particular potential function, and require the user to provide scaling factors for converting positions, energies, etc. to the required units.
  • Implement it at a higher level, where a model specifies what units it uses.
    • We could also make each Dataset specify its units, and conversions between the dataset and model units would happen automatically.

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@peastman
Copy link
Collaborator Author

peastman commented Oct 4, 2022

The code is unit-neutral, whatever you use as input it spits out as output

I know. And that's incompatible with physics based potentials. They involve internal parameters with units. It's impossible to compute them without knowing what units the inputs are in, and what units the outputs are expected to be in.

@peastman
Copy link
Collaborator Author

peastman commented Oct 4, 2022

can you use that single model as a wrapper for all the models that you would like to use? It's just the entry point.

Currently there's a single option in the configuration file, prior_model. It specifies the class of the single prior model to create, whose constructor has to take no arguments except the dataset. We need to be able to specify much more complicated arrangements. For example, "Include both a ZBL potential, using atomic numbers specified in the dataset, and a Coulomb potential, using charges that are generated as an output of the model."

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@peastman
Copy link
Collaborator Author

peastman commented Oct 4, 2022

Perhaps we're thinking about this differently. Currently the file priors.py includes only a single prior model: Atomref, which subtracts a reference energy that is defined in the dataset for each atom type. My understanding is that we want to add more choices that implement common physical models. Someone should be able to specify prior_model: Coulomb in their configuration file, and it will add a Coulomb interaction to whatever model they're trying to train. Is that different from your understanding?

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@peastman
Copy link
Collaborator Author

peastman commented Oct 4, 2022

Let's consider the problem of adding a Coulomb interaction. To begin with, it can be implemented in several ways.

  • The dataset provides a precomputed charge for every atom. You compute the interaction based on them.
  • The model predicts a charge for every atom. They get used to compute the interaction.
  • The model predicts an electronegativity for every atom. You use them to solve for the charges, which then get used to compute the interaction. This also requires knowing the total charge on every molecule, or even better the formal charge on every atom.

First problem is that I don't think TorchMD-Net really supports multitask models yet? Its models produce a single number for each atom, which is interpreted as energy. We need them to produce multiple values for each atom: both energy and charge, or energy and electronegativity. That might also involve extra terms in the loss function: train the model to reproduce charges specified in the dataset.

Next we need to define a mechanism for datasets to provide the extra information required: partial charges, formal charges, or both.

Then there's the problem of units. The Coulomb code receives positions in some units and it needs to produce an energy in some units. The value of eps_0 depends on the units. It's impossible to calculate the result without knowing the units.

Finally we need to design the user interface for all of this. What options will the user add to their configuration file to request a particular combination of physical terms, calculated in a particular way, trained with a particular loss function? You need to be able to specify things like this: "My dataset reports atomic numbers, formal charges, and partial charges for every atom. I want the model to predict electronegativities, which will be used to compute partial charges. Include a loss term based on how well the predicted charges match the ones in the dataset. Once the charges are computed, add ZBL, Coulomb, and dispersion terms to the final energy."

@giadefa
Copy link
Contributor

giadefa commented Oct 4, 2022 via email

@PhilippThoelke
Copy link
Collaborator

PhilippThoelke commented Oct 4, 2022

Multi-head implementation

Yes, currently models can only have a single output head. I think adding multi-head support could be useful, however, requires changes to the current training interface and poses some important design questions.

For example:
With multiple output heads it should be possible to individually set the standardize argument, the type of output model (i.e. output_model argument), every output head should get its own list of prior models (and in turn list of prior_args), its own unit, a loss weighting factor, a "compute neg. derivatives" flag, exponential moving average coefficients for y and potentially neg_dy and potentially a couple others I'm forgetting now.

If then you want charges to come either from the model or from the dataset it gets even more complicated, indicating which one should be used. You would also need to order the computation of the output heads as you are suggesting that the output of one head could depend on the prediction of another. This relationship would also have to be defined in the config file.

Prior args

We could extend that as we did for the dataset arguments with a prior_args
dictionary

This already exists and currently is a critical piece in order to reconstruct prior models from model checkpoints. If the model contains a prior model, the arguments required for building this model will be stored in the hparams.yaml file as they are set here

args.prior_args = prior.get_init_args()

The load_model function expects prior_args to be set. The prior model itself expects either the dataset object or the prior args to be present to correctly instantiate the object. For example the Atomref prior depends on max_z, which is stored in prior_args but falls back to retrieving this from the dataset object in case max_z is not set:
def __init__(self, max_z=None, dataset=None):
super(Atomref, self).__init__()
if max_z is None and dataset is None:
raise ValueError("Can't instantiate Atomref prior, all arguments are None.")
if dataset is None:
atomref = torch.zeros(max_z, 1)
else:
atomref = dataset.get_atomref()

This is the section of code that reconstructs a prior model from a checkpoint file but that is ignored if it's the first time constructing this model:
if args["prior_model"] and prior_model is None:
assert "prior_args" in args, (
f"Requested prior model {args['prior_model']} but the "
f'arguments are lacking the key "prior_args".'
)
assert hasattr(priors, args["prior_model"]), (
f'Unknown prior model {args["prior_model"]}. '
f'Available models are {", ".join(priors.__all__)}'
)
# instantiate prior model if it was not passed to create_model (i.e. when loading a model)
prior_model = getattr(priors, args["prior_model"])(**args["prior_args"])

In the first instantiated of a model, the prior_model variable will not be None in this context as it is directly passed to the create_model function:
self.model = create_model(self.hparams, prior_model, mean, std)

Defining units

I think the freedom the current unit handling (i.e. none) gives is very powerful. If you want to predict some property you don't have to rely on us supporting units for this property but instead just plug in a dataset with the appropriate label. For self- or unsupervised pretraining we also don't necessarily have a unit that we predict, if for example we mask the atom type of one atom and predict this. To pass unit information to prior models I think it does make sense that datasets have some standardized way of defining their units (if it makes sense to define a unit for a certain dataset). Similarly to how Atomref currently works, it relies on the dataset implementing the get_atomref function. The way that the Atomref prior accesses this function here:

atomref = dataset.get_atomref()

you could in the same way access some kind of get_units function and throw an error if a given dataset doesn't implement that.

@PhilippThoelke
Copy link
Collaborator

And yes, through the config file and model creation/loading you are currently limited to just one prior model, however, technically it would be possible to wrap the model in multiple prior models by just nesting them.

@peastman
Copy link
Collaborator Author

peastman commented Oct 4, 2022

Let's have a chat about it. I'll contact you directly

That would be great if all three of us can meet to discuss it.

You would also need to order the computation of the output heads as you are suggesting that the output of one head could depend on the prediction of another.

I think that's more general than what we need. For Coulomb the two outputs (per-atom energy and per-atom charge) can be predicted independently. Each one gets fed into a calculation that produces a per-sample energy, and the two get added together. They're only linked through the loss function.

@peastman
Copy link
Collaborator Author

peastman commented Oct 5, 2022

The prior model itself expects either the dataset object or the prior args to be present to correctly instantiate the object.

In this case it's not strictly either/or. Some attributes will be retrieved from the dataset (for example, the atomic numbers for atom types). Others always need to be specified in the parameter file (for example, the cutoff distance). The existing code assumes everything can be retrieved from the dataset:

prior = getattr(priors, args.prior_model)(dataset=data.dataset)

Any suggestions on the best way to handle this, so we can pass configuration options to the prior even when first creating it?

@PhilippThoelke
Copy link
Collaborator

PhilippThoelke commented Oct 5, 2022

You could simply pass args to the prior model in the same step as passing the dataset. If needed you could add an initial_prior_args of some sorts that can maybe take a dictionary (similar to how the dataset_arg currently works) with the required parameters for the prior model you want to create.

@peastman
Copy link
Collaborator Author

peastman commented Nov 1, 2022

With ZBL implemented in #134 and D2 in progress in #143, the next question is how to combine multiple priors. What if I want a model to include both ZBL and D2?

The internal changes can be very simple. We can just make TorchMD_Net.prior_model into a list. pre_reduce() and post_reduce() are each called on all the priors in order. So mostly this is a question of how the user specifies the prior models in the config file.

One option is to make prior_model and prior_args into lists:

prior_model:
  - ZBL
  - D2
prior_args:
  - {"cutoff_distance": 4.0, "max_num_neighbors": 50}
  - {"cutoff_distance": 10.0, "max_num_neighbors": 100}

A possibly cleaner option is to combine them:

prior_model:
  - ZBL:
      cutoff_distance: 4.0
      max_num_neighbors: 50
  - D2:
      cutoff_distance: 10.0
      max_num_neighbors: 100

@PhilippThoelke
Copy link
Collaborator

Sounds good! I think version two is way more readable, which I think is what we should be going for in the config files. I reckon both versions will be hard to implement in the CLI?

@peastman
Copy link
Collaborator Author

peastman commented Nov 1, 2022

We would have to come up with a syntax for specifying it on the command line and write our own code for parsing it.

@giadefa
Copy link
Contributor

giadefa commented Nov 2, 2022 via email

@peastman
Copy link
Collaborator Author

peastman commented Nov 2, 2022

Being able to override settings from the command line is still useful, at least for some settings. For example, to train several copies of the same model with different random number seeds. But the prior model is probably one of the less important ones to override from the command line.

@raimis
Copy link
Collaborator

raimis commented Nov 2, 2022

I'm in favour for the second version.

We are already using more structure inputs for the data loaders. Regarding the CLI, we maintain the ability to override simple options like seed.

@peastman
Copy link
Collaborator Author

peastman commented Dec 1, 2022

I'm starting work on a Coulomb prior. For the moment I'm just trying to implement fixed, precomputed partial charges. Later I'll get to charges that are dynamically predicted by the model.

This requires some way of passing in charges. Currently the prior gets invoked as

def post_reduce(self, y, z, pos, batch):

For the current ones that's sufficient because z provides everything you need to know about each atom. For Coulomb it isn't, since two atoms with the same type/element may have different partial charges. We need a way for the dataset to provide charges, and for them to get passed to the model. We probably want to implement this in a generic way: other models/datasets might involve other fields in the future. For example, we could allow forward() to take arbitrary keyword arguments, which are automatically filled in with whatever values the dataset returned.

Any suggestions on the best way of structuring it?

@peastman
Copy link
Collaborator Author

All of them. They need to incorporate the extra arguments into the calculation.

@giadefa
Copy link
Contributor

giadefa commented Feb 13, 2024 via email

@peastman
Copy link
Collaborator Author

I don't know what the best way of incorporating it into the model is. That's why I asked the question:

What would be the best way of incorporating that extra information into the calculation?

I'm hoping someone with experience in the internals of the models will provide useful suggestions.

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 13, 2024 via email

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 13, 2024 via email

@peastman
Copy link
Collaborator Author

So all I need to do is change

if q is None:
q = torch.zeros_like(z, device=z.device, dtype=z.dtype)
else:
q = q[batch]

to

 if q is None: 
     q = torch.zeros_like(z, device=z.device, dtype=z.dtype) 
 elif q.shape != z.shape: 
     q = q[batch]

Then it should work correctly with any of the following cases?

  • q is None
  • q has one value per sample
  • q had one value per atom

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 13, 2024 via email

@peastman
Copy link
Collaborator Author

For partial charges that should be a safe assumption. They're usually between -1 and 1. They can be a bit larger for some ions, but they shouldn't be anywhere close to 10. Total charge could reach 10 in some cases.

I notice that q is only passed to the interaction layers, not to the embedding, meaning if you set num_layers to 0 the charge is ignored. Should it also be passed to the embedding?

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 14, 2024 via email

@peastman
Copy link
Collaborator Author

Suppose we want to add in multiple values, for example both charge and spin. Can this method be generalized to handle that case? Or do we need to figure out a different way of incorporating them into the calculation?

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 14, 2024 via email

@peastman
Copy link
Collaborator Author

Do you think it could work to just append the extra values to the embedding vector at

Or is that too simplistic?

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 14, 2024 via email

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 14, 2024 via email

@peastman
Copy link
Collaborator Author

I'll try both approaches and see if one works better. The embedding approach appeals to me because you can easily incorporate an arbitrary set of global or per-atom properties. I also have to admit that I don't really understand the logic behind the current approach. You increase the strength of all interactions for positive atoms/molecules and decrease the strength of all interactions for negative atoms/molecules. Why?

One could also take the embedding vector, append the extra values, and then pass it through a linear layer that mixes everything together and reduces it back down to the original length. Or you could do something similar to Cormorant. Instead of learning an embedding vector it learns a matrix that mixes together whatever input values you want for each atom and produces the embedding vector.

@guillemsimeon
Copy link
Collaborator

guillemsimeon commented Feb 14, 2024 via email

@peastman
Copy link
Collaborator Author

I tried using the current method to inject partial charges into the model. The result was not good: it roughly doubled the error. Then I tried the method I described above: append the partial charge to the embedding vector, and use a linear layer to mix it back down to the original length. That worked nicely and gave a good result.

This has the advantage that it easily generalizes to arbitrary numbers of global and per-atom scalar parameters. If there are no extra arguments, there's nothing to append and it skips the linear layer, so the model is unchanged. This same approach could be used for all the models if we want, not just TensorNet.

Would you be open to a PR implementing it?

@giadefa
Copy link
Contributor

giadefa commented Feb 20, 2024 via email

@peastman
Copy link
Collaborator Author

we have results that show that partial charges work better.

Work better than what?

Regarding your changes, I am not sure that they work in terms of maintaining the properties of TN.

They exactly maintain it. If you don't add charges, they don't modify the model in any way.

@giadefa
Copy link
Contributor

giadefa commented Feb 20, 2024 via email

@peastman
Copy link
Collaborator Author

Yes, that's what I'm finding as long as I inject the charges with the method I described. If I do it with the current code, it breaks the model.

@giadefa
Copy link
Contributor

giadefa commented Feb 20, 2024 via email

@peastman
Copy link
Collaborator Author

What method are you using? If it's what's in the code, it doesn't work. If it's something else, please provide the code so I can try it.

@guillemsimeon
Copy link
Collaborator

Hi Peter,

the thing is that trying partial charges in our case (which were Gasteiger ones, btw, yours are fix?) was something we tried out of curiosity after knowing that total charges worked. We believe total charge is the way to proceed, and in fact, to the best of my knowledge, is currently being used as it is in the code now, and it helps the model to work fine when there are simultaneously charged and neutral molecules in the dataset (which can be sometimes confused for degenerate inputs). What I think is that your use case is different than ours, meaning that we have never tried partial charges + Coulomb. We didn't want to rely on partial charges because of their 'arbitrary' nature, as opposed to the real physical observable that total charge is.

Guillem

@guillemsimeon
Copy link
Collaborator

I also forgot to mention that, on top of what I said, relying on partial charges did not seem optimal to us because you need always to use exactly the same method of computation of the partial charges, since otherwise any small change in their value would mean a different energy output. In contrast, total charge is an integer.

@giadefa
Copy link
Contributor

giadefa commented Feb 20, 2024 via email

@peastman
Copy link
Collaborator Author

I'm using Gasteiger charges too. I'll make a PR and you can try it out.

We believe total charge is the way to proceed

I don't see how total charge can possibly work. It's a global property, but the model is based entirely on local computations. As it's doing computations for local regions of the system, knowing the total charge of the system provides no useful information. It has no idea how much of that charge is in the region it's looking at and how much is elsewhere, so there's nothing it can do with it. Instead it needs information about the local charge distribution, which partial charges give it. Perhaps it could work for tiny molecules where every atom is within the cutoff distance of every other atom, but not for anything larger than that.

For comparison, take a look at SpookyNet. It supplements the local computation with some global computation, which allows it to meaningfully use global information.

@giadefa
Copy link
Contributor

giadefa commented Feb 20, 2024 via email

@peastman
Copy link
Collaborator Author

Gasteiger charges don't depend on the version of RDKit. The algorithm was published in 1978 and hasn't changed since.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants