Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Planning and ID'ing the CVs we want to give Restraints #1036

Open
Lnaden opened this issue Jul 13, 2018 · 11 comments
Open

Planning and ID'ing the CVs we want to give Restraints #1036

Lnaden opened this issue Jul 13, 2018 · 11 comments
Milestone

Comments

@Lnaden
Copy link
Contributor

Lnaden commented Jul 13, 2018

This thread is to track the exact variables we want to isolate as we convert the restraints to the new CustomCVForce for easier variables.

The goal is to use each Collective Variable as a tracking for quantities which are hard to extract later. E.g. The distance the 2 centroids are from each other every iteration in RadiallySymmetricRestraints

These CV values are fetched from the Context through a CustomCVForce.getCollectiveVariableValues(context) call.

There will also need to be some engineering changes. We will want to come up with a way to neatly track and report the restraints. The problem is that the multistate module only knows of restraints as it pertains to the analysis, and the knowledge of the CustomCVForce objects does not go past the yank.AlchemicalPhase class. So in order to get the CustomCVForce.getCollectiveVariableValues call to be fetched at run time, we will need to engineer a general CustomCVForce reader for either the mulstistate.MultiStateSampler and multistate.MultiStateReporter, and/or we have to engineer the openmmtools.States module to handle that and then have the MultiStateReporter interface with it. I'd like some feed back here.

As the title says, I'd also like to use this issue to enumerate, explicitly, the CV's we want to create and track for each of the major force classes:

  • RadiallySymmetricRestraints
  • BoreschLike
  • RMSD

These changes will be part of the larger #1015

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 13, 2018

Re-posting an idea I shared in Slack for discussion

Idea for extracting CV's: We have to tell the MultiStateSampler to cycle through all of its known thermodynamic_state which are passed to it in create, then tell it to search for states which implement some common CV function.

I think when we create the common parent class for the RestraintState (yank) and AlchemicalState (oepnmmtools) we would want to add this call to their interface. I think since its a new OpenMM feature others may find non-restraint uses for the CVs and want a common way to extract them. cc #1015

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 13, 2018

from @andrrizzi:
Hmm. The collective variable seems to be more relatable to SamplerState rather than the thermodynamic state as it changes with integration. It could be something read by SamplerState.update_from_context().

from me: The problem is that the actual call has to come from the CustomCVForce object which is tied to the ThermodynamicState

@andrrizzi:
Yes, the Context has its own copy of the system though, which is unrelated to the standard system that is stored inside the ThermodynamicState.

Thats a good idea. We would still need to provide some solution to extract CV's (if any) from the sampler state. I do see one possible issue and thats what if there are multiple forces each with their own CV, but tracking them through the same name. How do we id which CV came from which force. And that would mean something external has to track map the CV index per force <-> output from SamplerState.

@andrrizzi
Copy link
Contributor

I do see one possible issue and thats what if there are multiple forces each with their own CV, but tracking them through the same name.

Good point. I guess we could associate the force index to keep them separated, but I'm not sure about what is the best data structure to use to expose this info from the SamplerState. Maybe a dict: collective_variable_name -> [(value1, force_index1), (value2, force_index2)] or something similar?

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 13, 2018

I think if we leave it up to the SamplerState to fetch the data, then the external application which needs that from the SamplerState should know the mapping back to which CV in which Force, since I think the best the SamplerState can spit out is a list of lists.

system = context.getSystem()
cvs = []
for force in system.getForces():
    try:
        # This returns a list of floats
        cvs.append(force.getCollectiveVariableValues(context))
    except AttributeError:
        cvs.append(None)

# Output is a list of lists of floats. List[List[float]] in typing formalism

edit: It would then be up to the thing receiving that information to know how each of those floats map back

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 13, 2018

In theory, I guess we could also get the CV name by index and make a formal dict. It really is a question of what object should take ownership of the data formatting, SamplerState or the thing asking SamplerState for the CVs?

@andrrizzi
Copy link
Contributor

I tend to favor the CV-name-indexed dictionary solution because I think that if SamplerState exposes a simple list the user will end up looking for the name of the collective variables himself.

A possible solution could be to implement a way to retrieve the CV value that assumes there is only a single CV with that name (failing when this is not the case) and another more general. Something like:

sampler_state.update_from_context(context)  # Read also CVs.
sampler_state.get_collective_variable('unique_name')  # This returns a float.
try:
    # This raises an exception since there are two CVForces with the same collective variable.
    sampler_state.get_collective_variable('duplicated_name')
except DuplicatedCVError:
    # This returns a list.
    sampler_state.get_collective_variable('duplicated_name', allow_duplicate=True)

but it may be best to try to implement this first and see what problems arise.

@jchodera
Copy link
Member

Note that if we just keep this super simple and store all of the CVs defined in the system, keeping them in order (first ordered by force index, then by CV index), we can store metadata about restraints in the NetCDF file to use to figure out which CV is which parameter of which restraint.

So we could

  • store all the metadata we need about restraints in the NetCDF metadata
  • just store the values of CV 0 through n-1 in the NetCDF file in a collective_variables[n_iterations,n_replicas,n_cvs] variable

Another possibility would be to also store the energy of each restraint for rapid unbiasing, but I'm not sure how easy that would be to add.

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 19, 2018

The implementation tied to this issue:

choderalab/openmmtools#362

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 26, 2018

I think we have all the underlying mechanics in OpenMMTools now. So the question still remains: What CV's do we want to extract from each force?

Here is my list:

  • RadiallySymmetricRestraints
    • Distance between 2 centroids
  • BoreschLike:
    • Virtual Bond Distance
    • Virtual Angle Value x2
    • Virtual Dihedral Value x3
  • RMSD
    • RMSD

Anything else?

A related question, do we know the relative performance per CV a CustomCVForce has over a standard force? e.g., the 6 CV's CustomCVForce for the Boresch-like restraints will be what % less efficient than the CustomCompoundBond force it currently is?.

@jchodera
Copy link
Member

I think we have all the underlying mechanics in OpenMMTools now. So the question still remains: What CV's do we want to extract from each force?

We want to extract any CVs that the restraint defines and uses to define its restraint.

Ideally, each restraint class would register its own CVs through some internal API, and these restraints would automatically be stored each iteration and be available for defining the bound state. The restraint energy would also automatically be available for unbiasing.

A related question, do we know the relative performance per CV a CustomCVForce has over a standard force? e.g., the 6 CV's CustomCVForce for the Boresch-like restraints will be what % less efficient than the CustomCompoundBond force it currently is?.

Unless the use of CustomCVForce is pathologically abysmal (like a million times slower than CustomCompoundBond) then it should not make any difference because restraints consume a very tiny fraction of the overall iteration time.

@Lnaden
Copy link
Contributor Author

Lnaden commented Jul 26, 2018

Ideally, each restraint class would register its own CVs through some internal API, and these restraints would automatically be stored each iteration and be available for defining the bound state. The restraint energy would also automatically be available for unbiasing.

Sorry if I was unclear. I understand the what we want to do with the CV's once we get them out, I was just trying to explicitly enumerate exactly what CV's each restraint needs before I start implementing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants