Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProtocolMixin: increase flexibility and usability for other packages #678

Merged
merged 2 commits into from
May 7, 2021

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented Apr 20, 2021

Fixes #677

The load_protocol_file method in the protocol utils.py
module uses the path of the module file to obtain the path to the .yaml
file that described the protocols. However, this fails in case the
ProtocolMixin class is used by other packages, since the protocol files
of the package's work chains will be stored in that package.

Moreover, the user can only adapt the protocol by specifying
overrides as a dictionary, or adapting the builder afterwards.

Here we adapt the ProtocolMixin class as follows:

  • Add a get_protocol_filepath() class method that needs to be
    implemented by the work chain classes that use the ProtocolMixin.
  • Move the load_protocol_file method into the ProtocolMixin class as
    a hidden method that relies on the get_protocol_filepath method.
  • The get_protocol_inputs method is made more flexible by accepting
    a pathlib.Path for the overrides input. This allows the user to
    define protocol overrides in a .yaml file.

The Path to the hardcoded protocols that ship with the package are now
also obtained using importlib_resources.

@mbercx mbercx changed the title ProtocolMixinconstruct .yaml file from class path ProtocolMixinconstruct .yaml filepath from class module path Apr 20, 2021
@chrisjsewell
Copy link
Member

Or you could use importlib.resources 😉

@mbercx
Copy link
Member Author

mbercx commented Apr 20, 2021

Or you could use importlib.resources 😉

Nice, wasn't aware of this module 👌. Will have a closer look tomorrow.

@sphuber
Copy link
Contributor

sphuber commented Apr 22, 2021

I am wondering if we want to really make this utility more reusable by other plugins, why not make it more flexible still. The changes are still making quite some assumptions about where protocol files are supposed to be stored. I remember that we were also thinking of passing a completely custom protocol file on call of get_builder_from_protocol so shouldn't we already move to an interface where the path is literally passed in, or can be at least?

@mbercx
Copy link
Member Author

mbercx commented Apr 22, 2021

I remember that we were also thinking of passing a completely custom protocol file on call of get_builder_from_protocol so shouldn't we already move to an interface where the path is literally passed in, or can be at least?

One design question here: Do we add another input argument (e.g. protocol_file_path) that defined a path to a .yaml file that has the same structure as our .yaml files (i.e. that defined multiple protocols), or do we make the protocol input more flexible so it can either define a protocol that is already defined, or a path to a yaml file that then only defines one protocol, e.g. for the PwRelaxWorkChain:

clean_workdir: True
max_meta_convergence_iterations: 5
meta_convergence: True
volume_convergence: 0.02
base:
    pw:
        parameters:
            CELL:
                press_conv_thr: 0.5
base_final_scf:
    pw:
        parameters:
            CELL:
                press_conv_thr: 0.5

I'm leaning towards the latter option, since some who wants to use their own protocol probably isn't looking to define multiple (or can just use multiple files), and it means they only have to replace one input and not consider an extra one.

@sphuber
Copy link
Contributor

sphuber commented Apr 22, 2021

There is also the question whether a custom protocol file that is passed completely replaces existing protocols, or simply adds to it, i.e. the final protocol is the union of the default and custom files taken together. I think that would maybe give the highest flexibility. In that case, I would also not repurpose the protocol argument as a way to specify the path, but simply add an optional argument protocol_filepath. If specified, the overlap of that file and the standard file is taken where the former takes precedence when both protocols specify the same input. I would then also keep the same format for the custom protocol file as the standard one, except the default_protocol and default_inputs keys are optional. One could therefore for example create a file:

protocols:
    moderate:
         kpoints_distance: 0.20

@mbercx
Copy link
Member Author

mbercx commented May 6, 2021

I'm not sure if I fully agree. If the user wants to override some settings of an already existing protocol, the overrides input argument can be used. This also adds another input argument, making the method more complex to use. I suggest the following inputs for determining the protocol:

  • protocol: Union[str, pathlib.Path, None] = None - Specify the protocol to use. This is either a string identifier (e.g. 'moderate', 'fast', 'precise') or a Path to a file that defines the protocol. If not specified, the default protocol will be used.
  • overrides: Union[dict, pathlib.Path, None] = None - optional inputs to override the defaults of the protocol. Can be either a dict or a Path to a .yaml file that contains the overrides.

I think these two inputs cover all use cases, e.g. by repurposing the overrides argument to also accept a Path, your suggested use case is covered, and it's also immediately clear that the inputs defined in this yaml file will override the defaults specified in the protocol input.

@mbercx
Copy link
Member Author

mbercx commented May 6, 2021

After working on this a bit, I've come to the conclusion that allowing the protocol input to also accept a Path adds a lot of complexity to the implementation for relatively little gain. The problem arises when trying to do this for e.g. the PwRelaxWorkChain. The hardcoded protocol file only contains the defaults for the top-level inputs, and overrides for the lower-level PwBaseWorkChains. So either this input protocol file would have to specify which protocol to use for the lower-level work chains along with the overrides, or contain all the defaults for the underlying work chains. And then we'd have to deal with all these cases in the ProtocolMixin class and the get_builder_from_protocol methods.

We could add another protocol_filepath input that has the same structure as the hardcoded file, and then load that one instead. But, then the user would have to specify both this file and the protocol input. If this file would just override the default values, having the ability to pass a Path for the overrides does pretty much the same thing, without the added complexity of starting with:

protocols:
    moderate:

and then having to select the moderate protocol for this to take effect.

A couple of notes:

  • As I mention in Protocols: Consider way of "popping" when overrides is None #653, the overrides don't let us remove input tags, only change them. We could fix this by letting the user set null to remove an input, but this is still a little of a roundabout way of defining your own protocol with a .yaml file.
  • Initially I thought about keeping the logic for constructing the file path to the .yaml files based on the module and putting it in the ProtocolMixin.get_protocol_filepath method, so we wouldn't have to implement it for our work chains here. But then I figured that it might be helpful for others who want to implement protocols to simply see the get_protocol_filepath method (and be able to copy it). Note that removing this method and simply adding a protocol_filepath input to get_protocol_inputs won't do, because then we'd also have to add this input to get_default_protocol and get_available_protocols.

@mbercx mbercx changed the title ProtocolMixinconstruct .yaml filepath from class module path ProtocolMixin: increase flexibility and usability for other packages May 6, 2021
@sphuber
Copy link
Contributor

sphuber commented May 6, 2021

Ok, let's try to separate the various requests in action here. If I understand the current code in this PR, it mostly makes the ProtocolMixin more flexible and actually usable by other WorkChain class by allowing to define where the protocol file is defined. That is already great and should be merged either way I would say.

Then there is the question how the existing protocol files that ship with the package, can be (partially) modified at runtime. I think the use case here is that a user of the aiida-quantumespresso plugin would like to run PwBaseWorkChain.get_builder_from_protocol but use a slightly modified protocol. I am getting a bit confused on what the difference is between the protocols and the overrides. I also see the potential problem that if we have some way of dynamically overriding the protocol of a workchain, that should allow to also override the settings of all lower lying workchains. I don't think it will be feasible for the interface to allow to specify a custom protocol file for each sub workchain that is involved, so instead a single protocol file should do the trick. But then again, I feel like we already have this no? The protocol file of the PwRelax can and does already specify values that override values of the PwBase protocol. Ultimately, it just follows the same structure of the entire spec. With the custom protocol file it should just be "overlaid" onto the base one and wherever they overlap, the custom takes precedence. Won't this be possible and address everything? (The difference between protocol and overrides left aside for now)

@mbercx
Copy link
Member Author

mbercx commented May 6, 2021

Then there is the question how the existing protocol files that ship with the package, can be (partially) modified at runtime. I think the use case here is that a user of the aiida-quantumespresso plugin would like to run PwBaseWorkChain.get_builder_from_protocol but use a slightly modified protocol. I am getting a bit confused on what the difference is between the protocols and the overrides.

To me, what you are describing as slightly modifying is exactly the overrides.

I also see the potential problem that if we have some way of dynamically overriding the protocol of a workchain, that should allow to also override the settings of all lower lying workchains. I don't think it will be feasible for the interface to allow to specify a custom protocol file for each sub workchain that is involved, so instead a single protocol file should do the trick. But then again, I feel like we already have this no? The protocol file of the PwRelax can and does already specify values that override values of the PwBase protocol.

Yes, but our protocol files are a little mixed. On the one hand it specifies defaults for the top-level inputs, and then overrides for the lower-level inputs. Moreover, it is even more complex, since each protocol file also defines two more protocols that again override the values that are defined by the default_protocol. Now, I do agree with this design, since it is more maintainable (i.e. changing a default mixing_beta for the pw.x calculation only has to be changed in one file). But it does mean that we can't have the user supply 1 protocol file that replaces (not overrides) e.g. the moderate protocol without making this file quite complex (i.e. define the protocols for the work chain and all its lower-level work chains).

Ultimately, it just follows the same structure of the entire spec. With the custom protocol file it should just be "overlaid" onto the base one and wherever they overlap, the custom takes precedence. Won't this be possible and address everything? (The difference between protocol and overrides left aside for now)

Again, this is exactly what the overrides do, no? With this PR the user can specify these overrides as a pathlib.Path to a .yaml file.


Let me try and crystallise the difference between a protocol and the overrides somewhat more:

  • A protocol defines a set of default inputs for the work chains.
  • The overrides defines a set of inputs to change for a given protocol.

The 'moderate' protocol is a pure protocol. The 'fast' and 'precise' protocols are only defined as overrides of the 'moderate' protocol. With the current implementation the user can define a new protocol in the sense of the 'fast' or 'precise' protocol, i.e. it can pass a dict or .yaml file that defines overrides for the 'moderate' protocol.

What the user can't do yet, is easily define a pure protocol (which I tried to do by passing a Path to protocol). Now, almost everything can be achieved with overrides (except removing inputs, see #653), but it's a little convoluted that the user always has to define their protocol in relation to the default one, instead of just being able to write their own pure protocol.

When I first looked at how the protocol files were set up, I was a little confused. I was expecting 1 .yaml file for each protocol, that defined the defaults for all the work chains. However, this double overriding strategy made sense to me to avoid duplication and consequently maintenance issues. However, it does make it tricky for the user to just e.g. copy a protocol file and define their own pure protocol that they can run for higher-level work chains.

Sorry for the long-winded explanation. I hope I wasn't being redundant. 😅

@mbercx
Copy link
Member Author

mbercx commented May 7, 2021

@sphuber the changes have been separated in two commits as discussed, and the PR is ready for review!

mbercx added 2 commits May 7, 2021 23:32
The `load_protocol_file` method in the protocol utils.py module uses
the path of the module file to obtain the path to the .yaml file that
described the protocols. However, this fails in case the `ProtocolMixin`
class is used by other packages, since the protocol files of the
package's work chains will be stored in that package.

Here we adapt the `ProtocolMixin` class as follows:

 * Add a `get_protocol_filepath()` class method that needs to be
   implemented by the work chain classes that use the `ProtocolMixin`.

 * Move the `load_protocol_file` method into the `ProtocolMixin` class as
   a hidden method that relies on the `get_protocol_filepath` method.

Moreover, the `Path` to the hardcoded protocols that ship with the package
are now also obtained using `importlib_resources`. It is added as a
dependency as this was backported from Python 3.9.
Currently the user can only adapt the protocol by specifying `overrides`
as a dictionary, or adapting the builder afterwards.

Here the `get_protocol_inputs` method is made more flexible by accepting
a `pathlib.Path` for the `overrides` input. This allows the user to
define protocol overrides in a `.yaml` file.
@sphuber
Copy link
Contributor

sphuber commented May 7, 2021

noticed that you said "allow cutoffs to be passed" in the commit message instead of overrides and couldn't help myself 😬 Don't worry though, I adapted it and the future of humanity is saved

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good (except for the fumble in one of the commit message, but you have been forgiven) thanks @mbercx

@sphuber sphuber merged commit 5e2ec1f into aiidateam:develop May 7, 2021
@mbercx mbercx deleted the fix/677/get-protocol branch May 8, 2021 06:02
@mbercx
Copy link
Member Author

mbercx commented May 8, 2021

hero

@mbercx
Copy link
Member Author

mbercx commented May 8, 2021

noticed that you said "allow cutoffs to be passed" in the commit message

Working on aiida-pseudo is getting to me. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ProtocolMixin: don't use __file__ to obtain path to protocol .yaml files
3 participants