-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BasePwCpInputGenerator
: Remove superfluous validation
#926
Conversation
@sphuber I just opened this while working on #922. Looking into this raised an interesting question regarding validation. Just a few notes to discuss:
This has already led to issues in #722. Here we wanted to check that a 🤔 To
|
# Check that a pseudo potential was specified for each kind present in the `StructureData` | |
kinds = [kind.name for kind in self.inputs.structure.kinds] | |
if set(kinds) != set(self.inputs.pseudos.keys()): | |
formatted_pseudos = ', '.join(list(self.inputs.pseudos.keys())) | |
formatted_kinds = ', '.join(list(kinds)) | |
raise exceptions.InputValidationError( | |
'Mismatch between the defined pseudos and the list of kinds of the structure.\n' | |
f'Pseudos: {formatted_pseudos};\nKinds: {formatted_kinds}' | |
) |
Since we are already validating this in a proper validator here:
aiida-quantumespresso/src/aiida_quantumespresso/calculations/__init__.py
Lines 161 to 165 in 7f4c4a1
structure_kinds = set(value['structure'].get_kind_names()) | |
pseudo_kinds = set(value['pseudos'].keys()) | |
if structure_kinds != pseudo_kinds: | |
return f'The `pseudos` specified and structure kinds do not match: {pseudo_kinds} vs {structure_kinds}' |
And in accordance with [1], using a validator is preferable. However, I realised that this is only checked in case the structure
input is actually provided. So now I was thinking about what to do.
My first inclination was that a wrapping work chain should take care of the validation. But this would of course lead to a lot of duplication of code. I.e. we don't want that the PwBandsWorkChain
also has to check that the top-level structure elements all have a pseudo defined in each wrapped PwCalculation
. This logic should be contained on the PwCalculation
.
But following [1] and only using the validator means that the validation doesn't happen. At first I then thought that we should perhaps keep the code I'm removing here, so the excepted wrapping process still shows a more informative exception traceback. However, that isn't necessary, since when the wrapping work chain tries to submit
the process, the "proper" validation still occurs (see the long traceback below).
Full Traceback
❯ verdi process report 121
2023-05-05 16:33:48 [8 | REPORT]: [121|PwRelaxWorkChain|on_except]: Traceback (most recent call last):
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/process_states.py", line 228, in execute
result = self.run_fn(*self.args, **self.kwargs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/processes/workchains/workchain.py", line 314, in _do_step
finished, stepper_result = self._stepper.step()
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/workchains.py", line 295, in step
finished, result = self._child_stepper.step()
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/workchains.py", line 538, in step
finished, result = self._child_stepper.step()
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/workchains.py", line 295, in step
finished, result = self._child_stepper.step()
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/workchains.py", line 246, in step
return True, self._fn(self._workchain)
File "/Users/mbercx/project/qe/code/aiida-quantumespresso/src/aiida_quantumespresso/workflows/pw/relax.py", line 243, in run_relax
running = self.submit(PwBaseWorkChain, **inputs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/processes/process.py", line 534, in submit
return self.runner.submit(process, **kwargs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/runners.py", line 183, in submit
process_inited = self.instantiate_process(process, **inputs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/runners.py", line 169, in instantiate_process
return instantiate_process(self, process, **inputs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/utils.py", line 64, in instantiate_process
process = process_class(runner=runner, inputs=inputs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 194, in __call__
inst.transition_to(inst.create_initial_state())
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 339, in transition_to
self.transition_failed(initial_state_label, label, *sys.exc_info()[1:])
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/processes.py", line 1003, in transition_failed
raise exception.with_traceback(trace)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 324, in transition_to
self._enter_next_state(new_state)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 384, in _enter_next_state
self._fire_state_event(StateEventHook.ENTERING_STATE, next_state)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 300, in _fire_state_event
callback(self, hook, state)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/processes.py", line 329, in <lambda>
lambda _s, _h, state: self.on_entering(cast(process_states.State, state)),
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/processes.py", line 683, in on_entering
call_with_super_check(self.on_create)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/utils.py", line 29, in call_with_super_check
wrapped(*args, **kwargs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/aiida/engine/processes/process.py", line 397, in on_create
super().on_create()
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/base/utils.py", line 16, in wrapper
wrapped(self, *args, **kwargs)
File "/Users/mbercx/.virtualenvs/qe/lib/python3.9/site-packages/plumpy/processes.py", line 750, in on_create
raise ValueError(result)
ValueError: Error occurred validating port 'inputs.pw': The `pseudos` specified and structure kinds do not match: {'mama'} vs {'Si'}
2023-05-05 16:33:48 [9 | REPORT]: [121|PwRelaxWorkChain|on_terminated]: remote folders will not be cleaned
So this long rant is basically to reconfirm that:
We should never validate by raising exceptions in the prepare_for_submission
script.
Unless there is anything I'm missing? I.e. is there a use case where having validation in the prepare_for_submission
script is preferable?
As a final note, the traceback could be made more succinct/clear. Perhaps we can have validators return a generic AiiDA ValidationError
, which is caught in submit
/run
calls and simply returns a nicely formatted (cough using rich
cough) message.
🔀 Dealing with complex validations
I already mentioned in the intro that sometimes validations can be tricky because of [2-3]. To repeat the example (repurposing the number refs here):
- We want to check that a
parent_folder
is present whenCONTROL.calculation
is e.g.nscf
. - The
parent_folder
will not be present in the inputs of e.g. thePwBandsWorkChain
bands
step, since it is only created on runtime. - The validation will hence always fail, raising a warning when the builder is put together using the
get_builder_from_protocol()
method.
Our solution at the time was to split up the validation in a base and full one, and override the validator. However, after seeing this nice example of how the port namespace can be inspected:
aiida-quantumespresso/src/aiida_quantumespresso/calculations/__init__.py
Lines 152 to 153 in 7f4c4a1
if any(key not in port_namespace for key in ('pseudos', 'structure')): | |
return |
I now think it'd be better to check if the parent_folder
is in the name space, and only do the restart validation in this case. Wrapping work chains that dynamically assign this input should exclude the parent_folder
port when exposing the inputs of the PwBaseWorkChain
anyways. I've implemented an example of this in #927.
I'm going to write down these different use cases and how to best deal with them in a tutorial. Happy to have your input @sphuber!
I fully agree with your thought process:
Conclusion: ok to remove this validation in |
Thanks @sphuber! Note that this does mean that the validation will only occur in the step where the wrapped process is launched, which means that potentially multiple calculations will have been run in the work chain before the user finds out that a provided input was wrong. In the case of the structure and pseudos, the work chain could have in principle check this. But I believe we both agree that trying to do this for each work chain is not sustainable, unless we can somehow automate this. That said, if the user is aware of More importantly though:
What do you think about this? Could we somehow make the traceback of validation failures more readable? 😇 |
BasePwCpInputGenerator
: Remove superfluous validation
e326930
to
691d499
Compare
Yeah, fine for me to start omitting those. We already transitioned to using commit hashes in the CHANGELOG right? |
Yup! And with the PR number still easily findable on the commit page, there is no reason to keep them in the commit title I think. |
In the
prepare_for_submission
script of theBasePwCpInputGenerator
,there still is some validation for the pseudopotentials (see that the elements
match those in the provided
structure
) as well as theFIXED_COORDS
setting.
In general, we want to do all validation using validators of the appropriate port
or port name space. In the case of these two inputs, however, this validation is
already done by proper validators, so we can simply remove it.
Even in case the
structure
port is not in the name space because a wrappingwork chain excluded it when exposing the inputs, the validation will still occur when
the process is submitted in the steps of the work chain.