Add parser for pp.x #428
d2dbff3
to
096c136
Compare
Thanks a million @ConradJohnston . I am a bit swamped right now with the release of |
Beautiful.
096c136
to
433e0a8
Compare
I'm trying out this branch, and one issue I've encountered is that the Totally possible I'm doing something wrong (haven't used pp.x before), just thought I'd let you know. |
Hi Dominik, Let me know if there's any other oddities.
Not sure why, but when trying to plot total potential, pp.x crashed complaining about missing potentials. Maybe it needs to add core charges? I can make a better report of the exact context next week.
Hmm, could you share input/output?
It might be that in this case (plot_num = 1 ?) the pseudopotential is
needed in order to reconstruct the core charge density and add it, as you
say, to the valence density.
433e0a8
to
ca62218
Compare
So this was a bit of a facepalm on my side... the parent calculation was run with Anyway, it's a bit curious that |
FYI: I've used this branch for a while now, and haven't found any other issues (definitely not using the full set of possible inputs, though). One feature I'd think would be convenient is adding unit information to the output data (could be later in a separate PR, of course).
aiida_quantumespresso/parsers/pp.py
Outdated
arraydata = orm.ArrayData()
arraydata.set_array('voxel', voxel_array)
arraydata.set_array('dimensions', dimensions_array)
I'm not sure if it's necessary to set the `dimensions` explicitly as an array - is there a case where it's different from the shape of the `data` array? That is already stored in the `array|data` attribute.
Correct, it can be removed
Thanks a lot @ConradJohnston . Sorry for the big delay, had to put `aiida-quantumespresso` on the backburner for a while. I am planning to release v3.0.0 next week, so if we can address some issues in this PR before that I can include it.
I have mostly addressed higher level design issues now and have some questions there. Besides that there is the question of the test reference files. You are adding 140,000 lines, which if we do this for every parser, we are going to explode the repository. So please try to include the bare minimum output files to test the functionality of the parser. We are not really interested in checking that the parser correctly parses a huge dat file of thousands of lines, unless that has important custom logic. If it just loads the file through normal libraries then just reduce these files to a single line (or literally the minimum required)
@sphuber @ConradJohnston is the goal to include this in the upcoming 3.0 release? If needed, I can help resolve some of the outstanding issues.
Yes, ideally I would like to include this in the 3.0 release which I want to release this week. @ConradJohnston if you don't have the time for this, please let us know and I can take over from here with @greschd
I'll get looking at these today.
Cheers!
@ConradJohnston let me know when you are done with the fixes and I will give it a second pass.
2c9b12d
to
548a4cb
Compare
Good for a second pass, apologies for the delay. Key changes:
|
Thanks a lot @ConradJohnston few more minor comments and then we are good to go
As commented by @giovannipizzi here there is a convention to convert to eV (and probably Angstrom?). I'm not sure if we should follow that here, also. On the one hand, it's nice to be consistent within Opinions @sphuber @ConradJohnston @giovannipizzi? |
My inclination is to be consistent.
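If the parser does follow the eV/Angstrom convention, the conversion step itself is small. The sketch below assumes, purely for illustration, that a quantity arrives in Rydberg and a length in Bohr; this thread does not state which units each plot type uses, and the helper names are hypothetical. The constants are the CODATA 2018 values.

```python
# Conversion factors (CODATA 2018 values).
RY_TO_EV = 13.605693122994
BOHR_TO_ANGSTROM = 0.529177210903


def rydberg_to_ev(values):
    """Convert a sequence of energies from Rydberg to eV (hypothetical helper)."""
    return [value * RY_TO_EV for value in values]


def bohr_to_angstrom(values):
    """Convert a sequence of lengths from Bohr to Angstrom (hypothetical helper)."""
    return [value * BOHR_TO_ANGSTROM for value in values]


if __name__ == '__main__':
    print(rydberg_to_ev([0.5, 1.0]))
    print(bohr_to_angstrom([1.0, 2.0]))
```

Keeping the factors in one place would make it easier to record the resulting units alongside the output data, as suggested earlier in the thread.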
548a4cb
to
2458d02
Compare
2458d02
to
f93cce6
Compare
To be honest, I also haven't used most of the plotting kinds - but I think we should do some more checking on these units (maybe ask someone who knows).
Received a reply from Paolo on the QE issue confirming the units we had uncertainties about. Going to push the final change imminently.
7e48f57
to
0853403
Compare
@ConradJohnston please give me a headsup when you're done with the changes and I can give this a final pass.
@sebastiaan - good to go!
Thanks @ConradJohnston I just realized one more thing that I must have missed during initial reviews relating to the `parameters` input. That change should not be too much work. The other comment about moving the data output file to the retrieved temporary list we can leave for some other time, but just wanted to have your feedback on the idea, to see whether it even made sense
# Parse the post-processed-data according to what kind of data file was produced
if self.output_parameters['output_format'] == 'gnuplot':
    if self.output_parameters['plot_type'] == '2D polar on a sphere':
        parsed_data = self.parse_gnuplot_polar(data_raw)
I am wondering now if we should maybe move the data file to the `retrieve_temporary_list`. The reason is that these files can be quite big (correct?) and we are parsing it essentially in its entirety into an `ArrayData`. In a sense we are then duplicating the content, because the original raw file is stored in the file repo of the calculation node as well as in a parsed version in the `ArrayData` output node. Do we really need the original raw file if we are storing it as a node as well? If not, then we can retrieve this file in the temporary retrieved folder, which still allows to parse it, but the engine will clean it after parsing and not store it in the repo. Maybe this is too much work for now and if you think it makes sense we can simply open an issue for this. @greschd what are your thoughts?
Hmm, I'm not sure about this. For my own use it would definitely be fine to just discard it. However, I think the main reason `pp.x` even supports different output formats is so that they can effortlessly be fed into different plotting tools. If we discard the file, the onus is on us to provide compatibility with these tools (from the `ArrayData`).
Maybe a good solution would be to discard the file by default, and provide a setting to keep it?
I think the discarding by default and providing a setting to override would be the best solution. But we can do this in a separate PR so that we can merge this one soon
Shouldn't we change this PR to always discard then, so that the change of adding a setting is backwards compatible?
These can be GBs easily for a dense grid in a big supercell. I'd agree that if we parse, we don't need the original. This is loosely analogous to the argument over what to do with MD trajectories.
So as it stands, when one asks pp.x to write out to file in a particular format, pp.x produces two files: 1. the 3D gridded quantity in a custom format, and 2. the post-processed data converted into a particular format, and reduced to the dimension requested by `iflag`.
In the most recent implementation, we never use the first file, and so there is no need to retrieve it. The second, the file in a format we choose according to what we can parse, we temporarily retrieve, parse and discard.
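The retrieve-parse-discard behaviour, with an opt-in setting to keep the file, can be sketched independently of AiiDA. This is a minimal illustration, not the plugin's code: `build_retrieve_lists` is a hypothetical helper, and the stdout filename `aiida.out` is an assumption (only `aiida.fileout` appears in this thread).

```python
def build_retrieve_lists(output_filename, plot_filename, keep_plot_file=False):
    """Return (retrieve_list, retrieve_temporary_list) for a pp.x run.

    Hypothetical helper: files in the first list would be stored permanently,
    files in the second would be handed to the parser and then discarded.
    """
    retrieve_list = [output_filename]
    retrieve_temporary_list = []

    if keep_plot_file:
        # The user explicitly asked to keep the raw formatted data file.
        retrieve_list.append(plot_filename)
    else:
        # Default: parse the file, then let it be cleaned up.
        retrieve_temporary_list.append(plot_filename)

    return retrieve_list, retrieve_temporary_list


if __name__ == '__main__':
    # 'aiida.out' is an assumed name for the stdout file.
    print(build_retrieve_lists('aiida.out', 'aiida.fileout'))
    print(build_retrieve_lists('aiida.out', 'aiida.fileout', keep_plot_file=True))
```

Defaulting to discard means that a later "keep" setting only adds stored files, which is the backwards-compatibility point raised above.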
    3: 6,  # 3D -> Gaussian cube
    4: 0,  # Polar on a sphere -> # Gnuplot, 1D
}
parameters['PLOT']['output_format'] = dimension_to_output_format[dimension]
Somewhat related to the data file discussion below: If we expect that people use the "raw" output file it would make sense to allow manually specifying the `output_format`, and just parse only the ones we understand.
That would be in the spirit of "allow everything the code itself can do", but I'd also be fine with keeping this as it is for now. If there is a need for different output formats, it's a relatively straightforward (and backwards-compatible) change.
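One way the "parse only the formats we understand" idea could look is sketched below. It is not part of this PR: `PARSABLE_FORMATS` and `select_parser` are hypothetical names. Codes 6 (Gaussian cube) and 0 (Gnuplot, 1D) come from the snippet above; code 7 for Gnuplot 2D is an assumption based on the pp.x input documentation.

```python
import warnings

# pp.x output_format code -> label of a parser this sketch claims to understand.
# 0 and 6 appear in the reviewed snippet; 7 (Gnuplot, 2D) is an assumption.
PARSABLE_FORMATS = {
    0: 'gnuplot_1d',
    6: 'cube',
    7: 'gnuplot_2d',
}


def select_parser(output_format):
    """Return the parser label for a format code, or None (with a warning) if unknown."""
    label = PARSABLE_FORMATS.get(output_format)
    if label is None:
        warnings.warn(
            'output_format {} is not parsed; only the raw file is available'.format(output_format)
        )
    return label


if __name__ == '__main__':
    print(select_parser(6))
    print(select_parser(3))
```

The drawback discussed in the replies still applies: a warning at parse time is only seen after the calculation has already run.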
What should the parser do in the first instance if it doesn't understand the format?
Just warn the user?
Yeah, I guess either a `report` or `warning`-level message.
But again, I'd also be fine with just discarding and not allowing explicit `output_format` for now.
Long-term it might actually be nicer to have tools to convert `ArrayData` to whatever output format is needed - it seems silly to couple the storage format to the visualization program.
I'd be inclined to leave this for the future. The idea of a warning or report is fine, but I think the argument against this is similar to what @sphuber says in his report - the user will only discover this after they've created the nodes in the DB and the calculation has effectively failed in that it did something unexpected.
Very supportive of an export module for common plotting/visualisation tools - this could be an `aiida-core` feature.
Also fixes #499, correct?
0853403
to
982581f
Compare
Sure does.
982581f
to
1da95ce
Compare
    raise exceptions.InputValidationError("'[PLOT][iflag]' must be explicitly set")

# Check that a valid plot type is requested
if plot_num in range(23) and plot_num not in [14, 15, 16]:  # Must be integer in range 0-22, but not 14-16:
    value['INPUTPP']['plot_num'] = int(plot_num)  # If this test passes, we can safely cast to int
else:
    raise exceptions.InputValidationError("'plot_num' must be an integer in the range 0-23")

# Check for valid plot dimension:
if dimension in range(5):  # Must be in range 0-4:
    value['PLOT']['iflag'] = int(dimension)
else:
    raise exceptions.InputValidationError("'iflag' must be an integer in the range 0-4")
you shouldn't raise here but just return the message
Grand. Fixed.
Sorry that I wasn't more clear, but it should return just a string not an exception instance
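A validator in the requested style returns a message string on failure and `None` on success, instead of raising or returning an exception instance. The sketch below works on a plain dictionary and is not the merged implementation: the function name and the `plot_num` "must be explicitly set" message are assumptions, and the range message is worded as 0-22 to match the check rather than the 0-23 in the reviewed snippet.

```python
def validate_parameters(parameters):
    """Return an error message for invalid pp.x parameters, or None if they are valid."""
    # Check for essential keys.
    try:
        plot_num = parameters['INPUTPP']['plot_num']
    except KeyError:
        return "'[INPUTPP][plot_num]' must be explicitly set"

    try:
        dimension = parameters['PLOT']['iflag']
    except KeyError:
        return "'[PLOT][iflag]' must be explicitly set"

    # plot_num must be an integer in the range 0-22, excluding 14-16.
    if plot_num not in range(23) or plot_num in (14, 15, 16):
        return "'plot_num' must be an integer in the range 0-22, excluding 14-16"

    # iflag must be an integer in the range 0-4.
    if dimension not in range(5):
        return "'iflag' must be an integer in the range 0-4"

    return None


if __name__ == '__main__':
    print(validate_parameters({'INPUTPP': {'plot_num': 11}, 'PLOT': {'iflag': 3}}))
    print(validate_parameters({'INPUTPP': {'plot_num': 15}, 'PLOT': {'iflag': 3}}))
```

Returning plain strings keeps the validator free of side effects and leaves it to the caller to decide how the message is reported.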
1da95ce
to
6b669c6
Compare
# Retrieve by default the output file and plot file
calcinfo.retrieve_list = []
calcinfo.retrieve_list.append(self.inputs.metadata.options.output_filename)
if self.inputs.metadata.options.keep_plot_file:
    calcinfo.retrieve_list.append(self._FILEOUT)
else:
    calcinfo.retrieve_temporary_list = [self._FILEOUT]
@sphuber - Maybe you have some insight - this doesn't seem to do what I would expect.
Even without the if/else block, all files are retrieved, rather than just those specified, as if retrieve_list is being ignored.
Well for a kickoff, I cannot even find now where you are telling `pp.x` to write to these files. They are blocked keywords, so the plugin should add them to the parameters, correct? Something like
parameters = self.inputs.parameters.get_dict()
parameters['INPUTPP']['filplot'] = self._FILPLOT
parameters['INPUTPP']['filout'] = self._FILEOUT
or am I missing something here?
Yes, those variables are set right at the top of the class, and later the relevant keywords are added to the blocked list:
aiida-quantumespresso/aiida_quantumespresso/calculations/pp.py
Lines 63 to 66 in 7a294c4
# Grid data output file from first stage of pp calculation
_FILPLOT = 'aiida.filplot'
# Grid data output in desired format
_FILEOUT = 'aiida.fileout'
I've added a `PpCalculation` test class.
Retrieving the files works correctly now also.
# Check for essential keys
try:
    plot_num = value['INPUTPP']['plot_num']
Also the `value` here will be the actual input value, so it is a `Dict` node. That means you should probably first do `parameters = value.get_dict()` and then do checks on that normal dictionary. This actually shows why it is important that we add a unit test for the `PpCalculation` class. As it stands this would not run I am pretty sure
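The unwrap-then-check pattern can be exercised without a database by using a stand-in object. In the sketch below, `StandInDict` is a hypothetical class that only mimics the `get_dict()` method mentioned above; it is not the AiiDA `Dict` node, and `extract_plot_num` is an illustrative helper.

```python
class StandInDict:
    """Minimal stand-in exposing get_dict(); not the AiiDA Dict class."""

    def __init__(self, content):
        self._content = dict(content)

    def get_dict(self):
        """Return a plain-dictionary copy of the stored content."""
        return dict(self._content)


def extract_plot_num(value):
    """Unwrap the node into a plain dictionary first, then inspect it."""
    parameters = value.get_dict()
    return parameters.get('INPUTPP', {}).get('plot_num')


if __name__ == '__main__':
    print(extract_plot_num(StandInDict({'INPUTPP': {'plot_num': 11}})))
```

A unit test built around such a stand-in would catch the mismatch between node and dictionary access before the calculation class is ever submitted.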
Fixed this, sorry.
Added a test for the calculation class.
7a294c4
to
e490b91
Compare
One challenge of pp.x calculations is that there is a choice of both dimensionality and of output format. As we want to produce AiiDA `ArrayData` output nodes, the `PpCalculation` plugin is modified to enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats, based on the dimensionality the user wants. The `PpCalculation` class is still lightweight in the sense that user skill is still required to run pp.x and hand-holding is minimal, but is improved over the previous 'free-form' input version in that the output will definitely be parsed by AiiDA and stored in the database in a standard way. The parser collects the useful data from standard out and detects common problems. For convenience, `PpCalculation` also enforces that the post-processed data is written to a file which is then retrieved and parsed, rather than to stdout. The parser converts this, for any dimensionality, into the appropriate `ArrayData` representation.
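To give a feel for the simplest case, a 1D Gnuplot-style file can be read as two columns of numbers. The sketch below is not the plugin's parser: it assumes whitespace-separated `x y` rows, skips blank and `#` lines, and uses the hypothetical name `parse_two_column_data`. The real parser would go on to store the result in an `ArrayData` node.

```python
def parse_two_column_data(text):
    """Parse whitespace-separated 'x y' rows into two lists of floats.

    Blank lines and lines starting with '#' are skipped. A row with fewer
    than two fields raises ValueError.
    """
    coordinates, values = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        # Only the first two columns are used in this sketch.
        x, y = stripped.split()[:2]
        coordinates.append(float(x))
        values.append(float(y))
    return coordinates, values


if __name__ == '__main__':
    sample = '0.0 1.5\n\n# comment\n0.5 2.0e-1\n'
    print(parse_two_column_data(sample))
```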
3101943
to
6c16d65
Compare
@ConradJohnston I fixed the failing test, due to a compatibility issue in the validator signature and then took the liberty to clean up the |
Also streamlined the `PpCalculation` tests and added unit tests for the validation of the parameters. Finally did some minor styling changes.
@sphuber, I don't mind at all! Your valuable experience is always welcome. I'm happy to go if you are.
Thanks a lot for the work and your patience @ConradJohnston !
Fixes #461 and fixes #499
One challenge of pp calculations is that there is a choice of both dimensionality and of output format. As we want to produce AiiDA ArrayData output nodes, the pp CalcJob class is modified to enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats, based on the dimensionality the user wants. The pp calculation class is still lightweight in the sense that the presence of 'iflag' (dimensionality) dependent required parameters is not detected automatically, but is improved over the previous 'free-form' input version in that the output will definitely be parsed by AiiDA and stored in the DB in a standard way. The parser collects the useful data from standard out and detects common problems.
Some things to look at during review: