Add parser for pp.x #428

ConradJohnston · 2019-10-23T11:45:21Z

Fixes #461 and fixes #499

One challenge of pp calculations is that there is a choice of both dimensionality and of output format. As we want to produce AiiDA ArrayData output nodes, the pp CalcJob class is modified to enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats, based on the dimensionality the user wants. The pp calculation class is still lightweight in the sense that the presence of required parameters that depend on 'iflag' (the dimensionality) is not checked automatically, but it is improved over the previous 'free-form' input version in that the output will definitely be parsed by AiiDA and stored in the DB in a standard way. The parser collects the useful data from standard output and detects common problems.

Some things to look at during review:

For 1D data, an XyData node is created, but for 2D and 3D an ArrayData node is created. I like the features of XyData (like get_x(), get_y()) but I think for the sake of consistency, it might be better to use ArrayData in every case. I think it could be annoying for developers to have to remember to treat, or worse treat programmatically, the 1D case differently.
I reuse the emit_logs() function from the Ph parser. It might be worth making it into a common utility function and importing it into the Ph parser and wherever else it is needed.

sphuber · 2019-10-25T06:23:24Z

Thanks a million @ConradJohnston . I am a bit swamped right now with the release of aiida-core==1.0.0 and papers, but I hope to be able to give this a look soon. Anyway we will include this in the release that follows aiida-core==1.0.0

ConradJohnston · 2019-10-25T14:24:22Z

> Thanks a million @ConradJohnston . I am a bit swamped right now with the release of aiida-core==1.0.0 and papers, but I hope to be able to give this a look soon. Anyway we will include this in the release that follows aiida-core==1.0.0

Beautiful.
One thought about the versioning for you to chew on in the meantime: I've added a number of reserved keywords to the PP CalcJob (before it was basically totally free-form), so this PR is likely to be backwards-incompatible with previous uses of PpCalculation. However, PpCalculation was quite limited before and had no parser, so I wonder how widely it is actually used 'in the wild'.

greschd · 2020-02-04T16:05:32Z

I'm trying out this branch, and one issue I've encountered is that the pseudo directory from the parent_folder is not copied over.

Totally possible I'm doing something wrong (haven't used pp.x before), just thought I'd let you know.

ConradJohnston · 2020-02-05T14:54:51Z

> I'm trying out this branch, and one issue I've encountered is that the pseudo directory from the parent_folder is not copied over.
>
> Totally possible I'm doing something wrong (haven't used pp.x before), just thought I'd let you know.

Hi Dominik,
thanks for the feedback!
In principle, I think this is fine as pp.x only postprocesses gridded data from PWSCF. So, on this basis, it will never use the pseudos as it only transforms the results of an existing calculation.

Let me know if there are any other oddities.

greschd · 2020-02-05T20:09:36Z

Not sure why, but when trying to plot total potential, pp.x crashed complaining about missing potentials. Maybe it needs to add core charges?

I can make a better report of the exact context next week.

ConradJohnston · 2020-02-06T09:23:05Z

Hmm, could you share input/output? It might be that in this case (plot_num = 1 ?) the pseudopotential is needed in order to reconstruct the core charge density and add it, as you say, to the valence density.


greschd · 2020-02-10T22:22:52Z

So this was a bit of a facepalm on my side... the parent calculation was run with disk_io='none'.

Anyway, it's a bit curious that pp.x complains about missing pseudo-potentials before it complains about the missing charge density file. Maybe we can come up with a better error message / exit status for this kind of issue, but it doesn't have to be in this PR.

greschd · 2020-03-19T10:46:07Z

FYI: I've used this branch for a while now, and haven't found any other issues (definitely not using the full set of possible inputs, though).

One feature I'd think would be convenient is adding unit information to the output data (could be later in a separate PR, of course).

greschd · 2020-03-19T13:11:57Z

aiida_quantumespresso/parsers/pp.py

+
+ arraydata = orm.ArrayData()
+ arraydata.set_array('voxel', voxel_array)
+ arraydata.set_array('dimensions', dimensions_array)


I'm not sure if it's necessary to set the dimensions explicitly as an array - is there a case where it's different from the shape of the data array? That is already stored in the array|data attribute.

Correct, it can be removed

sphuber

Thanks a lot @ConradJohnston . Sorry for the big delay, had to put aiida-quantumespresso on the backburner for a while. I am planning to release v3.0.0 next week, so if we can address some issues in this PR before that I can include it.

I have mostly addressed higher level design issues now and have some questions there. Besides that there is the question of the test reference files. You are adding 140,000 lines; if we do this for every parser, the repository is going to explode in size. So please try to include the bare minimum output files to test the functionality of the parser. We are not really interested in checking that the parser correctly parses a huge dat file of thousands of lines, unless that involves important custom logic. If it just loads the file through normal libraries, then reduce these files to a single line (or literally the minimum required).

greschd · 2020-03-30T08:25:52Z

@sphuber @ConradJohnston is the goal to include this in the upcoming 3.0 release? If needed, I can help resolve some of the outstanding issues.

sphuber · 2020-03-30T08:32:08Z

Yes, ideally I would like to include this in the 3.0 release which I want to release this week. @ConradJohnston if you don't have the time for this, please let us know and I can take over from here with @greschd

ConradJohnston · 2020-03-30T09:06:10Z

I'll get looking at these today. Cheers!


sphuber · 2020-04-01T15:54:50Z

@ConradJohnston let me know when you are done with the fixes and I will give it a second pass.

ConradJohnston · 2020-04-03T10:17:54Z

> @ConradJohnston let me know when you are done with the fixes and I will give it a second pass.

Good for a second pass, apologies for the delay.

Key changes:

Removed extra input ports - everything set in the parameters dict.
Removed settings node - it's not used in polite company.
Extra input validation
Now copies the pseudo folder
Updated the exit codes to conform.
Grabs the fixed folder names from the PW BasePwCpInputGenerator class. This is maybe a bit controversial because these are protected attributes in a different class.
Added extra parser logic to deal with the polar coordinates case that wasn't handled before.
Added test for polar case
Slimmed down the datafiles to the minimum.

sphuber

Thanks a lot @ConradJohnston, a few more minor comments and then we are good to go.

greschd · 2020-04-03T10:44:44Z

As commented by @giovannipizzi here there is a convention to convert to eV (and probably Angstrom?). I'm not sure if we should follow that here, also.

On the one hand, it's nice to be consistent within aiida-quantumespresso. But it could also be confusing to have different units than pp.x itself.

Opinions @sphuber @ConradJohnston @giovannipizzi?

ConradJohnston · 2020-04-03T10:59:16Z

> As commented by @giovannipizzi here there is a convention to convert to eV (and probably Angstrom?). I'm not sure if we should follow that here, also.
>
> On the one hand, it's nice to be consistent within aiida-quantumespresso. But it could also be confusing to have different units than pp.x itself.
>
> Opinions @sphuber @ConradJohnston @giovannipizzi?

My inclination is to be consistent.
Btw, @greschd, can you sanity check that units dict for me, please? A lot of these plot modes I've never used and the QE docs and source aren't too helpful.

greschd · 2020-04-03T11:08:55Z

To be honest, I also haven't used most of the plotting kinds - but I think we should do some more checking on these units (maybe ask someone who knows).

ConradJohnston · 2020-04-14T16:19:14Z

> @ConradJohnston and @greschd what is the final conclusion/status on the units?

Received a reply from Paolo on the QE issue confirming the units we had uncertainties about. Going to push the final change imminently.

sphuber · 2020-04-16T07:57:04Z

@ConradJohnston please give me a heads-up when you're done with the changes and I can give this a final pass.

ConradJohnston · 2020-04-16T08:02:46Z

@sebastiaan - good to go!


sphuber

Thanks @ConradJohnston I just realized one more thing that I must have missed during initial reviews relating to the parameters input. That change should not be too much work. The other comment about moving the data output file to the retrieved temporary list we can leave for some other time, but just wanted to have your feedback on the idea, to see whether it even made sense

sphuber · 2020-04-16T09:31:54Z

aiida_quantumespresso/parsers/pp.py

+ # Parse the post-processed-data according to what kind of data file was produced
+ if self.output_parameters['output_format'] == 'gnuplot':
+ if self.output_parameters['plot_type'] == '2D polar on a sphere':
+ parsed_data = self.parse_gnuplot_polar(data_raw)


I am wondering now if we should maybe move the data file to the retrieve_temporary_list. The reason is that these files can be quite big (correct?) and we are parsing each one essentially in its entirety into an ArrayData. In a sense we are then duplicating the content, because the original raw file is stored in the file repo of the calculation node as well as in a parsed version in the ArrayData output node. Do we really need the original raw file if we are storing it as a node as well? If not, then we can retrieve this file in the temporary retrieved folder, which still allows us to parse it, but the engine will clean it up after parsing and not store it in the repo. Maybe this is too much work for now, and if you think it makes sense we can simply open an issue for this. @greschd what are your thoughts?

Hmm, I'm not sure about this. For my own use it would definitely be fine to just discard it. However, I think the main reason pp.x even supports different output formats is so that they can effortlessly be fed into different plotting tools. If we discard the file, the onus is on us to provide compatibility with these tools (from the ArrayData).

Maybe a good solution would be to discard the file by default, and provide a setting to keep it?

I think the discarding by default and providing a setting to override would be the best solution. But we can do this in a separate PR so that we can merge this one soon

Shouldn't we change this PR to always discard then, so that the change of adding a setting is backwards compatible?

These can be GBs easily for a dense grid in a big supercell. I'd agree that if we parse, we don't need the original. This is loosely analogous to the argument over what to do with MD trajectories.

So as it stands, when one asks pp.x to write out to a file in a particular format, pp.x produces two files: 1. the 3D gridded quantity in a custom format, and 2. the post-processed data converted into the requested format and reduced to the dimension requested by iflag.
In the most recent implementation, we never use the first file, and so there is no need to retrieve it. The second, the file in a format we choose according to what we can parse, we temporarily retrieve, parse and discard.

greschd · 2020-04-16T09:59:29Z

aiida_quantumespresso/calculations/pp.py

+ 3: 6, # 3D -> Gaussian cube
+ 4: 0, # Polar on a sphere -> # Gnuplot, 1D
+ }
+ parameters['PLOT']['output_format'] = dimension_to_output_format[dimension]


Somewhat related to the data file discussion below: If we expect that people use the "raw" output file it would make sense to allow manually specifying the output_format, and just parse only the ones we understand.

That would be in the spirit of "allow everything the code itself can do", but I'd also be fine with keeping this as it is for now. If there is a need for different output formats, it's a relatively straightforward (and backwards-compatible) change.

What should the parser do in the first instance if it doesn't understand the format?
Just warn the user?

Yeah, I guess either a report- or warning-level message.

But again, I'd also be fine with just discarding and not allowing explicit output_format for now.

Long-term it might actually be nicer to have tools to convert ArrayData to whatever output format is needed - it seems silly to couple the storage format to the visualization program.

I'd be inclined to leave this for the future. The idea of a warning or report is fine, but I think the argument against this is similar to what @sphuber says in his report: the user will only discover this after they've created the nodes in the DB and the calculation has effectively failed in that it did something unexpected.

Very supportive of an export module for common plotting/visualisation tools - this could be an aiida-core feature.
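
For readers following this thread, the enforced mapping from `iflag` to `output_format` could look roughly like the sketch below. Only the entries for 3 and 4 appear in the diff quoted above; the entries for 0, 1 and 2 are assumptions based on the PR description (Gnuplot for 1D and 2D), and `enforce_output_format` is a hypothetical stand-alone helper, not the plugin's actual method.

```python
# Hypothetical stand-alone sketch; not the plugin's actual code.
# Maps PLOT.iflag (plot dimensionality) to a pp.x output_format code.
DIMENSION_TO_OUTPUT_FORMAT = {
    0: 0,  # 1D spherical average -> Gnuplot, 1D (assumed entry)
    1: 0,  # 1D -> Gnuplot, 1D (assumed entry)
    2: 7,  # 2D -> Gnuplot, 2D (assumed entry)
    3: 6,  # 3D -> Gaussian cube (from the quoted diff)
    4: 0,  # Polar on a sphere -> Gnuplot, 1D (from the quoted diff)
}


def enforce_output_format(parameters):
    """Return a copy of ``parameters`` with PLOT.output_format derived from PLOT.iflag."""
    # Copy each namelist so the caller's dictionary is left untouched.
    result = {namelist: dict(values) for namelist, values in parameters.items()}
    dimension = result['PLOT']['iflag']
    result['PLOT']['output_format'] = DIMENSION_TO_OUTPUT_FORMAT[dimension]
    return result


user_parameters = {'INPUTPP': {'plot_num': 0}, 'PLOT': {'iflag': 3}}
final_parameters = enforce_output_format(user_parameters)
print(final_parameters['PLOT'])
```

Because the user never sets `output_format` directly, the parser only ever sees formats it knows how to read, which is the trade-off discussed in this thread.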

greschd · 2020-04-17T12:52:54Z

Also fixes #499, correct?

ConradJohnston · 2020-04-20T14:07:31Z

> Also fixes #499, correct?

Sure does.

sphuber · 2020-04-21T09:33:12Z

aiida_quantumespresso/calculations/pp.py

+ raise exceptions.InputValidationError("'[PLOT][iflag]' must be explicitly set")
+
+ # Check that a valid plot type is requested
+ if plot_num in range(23) and plot_num not in [14, 15, 16]: # Must be integer in range 0-22, but not 14-16:
+ value['INPUTPP']['plot_num'] = int(plot_num) # If this test passes, we can safely cast to int
+ else:
+ raise exceptions.InputValidationError("'plot_num' must be an integer in the range 0-23")
+
+ # Check for valid plot dimension:
+ if dimension in range(5): # Must be in range 0-4:
+ value['PLOT']['iflag'] = int(dimension)
+ else:
+ raise exceptions.InputValidationError("'iflag' must be an integer in the range 0-4")


you shouldn't raise here but just return the message

Grand. Fixed.

Sorry that I wasn't more clear, but it should return just a string not an exception instance
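
A minimal sketch of the behaviour requested here, a validator that returns a message string instead of raising, might look as follows. It works on a plain dictionary (the real validator would presumably first call `get_dict()` on the input node), the function name is hypothetical, and the message for a missing `plot_num` is an assumption modelled on the `iflag` message in the diff.

```python
def validate_parameters(parameters):
    """Return an error message if ``parameters`` is invalid, otherwise None.

    Hypothetical sketch operating on a plain dict; nothing is raised.
    """
    try:
        plot_num = parameters['INPUTPP']['plot_num']
    except KeyError:
        # Assumed wording, mirroring the iflag message from the diff.
        return "'[INPUTPP][plot_num]' must be explicitly set"

    try:
        dimension = parameters['PLOT']['iflag']
    except KeyError:
        return "'[PLOT][iflag]' must be explicitly set"

    # plot_num must be an integer in 0-22, excluding 14-16.
    if plot_num not in range(23) or plot_num in (14, 15, 16):
        return "'plot_num' must be an integer in the range 0-22, excluding 14-16"

    # iflag must be an integer in 0-4.
    if dimension not in range(5):
        return "'iflag' must be an integer in the range 0-4"

    return None


print(validate_parameters({'INPUTPP': {'plot_num': 15}, 'PLOT': {'iflag': 3}}))
```

Returning `None` on success and a string on failure lets the caller decide how to report the problem.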

ConradJohnston · 2020-04-21T15:05:46Z

aiida_quantumespresso/calculations/pp.py

+ # Retrieve by default the output file and plot file
+ calcinfo.retrieve_list = []
+ calcinfo.retrieve_list.append(self.inputs.metadata.options.output_filename)
+ if self.inputs.metadata.options.keep_plot_file:
+ calcinfo.retrieve_list.append(self._FILEOUT)
+ else:
+ calcinfo.retrieve_temporary_list = [self._FILEOUT]


@sphuber - Maybe you have some insight - this doesn't seem to do what I would expect.
Even without the if/else block, all files are retrieved, rather than just those specified, as if retrieve_list is being ignored.

Well for a kickoff, I cannot even find now where you are telling pp.x to write to these files. They are blocked keywords, so the plugin should add them to the parameters, correct? Something like

parameters = self.inputs.parameters.get_dict()
parameters['INPUTPP']['filplot'] = self._FILPLOT
parameters['INPUTPP']['filout'] = self._FILEOUT

or am I missing something here?

Yes, those variables are set right at the top of the class, and later the relevant keywords are added to the blocked list:

aiida-quantumespresso/aiida_quantumespresso/calculations/pp.py

Lines 63 to 66 in 7a294c4

# Grid data output file from first stage of pp calculation
_FILPLOT = 'aiida.filplot'
# Grid data output in desired format
_FILEOUT = 'aiida.fileout'

I've added a PpCalculation test class.

Retrieving the files works correctly now also.
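
As a rough illustration of the retrieval logic quoted at the top of this thread, the choice between the permanent and the temporary retrieve list can be written as a small pure function. `build_retrieve_lists` is a hypothetical helper and `'aiida.out'` is an assumed output file name; only `'aiida.fileout'` and the `keep_plot_file` option come from the PR.

```python
def build_retrieve_lists(output_filename, fileout, keep_plot_file):
    """Return ``(retrieve_list, retrieve_temporary_list)`` for one pp.x run.

    Hypothetical helper: the stdout file is always kept, while the plot file
    is either stored permanently or retrieved only for parsing.
    """
    retrieve_list = [output_filename]
    retrieve_temporary_list = []
    if keep_plot_file:
        retrieve_list.append(fileout)
    else:
        retrieve_temporary_list.append(fileout)
    return retrieve_list, retrieve_temporary_list


# 'aiida.out' is an assumed name for the stdout file.
permanent, temporary = build_retrieve_lists('aiida.out', 'aiida.fileout', keep_plot_file=False)
print(permanent, temporary)
```

With `keep_plot_file=False` the plot file is available to the parser but is not stored in the repository afterwards, which matches the default behaviour agreed on in the earlier discussion.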

sphuber · 2020-04-22T09:43:28Z

aiida_quantumespresso/calculations/pp.py

+
+ # Check for essential keys
+ try:
+ plot_num = value['INPUTPP']['plot_num']


Also the value here will be the actual input value, so it is a Dict node. That means you should probably first do parameters = value.get_dict() and then do checks on that normal dictionary. This actually shows why it is important that we add a unit test for the PpCalculation class. As it stands this would not run I am pretty sure

Fixed this, sorry.
Added a test for the calculation class.

One challenge of pp.x calculations is that there is a choice of both dimensionality and of output format. As we want to produce AiiDA `ArrayData` output nodes, the `PpCalculation` plugin is modified to enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats, based on the dimensionality the user wants. The `PpCalculation` class is still lightweight in the sense that user skill is still required to run pp.x and hand-holding is minimal, but it is improved over the previous 'free-form' input version in that the output will definitely be parsed by AiiDA and stored in the database in a standard way. The parser collects the useful data from standard output and detects common problems. For convenience, `PpCalculation` also enforces that the post-processed data is written to a file, which is then retrieved and parsed, rather than to stdout. The parser converts this, for any dimensionality, into the appropriate `ArrayData` representation.

sphuber · 2020-04-28T17:44:44Z

@ConradJohnston I fixed the failing test, due to a compatibility issue in the validator signature and then took the liberty to clean up the PpCalculation tests and add some additional ones for the parameter validation. I hope you don't mind. I also rebased and the tests now all pass, so for me this would be good to go. Let me know if you agree.

ConradJohnston · 2020-04-28T19:23:13Z

@sphuber, I don't mind at all! Your valuable experience is always welcome. I'm happy to go if you are.

sphuber · 2020-04-29T12:56:09Z

Thanks a lot for the work and your patience @ConradJohnston !

ConradJohnston force-pushed the pp-parser branch from d2dbff3 to 096c136 on October 23, 2019 13:44

ConradJohnston force-pushed the pp-parser branch from 096c136 to 433e0a8 on October 27, 2019 22:40

ConradJohnston force-pushed the pp-parser branch from 433e0a8 to ca62218 on February 10, 2020 15:36

greschd mentioned this pull request on Feb 20, 2020: PpCalculation also needs to copy pseudo folder #461 (Closed)

greschd reviewed on Mar 19, 2020

sphuber requested changes on Mar 25, 2020

ConradJohnston force-pushed the pp-parser branch 2 times, most recently from 2c9b12d to 548a4cb on April 3, 2020 10:05

sphuber requested changes on Apr 3, 2020

ConradJohnston force-pushed the pp-parser branch from 548a4cb to 2458d02 on April 3, 2020 10:59

greschd reviewed on Apr 3, 2020 (aiida_quantumespresso/parsers/pp.py, outdated)

ConradJohnston force-pushed the pp-parser branch from 2458d02 to f93cce6 on April 3, 2020 11:06

greschd reviewed on Apr 3, 2020 (aiida_quantumespresso/parsers/pp.py, outdated)

greschd reviewed on Apr 3, 2020 (aiida_quantumespresso/parsers/pp.py, outdated)

ConradJohnston force-pushed the pp-parser branch 2 times, most recently from 7e48f57 to 0853403 on April 15, 2020 11:29

ConradJohnston requested a review from sphuber on April 15, 2020 13:02

sphuber requested changes on Apr 16, 2020

greschd reviewed on Apr 16, 2020

ConradJohnston force-pushed the pp-parser branch from 0853403 to 982581f on April 20, 2020 14:03

ConradJohnston force-pushed the pp-parser branch from 982581f to 1da95ce on April 20, 2020 15:06

sphuber reviewed on Apr 21, 2020

ConradJohnston force-pushed the pp-parser branch from 1da95ce to 6b669c6 on April 21, 2020 14:56

ConradJohnston commented on Apr 21, 2020

sphuber reviewed on Apr 22, 2020

ConradJohnston force-pushed the pp-parser branch 3 times, most recently from 7a294c4 to e490b91 on April 27, 2020 15:26

sphuber force-pushed the pp-parser branch 2 times, most recently from 3101943 to 6c16d65 on April 28, 2020 17:36

Commit 8121b38: Fix validator function for PpCalculation

Also streamlined the `PpCalculation` tests and added unit tests for the validation of the parameters. Finally did some minor styling changes.

sphuber force-pushed the pp-parser branch from 6c16d65 to 8121b38 on April 28, 2020 18:45

sphuber approved these changes on Apr 29, 2020

sphuber merged commit 80f4957 into aiidateam:develop on Apr 29, 2020

yakutovicha mentioned this pull request on Jun 24, 2020: PP plugin: retrieve multiple files. #530 (Closed)
