Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser for pp.x #428

Merged
merged 2 commits into from
Apr 29, 2020
Merged

Add parser for pp.x #428

merged 2 commits into from
Apr 29, 2020

Conversation

ConradJohnston
Copy link
Contributor

@ConradJohnston ConradJohnston commented Oct 23, 2019

Fixes #461 and fixes #499

One challenge of pp calculations is that there is a choice of both dimensionality and of output format. As we want to produce AiiDA ArrayData output nodes, the pp CalcJob class is modified to enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats, based on the dimensionality the user wants. The pp calculation class is still lightweight in the sense that the presence of 'iflag' (dimensionality) dependant required parameters are not detected automatically, but is improved over the previous 'free-form' input version in that the output will definitely be parsed by AiiDA and stored in the DB in a standard way. The parser collects the useful data from standard out and detects common problems.

Some things to look at during review:

  • For 1D data, an XyData node is created, but for 2D and 3D an ArrayData node is created. I like the features of XyData (like get_x(), get_y()) but I think for the sake of consistency, it might be better to use ArrayData in every case. I think it could be annoying for developers to have to remember to treat, or worse treat programmatically, the 1D case differently.
  • I reuse the emit_logs() function from the Ph parser. It might be worth making into a common utility function and just importing it into the Ph parser and wherever else.

@sphuber
Copy link
Contributor

sphuber commented Oct 25, 2019

Thanks a million @ConradJohnston . I am a bit swamped right now with the release of aiida-core==1.0.0 and papers, but I hope to be able to give this a look soon. Anyway we will include this in the release that follows aiida-core==1.0.0

@ConradJohnston
Copy link
Contributor Author

Thanks a million @ConradJohnston . I am a bit swamped right now with the release of aiida-core==1.0.0 and papers, but I hope to be able to give this a look soon. Anyway we will include this in the release that follows aiida-core==1.0.0

Beautiful.
One thought about the versioning for you to chew on in the meantime- I've added a number of reserved keywords to the PP CalcJob (before it was basically totally free-form) so this PR is likely to backwards-incompatible with previous uses of PpCalculation. However, the PpCalculaton was quite limited before and with no parser, so I wonder how widely it is actually used 'in the wild'.

@greschd
Copy link
Member

greschd commented Feb 4, 2020

I'm trying out this branch, and one issue I've encountered is that the pseudo directory from the parent_folder is not copied over.

Totally possible I'm doing something wrong (haven't used pp.x before), just thought I'd let you know.

@ConradJohnston
Copy link
Contributor Author

I'm trying out this branch, and one issue I've encountered is that the pseudo directory from the parent_folder is not copied over.

Totally possible I'm doing something wrong (haven't used pp.x before), just thought I'd let you know.

Hi Dominik,
thanks for the feedback!
In principle, I think this is fine as pp.x only postprocesses gridded data from PWSCF. So, on this basis, it will never use the pseudos as it only transforms the results of an existing calculation.

Let me know if there's any other oddities.

@greschd
Copy link
Member

greschd commented Feb 5, 2020

Not sure why, but when trying to plot total potential, pp.x crashed complaining about missing potentials. Maybe it needs to add core charges?

I can make a better report of the exact context next week.

@ConradJohnston
Copy link
Contributor Author

ConradJohnston commented Feb 6, 2020 via email

@greschd
Copy link
Member

greschd commented Feb 10, 2020

So this was a bit of a facepalm on my side... the parent calculation was run with disk_io='none'.

Anyway, it's a bit curious that pp.x complains about missing pseudo-potentials before it complains about the missing charge density file. Maybe we can come up with a better error message / exit status for this kind of issue, but it doesn't have to be in this PR.

@greschd
Copy link
Member

greschd commented Mar 19, 2020

FYI: I've used this branch for a while now, and haven't found any other issues (definitely not using the full set of possible inputs, though).

One feature I'd think would be convenient is adding unit information to the output data (could be later in a separate PR, of course).


arraydata = orm.ArrayData()
arraydata.set_array('voxel', voxel_array)
arraydata.set_array('dimensions', dimensions_array)
Copy link
Member

@greschd greschd Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's necessary to set the dimensions explicitly as an array - is there a case where it's different from the shape of the data array? That is already stored in the array|data attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it can be removed

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ConradJohnston . Sorry for the big delay, had to put aiida-quantumespresso on the backburner for a while. I am planning to release v3.0.0 next week, so if we can address some issues in this PR before that I can include it.

I have mostly addressed higher level design issues now and have some questions there. Besides that there is the question of the test reference files. You are adding 140,000 lines, which if we do this for every parser, we are going to explode the repository. So please try to include the bare minimum output files to test the functionality of the parser. We are not really interested in checking that the parser correctly parses a huge dat file of thousands of lines, unless that has important custom logic. If it just loads the file through normal libraries then just reduce these files to a single line (or literally the minimum required)

aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved

arraydata = orm.ArrayData()
arraydata.set_array('voxel', voxel_array)
arraydata.set_array('dimensions', dimensions_array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it can be removed

aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/pp.py Outdated Show resolved Hide resolved
@greschd
Copy link
Member

greschd commented Mar 30, 2020

@sphuber @ConradJohnston is the goal to include this in the upcoming 3.0 release? If needed, I can help resolve some of the outstanding issues.

@sphuber
Copy link
Contributor

sphuber commented Mar 30, 2020

Yes, ideally I would like to include this in the 3.0 release which I want to release this week. @ConradJohnston if you don't have the time for this, please let us know and I can take over from here with @greschd

@ConradJohnston
Copy link
Contributor Author

ConradJohnston commented Mar 30, 2020 via email

@sphuber
Copy link
Contributor

sphuber commented Apr 1, 2020

@ConradJohnston let me know when you are done with the fixes and I will give it a second pass.

@ConradJohnston ConradJohnston force-pushed the pp-parser branch 2 times, most recently from 2c9b12d to 548a4cb Compare April 3, 2020 10:05
@ConradJohnston
Copy link
Contributor Author

@ConradJohnston let me know when you are done with the fixes and I will give it a second pass.

Good for a second pass, apologies for the delay.

Key changes:

  1. Removed extra input ports - everything set in the parameters dict.
  2. Removed settings node - it's not used in polite company.
  3. Extra input validation
  4. Now copies the pseudo folder
  5. Updated the exit codes to conform.
  6. Grabs the fixed folder names from the PW BasePwCpInputGenerator class. This is maybe a bit controversial because these are protected attributes in a different class.
  7. Added extra parser logic to deal with the polar coordinates case that wasn't handled
    before.
  8. Added test for polar case
  9. Slimmed down the datafiles to the minimum.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ConradJohnston few more minor comments and then we are good to go

aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/pp.py Outdated Show resolved Hide resolved
@greschd
Copy link
Member

greschd commented Apr 3, 2020

As commented by @giovannipizzi here there is a convention to convert to eV (and probably Angstrom?). I'm not sure if we should follow that here, also.

On the one hand, it's nice to be consistent within aiida-quantumespresso. But it could also be confusing to have different units than pp.x itself.

Opinions @sphuber @ConradJohnston @giovannipizzi?

@ConradJohnston
Copy link
Contributor Author

ConradJohnston commented Apr 3, 2020

As commented by @giovannipizzi here there is a convention to convert to eV (and probably Angstrom?). I'm not sure if we should follow that here, also.

On the one hand, it's nice to be consistent within aiida-quantumespresso. But it could also be confusing to have different units than pp.x itself.

Opinions @sphuber @ConradJohnston @giovannipizzi?

My inclination is to be consistent.
Btw, @greschd, can you sanity check that units dict for me, please? A lot of these plot modes I've never used and the QE docs and source aren't too helpful.

@greschd
Copy link
Member

greschd commented Apr 3, 2020

To be honest, I also haven't used most of the plotting kinds - but I think we should do some more checking on these units (maybe ask someone who knows).

@ConradJohnston
Copy link
Contributor Author

@ConradJohnston and @greschd what is the final conclusion/status on the units?

Received a reply from Paolo on the QE issue confirming the units we had uncertainties about. Going to push the final change imminently.

@sphuber
Copy link
Contributor

sphuber commented Apr 16, 2020

@ConradJohnston please give me a headsup when you're done with the changes and I can give this a final pass.

@ConradJohnston
Copy link
Contributor Author

ConradJohnston commented Apr 16, 2020 via email

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ConradJohnston I just realized one more thing that I must have missed during initial reviews relating to the parameters input. That change should not be too much work. The other comment about moving the data output file to the retrieved temporary list we can leave for some other time, but just wanted to have your feedback on the idea, to see whether it even made sense

aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pp.py Outdated Show resolved Hide resolved
# Parse the post-processed-data according to what kind of data file was produced
if self.output_parameters['output_format'] == 'gnuplot':
if self.output_parameters['plot_type'] == '2D polar on a sphere':
parsed_data = self.parse_gnuplot_polar(data_raw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering now if we should maybe move the data file to the retrieve_temporary_list. The reason is that these files can be quite big (correct?) and we are parsing it essentially in its entirety into an ArrayData. In a sense we are then duplicating the content, because the original raw file is stored in the file repo of the calculation node as well as in a parsed version in the ArrayData output node. Do we really need the original raw file if we are storing it as a node as well? If not, then we can retrieve this file in the temporary retrieved folder, which still allows to parse it, but the engine will clean it after parsing and not store it in the repo. Maybe this is too much work for now and if you think it makes sense we can simply open an issue for this. @greschd what are your thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about this. For my own use it would definitely be fine to just discard it. However, I think the main reason pp.x even supports different output formats is so that they can effortlessly be fed into different plotting tools. If we discard the file, the onus is on us to provide compatibility with these tools (from the ArrayData).

Maybe a good solution would be to discard the file by default, and provide a setting to keep it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the discarding by default and providing a setting to override would be the best solution. But we can do this in a separate PR so that we can merge this one soon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we change this PR to always discard then, so that the change of adding a setting is backwards compatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be GBs easily for a dense grid in a big supercell. I'd agree that if we parse, we don't need the original. This is loosely analogous to the argument over what to do with MD trajectories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as it stands, when one asks pp.x to write out to file in particular format, pp.x produces two files: 1. the 3D gridded quantity in a custom format, and 2. the post-processed data converted into a particular format, and reduced to the dimension requested by iflag.
In the most recent implementation, we never use the first file, and so there is no need to retrieve it. The second, the file in a format we choose according to what we can parse, we temporarily retrieve, parse and discard.

3: 6, # 3D -> Gaussian cube
4: 0, # Polar on a sphere -> # Gnuplot, 1D
}
parameters['PLOT']['output_format'] = dimension_to_output_format[dimension]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat related to the data file discussion below: If we expect that people use the "raw" output file it would make sense to allow manually specifying the output_format, and just parse only the ones we understand.

That would be in the spirit of "allow everything the code itself can do", but I'd also be fine with keeping this as it is for now. If there is a need for different output formats, it's a relatively straightforward (and backwards-compatible) change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the parser do in the first instance if it doesn't understand the format?
Just warn the user?

Copy link
Member

@greschd greschd Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess either a report or warning - level message.

But again, I'd also be fine with just discarding and not allowing explicit output_format for now.

Long-term it might actually be nicer to have tools to convert ArrayData to whatever output format is needed - it seems silly to couple the storage format to the visualization program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to leave this for the future. The idea of a warning or report is fine, but I think the argument against this is similar to what @sphuber says in his report - the user will only discover this after they've created the nodes in the DB and the calculation has effecively failed in that it did something unexpected.

Very supportive of an export module for common plotting/visualisation tools - this could be an aiida-core feature.

@greschd
Copy link
Member

greschd commented Apr 17, 2020

Also fixes #499, correct?

@ConradJohnston
Copy link
Contributor Author

Also fixes #499, correct?

Sure does.

Comment on lines 29 to 41
raise exceptions.InputValidationError("'[PLOT][iflag]' must be explicitly set")

# Check that a valid plot type is requested
if plot_num in range(23) and plot_num not in [14, 15, 16]: # Must be integer in range 0-22, but not 14-16:
value['INPUTPP']['plot_num'] = int(plot_num) # If this test passes, we can safely cast to int
else:
raise exceptions.InputValidationError("'plot_num' must be an integer in the range 0-23")

# Check for valid plot dimension:
if dimension in range(5): # Must be in range 0-4:
value['PLOT']['iflag'] = int(dimension)
else:
raise exceptions.InputValidationError("'iflag' must be an integer in the range 0-4")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't raise here but just return the message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grand. Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I wasn't more clear, but it should return just a string not an exception instance

Comment on lines 213 to 219
# Retrieve by default the output file and plot file
calcinfo.retrieve_list = []
calcinfo.retrieve_list.append(self.inputs.metadata.options.output_filename)
if self.inputs.metadata.options.keep_plot_file:
calcinfo.retrieve_list.append(self._FILEOUT)
else:
calcinfo.retrieve_temporary_list = [self._FILEOUT]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sphuber - Maybe you have some insight - this doesn't seem to do what I would expect.
Even without the if/else block, all files are retrieved, rather than just those specified, as if retrieve_list is being ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well for a kickoff, I cannot even find now where you are telling pp.x to write to these files. They are blocked keywords, so the plugin should add them to the parameters, correct? Something like

parameters = self.inputs.parameters.get_dict()
parameters['INPUTPP']['filplot'] = self._FILPLOT
parameters['INPUTPP']['filout'] = self._FILEOUT

or am I missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those variables are set right at the top of the class. and later the relevant keywords added to the blocked list:

# Grid data output file from first stage of pp calculation
_FILPLOT = 'aiida.filplot'
# Grid data output in desired format
_FILEOUT = 'aiida.fileout'

I've added a PpCalculation test class.

Retrieving the files works correctly now also.


# Check for essential keys
try:
plot_num = value['INPUTPP']['plot_num']
Copy link
Contributor

@sphuber sphuber Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the value here will be the actual input value, so it is a Dict node. That means you should probably first do parameters = value.get_dict() and then do checks on that normal dictionary. This actually shows why it is important that we add a unit test for the PpCalculation class. As it stands this would not run I am pretty sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, sorry.
Added a test for the calculation class.

@ConradJohnston ConradJohnston force-pushed the pp-parser branch 3 times, most recently from 7a294c4 to e490b91 Compare April 27, 2020 15:26
One challenge of pp.x calculations is that there is a choice of both
dimensionality and of output format. As we want to produce AiiDA
`ArrayData` output nodes, the `PpCalculation` plugin is modified to
enforce only Gnuplot (for 1D and 2D) and Cube (3D only) file formats,
based on the dimensionality the user wants. The `PpCalculation` class is
still lightweight in the sense that the user skill is still required to
run pp.x and hand-holding is minimal, but is improved over the previous
'free-form' input version in that the output will definitely be parsed
by AiiDA and stored in the database in a standard way. The parser
collects the useful data from standard out and detects common problems.
or convenience `PpCalculation` also enforces that the post-processed
data is written to a file which is then retrieved and parsed, rather
than to stdout. The parser converts this, for any dimensionality into
the appropriate `ArrayData` representation.
@sphuber sphuber force-pushed the pp-parser branch 2 times, most recently from 3101943 to 6c16d65 Compare April 28, 2020 17:36
@sphuber
Copy link
Contributor

sphuber commented Apr 28, 2020

@ConradJohnston I fixed the failing test, due to a compatibility issue in the validator signature and then took the liberty to clean up the PpCalculation tests and add some additional ones for the parameter validation. I hope you don't mind. I also rebased and the tests now all pass, so for me this would be good to go. Let me know if you agree.

Also streamlined the `PpCalculation` tests and added unit tests for the
validation of the parameters. Finally did some minor styling changes.
@ConradJohnston
Copy link
Contributor Author

@sphuber, I don't mind at all! Your valuable experience is always welcome. I'm happy to go if you are.

@sphuber sphuber merged commit 80f4957 into aiidateam:develop Apr 29, 2020
@sphuber
Copy link
Contributor

sphuber commented Apr 29, 2020

Thanks a lot for the work and your patience @ConradJohnston !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify units lookup table for pp.x parser PpCalculation also needs to copy pseudo folder
3 participants