-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PpCalculation
: add support for retrieving and parsing multiple files
#533
Conversation
@yakutovicha I created an alternative implementation including unit tests that now actually also test the retrieved temporary folder functionality. I also tested this locally, at least for @cpignedoli if you have the time, would be great if you could maybe give this branch a go as well to see if it fixes your use case, thanks. |
I will for sure give it a try next week. Thanks a lot
… On 25 Jun 2020, at 18:23, Sebastiaan Huber ***@***.***> wrote:
@yakutovicha I created an alternative implementation including unit tests that now actually also test the retrieved temporary folder functionality. I also tested this locally, at least for plot_num=7 which produces multiple files. But please try and give this a go.
@cpignedoli if you have the time, would be great if you could maybe give this branch a go as well to see if it fixes your use case, thanks.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
|
||
# How to get the output filenames and how to open them, depends on whether they will have been retrieved in the | ||
# `retrieved` output node, or in the `retrieved_temporary_folder`. Instead of having a conditional with almost | ||
# the same loop logic in each branch, we apply a somewhat dirty trick to define an `opener` which is a callable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dirty trick? That's fantastic! 👍
# `retrieved` output node, or in the `retrieved_temporary_folder`. Instead of having a conditional with almost | ||
# the same loop logic in each branch, we apply a somewhat dirty trick to define an `opener` which is a callable | ||
# that will open a handle to the output file given a certain filename. | ||
if retrieve_temporary_list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth mentioning here that there can either be all temporary files or all retrieved, not a mix of the two.
I think this is a decent choice, but had to go check in the calculation that this is what's implemented there, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with the nanoribbon work chain and it worked fine. There is still the parsing error reported in #531, but this is another story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Is there a case where multiple sub-output files are written but there is a single overall output file?
If so, we may want to offer some user control over whether all sub-outputs are parsed.
I wasn't able to figure this out from the docs, but it looks like it should always be a pair of output files. So I guess for now this should work until we find another edge case I guess |
The implementation of the `PpCalculation` and `PpParser` assumed that the code would always only produce on output file with pre-processed data and one output file with the data formatted in a custom plot format. However, for certain input parameter combinations, such as `INPUTPP.plot_num = 7` where more than a single band is requested, a pair of outputfiles is produced for each band. The filename of the data file has `INPUTPP.filplot` as a prefix, with some kind of suffix to distinguish it from the others. The corresponding plot file will use that filename as a prefix with the `PLOT.fileout` value as a suffix. For example, for the inputs: INPUTPP filplot = 'aiida.filplot' PLOT fileout = 'aiida.fileout' The data files will be named `aiida.filplot_K1_B1` and the plot files are formatted as `aiida.filplot_K1_B1aiida.fileout`. To support this use case, the `PpCalculation` is updated to not just retrieve a single file, but add a directive to the `retrieve_files` or `retrieve_temporary_list` that contains the corresponding globbing pattern. The `PpParser` then simply loops over the content of the retrieved (temporary) folder and parses each file whose filename matches the pattern described above. We assume that if there are more than one file, they all have the exact same format and so can be parsed with the same logic. Since now there are potentially more than one parsed output `ArrayData` node, the `output_data` port, which is not a namespace can not be used. To keep backwards compatibility, we add the `output_data_multiple` namespace, which is used if more than one output plot file is parsed.
02979f1
to
efd41ad
Compare
I believe this code should definitely be changed: datalist = []
for line in data_lines:
for i in range(0, len(line), 13):
data_point = line[i:i + 13].strip()
if data_point != '':
datalist.append(float(data_point))
# Unpack the list and repack as a 3D array
# Note the unusual indexing: cube files run over the z index first, then y and x.
# E.g. The first volumetric data point is x,y,z = (0,0,0) and the second is (0,0,1)
for i in range(0, xdim):
for j in range(0, ydim):
for k in range(0, zdim):
data_array[i, j, k] = (datalist[(i * ydim * zdim) + (j * zdim) + k]) If A better approach would look something like: data_array = np.empty(xdim*ydim*zdim, dtype=float)
cursor = 0
for line in lines:
ls = line.split()
data_array[cursor:cursor+len(ls)] = ls
cursor += len(ls)
data_array = data_array.reshape((xdim, ydim, zdim)) Update.fixed in #535 |
@greschd this is ready for re-review |
I am not @greschd but I believe this can be merged. I tested the plugin quite extensively with the nanoribbon work chain and it works nicely. Maybe merge #535 before (the changes there do not conflict with the current PR), as without it nanoribbon work chain wouldn't work. |
Fixes #530
The implementation of the
PpCalculation
andPpParser
assumed thatthe code would always only produce on output file with pre-processed
data and one output file with the data formatted in a custom plot
format. However, for certain input parameter combinations, such as
INPUTPP.plot_num = 7
where more than a single band is requested, apair of outputfiles is produced for each band. The filename of the data
file has
INPUTPP.filplot
as a prefix, with some kind of suffix todistinguish it from the others. The corresponding plot file will use
that filename as a prefix with the
PLOT.fileout
value as a suffix.For example, for the inputs:
The data files will be named
aiida.filplot_K1_B1
and the plot filesare formatted as
aiida.filplot_K1_B1aiida.fileout
.To support this use case, the
PpCalculation
is updated to not justretrieve a single file, but add a directive to the
retrieve_files
orretrieve_temporary_list
that contains the corresponding globbingpattern. The
PpParser
then simply loops over the content of theretrieved (temporary) folder and parses each file whose filename matches
the pattern described above. We assume that if there are more than one
file, they all have the exact same format and so can be parsed with the
same logic.
Since now there are potentially more than one parsed output
ArrayData
node, the
output_data
port, which is not a namespace can not be used.To keep backwards compatibility, we add the
output_data_multiple
namespace, which is used if more than one output plot file is parsed.