-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sum velocity-binned vpkt spectra across all ranks #284
base: main
Are you sure you want to change the base?
Conversation
On holidays now, will give a review and suggestions within a couple of weeks. |
CodSpeed Performance ReportMerging #284 will not alter performanceComparing Summary
|
83f3ebd
to
965f0ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the style suggestions, my main question is whether summing the output of each rank is really the correct thing to do versus averaging them.
@@ -258,6 +258,27 @@ def get_vpkt_config(modelpath: Path | str) -> dict[str, t.Any]: | |||
int(x) for x in vpkt_txt.readline().split() | |||
) | |||
|
|||
# read the next line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment doesn't add information since the call is to to readline().
|
||
# Read the first file to get the dimensions | ||
vpkt_grid_files = [f"vpkt_grid_{mpirank:04d}.out" for mpirank in range(nprocs)] | ||
vpkt_grid_data = pd.read_csv(vpkt_grid_files[0], sep=" ", header=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we use polars for all new code except where it doesn't provide the necessary functionality (e.g. no multi-character delimiters in pl.read_csv), but this is a minor point. Eventually, we will be able to drop the global pandas imports and save ~500ms of start up time, which will improve the command-line autocomplete responsiveness.
else: | ||
print(f"Velocity columns in {filename} do not match previous files.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the condition is missing. After processing the first file, velocity_columns will be not None, so I would expect every additional file to print this warning.
rows_per_obsdir = total_grid_points * nvmaps # Rows for one observer direction | ||
|
||
# Output filename | ||
output_filename_template = "Vpkt_grid_total_{}.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid using capital letters in filenames to match the project convention (and avoid issues with case-insensitive file systems).
Are you sure that it is correct to sum these spectra being summed instead of averaging them? For the normal vpkt spectra, the energy contributions are divided by the total number of ranks by artis, but this does not seem to be the case for the vpkt velocity grid. Should the division by num procs be done by artistools?
@@ -558,6 +558,83 @@ def make_virtual_spectra_summed_file(modelpath: Path | str) -> None: | |||
print(f"Saved {outfile}") | |||
|
|||
|
|||
def make_virtual_grid_summed_file(modelpath: Path | str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"virtual grid" is not very descriptive to me. How about something like make_averaged_vpkt_grid_files? (depending on the sum vs average question)
|
||
# Write to output file | ||
output_path = Path(output_filename_template.format(obsdir)) | ||
with output_path.open("w", encoding="utf-8") as f: # Specify encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Specify encoding" comment does not add any information. Remove it unless there is a good reason to keep it.
if data.shape[0] != expected_rows: | ||
print(f"Unexpected number of rows in {filename}. Expected {expected_rows}, got {data.shape[0]}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this case cause a crash rather than just printing a warning?
# Ensure final data has exactly 5 columns | ||
if final_data.shape[1] > 5: | ||
print("Data has more than 5 columns. Should only have N1, N2, I, Q, U.") | ||
if final_data.shape[1] < 5: | ||
print("Data has less than 5 columns. Should only have N1, N2, I, Q, U.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler than this would be:
if final_data.shape[1] != 5:
msg = f"Data has {final_data.shape[1]} != 5 columns. Should only have N1, N2, I, Q, U."
raise ValueError(msg)
@@ -1341,6 +1341,10 @@ def addargs(parser) -> None: | |||
"--makevspecpol", action="store_true", help="Make file summing the virtual packet spectra from all ranks" | |||
) | |||
|
|||
parser.add_argument( | |||
"--makevspecgrid", action="store_true", help="Make file summing the virtual packet grid from all ranks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming vspecgrid to vspecvelgrid to be more descriptive. I plan to do this in the artis code at some point. Actually, I didn't know that anyone was using the vspecgrid functionality. It might be worth adding some extra em_pos, em_time columns to the vpackets*.out files so that we can do similar plots directly from that data.
optically_thick_cells = vpkt_txt.readline().split() | ||
vpkt_config["optically thick cells"] = optically_thick_cells |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you would have been looking at the artis source code to determine the meaning of items in the vpkt.txt file. Artis calls the items on this line override_thickcell_tau and cell_is_optically_thick_vpkt, which I find more description than "optically thick cells".
Please follow the existing convention and avoid line noise, e.g.,
vpkt_config["override_thickcell_tau"], vpkt["cell_is_optically_thick_vpkt"] = vpkt_txt.readline().split()
Similarly for the following lines, consider directly re-using the artis variable names for consistency, or coming up with a more descriptive name while avoiding spaces in the dictionary keys to match the existing code.
379218c
to
0b6fdad
Compare
Hi Luke, Is there something I've done wrong with this PR ? I'm not sure why it was reverted, as it passed all the tests?