Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sum velocity-binned vpkt spectra across all ranks #284

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jpollin98
Copy link
Contributor

@jpollin98 jpollin98 commented Dec 20, 2024

Hi Luke, Is there something I've done wrong with this PR ? I'm not sure why it was reverted, as it passed all the tests?

@lukeshingles
Copy link
Member

On holidays now, will give a review and suggestions within a couple of weeks.

Copy link

codspeed-hq bot commented Dec 20, 2024

CodSpeed Performance Report

Merging #284 will not alter performance

Comparing sum_all_polatisation_grids (0b6fdad) with main (093339c)

Summary

✅ 24 untouched benchmarks

@lukeshingles lukeshingles force-pushed the main branch 2 times, most recently from 83f3ebd to 965f0ac Compare December 23, 2024 21:18
@lukeshingles lukeshingles changed the title Sum all polatisation grids Sum velocity-binned vpkt spectra from all ranks Jan 4, 2025
@lukeshingles lukeshingles changed the title Sum velocity-binned vpkt spectra from all ranks Average velocity-binned vpkt spectra across all ranks Jan 4, 2025
@lukeshingles lukeshingles changed the title Average velocity-binned vpkt spectra across all ranks Sum velocity-binned vpkt spectra across all ranks Jan 4, 2025
Copy link
Member

@lukeshingles lukeshingles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the style suggestions, my main question is whether summing the output of each rank is really the correct thing to do versus averaging them.

@@ -258,6 +258,27 @@ def get_vpkt_config(modelpath: Path | str) -> dict[str, t.Any]:
int(x) for x in vpkt_txt.readline().split()
)

# read the next line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't add information since the call is to to readline().


# Read the first file to get the dimensions
vpkt_grid_files = [f"vpkt_grid_{mpirank:04d}.out" for mpirank in range(nprocs)]
vpkt_grid_data = pd.read_csv(vpkt_grid_files[0], sep=" ", header=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that we use polars for all new code except where it doesn't provide the necessary functionality (e.g. no multi-character delimiters in pl.read_csv), but this is a minor point. Eventually, we will be able to drop the global pandas imports and save ~500ms of start up time, which will improve the command-line autocomplete responsiveness.

Comment on lines +601 to +602
else:
print(f"Velocity columns in {filename} do not match previous files.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the condition is missing. After processing the first file, velocity_columns will be not None, so I would expect every additional file to print this warning.

rows_per_obsdir = total_grid_points * nvmaps # Rows for one observer direction

# Output filename
output_filename_template = "Vpkt_grid_total_{}.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid using capital letters in filenames to match the project convention (and avoid issues with case-insensitive file systems).
Are you sure that it is correct to sum these spectra being summed instead of averaging them? For the normal vpkt spectra, the energy contributions are divided by the total number of ranks by artis, but this does not seem to be the case for the vpkt velocity grid. Should the division by num procs be done by artistools?

@@ -558,6 +558,83 @@ def make_virtual_spectra_summed_file(modelpath: Path | str) -> None:
print(f"Saved {outfile}")


def make_virtual_grid_summed_file(modelpath: Path | str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"virtual grid" is not very descriptive to me. How about something like make_averaged_vpkt_grid_files? (depending on the sum vs average question)


# Write to output file
output_path = Path(output_filename_template.format(obsdir))
with output_path.open("w", encoding="utf-8") as f: # Specify encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Specify encoding" comment does not add any information. Remove it unless there is a good reason to keep it.

Comment on lines +595 to +596
if data.shape[0] != expected_rows:
print(f"Unexpected number of rows in {filename}. Expected {expected_rows}, got {data.shape[0]}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this case cause a crash rather than just printing a warning?

Comment on lines +615 to +619
# Ensure final data has exactly 5 columns
if final_data.shape[1] > 5:
print("Data has more than 5 columns. Should only have N1, N2, I, Q, U.")
if final_data.shape[1] < 5:
print("Data has less than 5 columns. Should only have N1, N2, I, Q, U.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler than this would be:

if final_data.shape[1] != 5: 
    msg = f"Data has {final_data.shape[1]} != 5 columns. Should only have N1, N2, I, Q, U."
    raise ValueError(msg)

@@ -1341,6 +1341,10 @@ def addargs(parser) -> None:
"--makevspecpol", action="store_true", help="Make file summing the virtual packet spectra from all ranks"
)

parser.add_argument(
"--makevspecgrid", action="store_true", help="Make file summing the virtual packet grid from all ranks"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming vspecgrid to vspecvelgrid to be more descriptive. I plan to do this in the artis code at some point. Actually, I didn't know that anyone was using the vspecgrid functionality. It might be worth adding some extra em_pos, em_time columns to the vpackets*.out files so that we can do similar plots directly from that data.

Comment on lines +265 to +266
optically_thick_cells = vpkt_txt.readline().split()
vpkt_config["optically thick cells"] = optically_thick_cells
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you would have been looking at the artis source code to determine the meaning of items in the vpkt.txt file. Artis calls the items on this line override_thickcell_tau and cell_is_optically_thick_vpkt, which I find more description than "optically thick cells".

Please follow the existing convention and avoid line noise, e.g.,

vpkt_config["override_thickcell_tau"], vpkt["cell_is_optically_thick_vpkt"] = vpkt_txt.readline().split()

Similarly for the following lines, consider directly re-using the artis variable names for consistency, or coming up with a more descriptive name while avoiding spaces in the dictionary keys to match the existing code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants