Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to limit number of BF estimators #47

Merged
merged 43 commits into from
Mar 25, 2024
Merged

Add option to limit number of BF estimators #47

merged 43 commits into from
Mar 25, 2024

Conversation

jpollin98
Copy link
Contributor

No description provided.

@lukeshingles
Copy link
Member

A few things to do before we can merge:

  • Set your IDE to use clang-format and pre-commit install to autoformat the code on commit.
  • Rename BFGlobalVariable to something more descriptive like "bfestimcount". By moving the setup code from input.cc to radfield.cc, we can avoid introducing a new global variable altogether.
  • Ensure that performance has not decreased in the default case of storing a full set of bf estimators. If performance is a problem, allcont can be split into two lists, with and without bf estimators to avoid the branching overhead.

@jpollin98
Copy link
Contributor Author

Hi Luke, I have made a few of the changes you have suggested. The code, for the most part, appears ready to merge into the main branch; once I have implemented the size_t changes, I agree some performance testing should be done.

However, I have a question. For some reason, the code seemed to fall over on one of the Kilonova tests. However, when I reran the tests, it passed without causing any errors. Is this behaviour expected?

@lukeshingles
Copy link
Member

We need to add a new test for this functionality to ensure that it doesn't get broken in the future. I suggest just making a slightly modified version of the nebularonezone test. Do you want me to do that part or do you want to try it?

@lukeshingles
Copy link
Member

Hi Luke, I have made a few of the changes you have suggested. The code, for the most part, appears ready to merge into the main branch; once I have implemented the size_t changes, I agree some performance testing should be done.

However, I have a question. For some reason, the code seemed to fall over on one of the Kilonova tests. However, when I reran the tests, it passed without causing any errors. Is this behaviour expected?

Yes, I have seen this before on the kilonova test. I'm not sure exactly how it happens, but I suspect that we occasionally get assigned a runner with a slightly different CPU or software environment and this changes the checksums. Re-running makes sense in that situation.

@jpollin98
Copy link
Contributor Author

jpollin98 commented Feb 22, 2024

We need to add a new test for this functionality to ensure that it doesn't get broken in the future. I suggest just making a slightly modified version of the nebularonezone test. Do you want me to do that part or do you want to try it?

I'm happy to try to make a new nebularonezone test that tests the functionality, as it's probably valuable for me to be a bit more competent at the code development side of things.

@lukeshingles
Copy link
Member

We need to add a new test for this functionality to ensure that it doesn't get broken in the future. I suggest just making a slightly modified version of the nebularonezone test. Do you want me to do that part or do you want to try it?

I'm happy to try to make a new nebularonezone test that tests the functionality, as it's probably valuable for me to be a bit more competent at the code development side of things.

Ok, great. It's currently only in the develop branch, but take a look at how kilonova_2d_2dgrid_expansionopac is derived from kilonova_2d_2dgrid for an example.

@lukeshingles lukeshingles changed the title Add option to limit bf estimators Add option to limit number of BF estimators Mar 25, 2024
@lukeshingles lukeshingles self-requested a review March 25, 2024 15:48
artisoptions_nltenebular.h Outdated Show resolved Hide resolved
artisoptions_christinenonthermal.h Outdated Show resolved Hide resolved
artisoptions_classic.h Outdated Show resolved Hide resolved
artisoptions_nltewithoutnonthermal.h Outdated Show resolved Hide resolved
artisoptions_nltenebular.h Outdated Show resolved Hide resolved
globals.cc Outdated Show resolved Hide resolved
input.cc Outdated Show resolved Hide resolved
globals.h Outdated Show resolved Hide resolved
@lukeshingles lukeshingles enabled auto-merge (squash) March 25, 2024 15:49
@lukeshingles lukeshingles disabled auto-merge March 25, 2024 15:55
@lukeshingles lukeshingles enabled auto-merge (squash) March 25, 2024 16:16
@lukeshingles lukeshingles merged commit d1eec12 into main Mar 25, 2024
38 checks passed
@lukeshingles lukeshingles deleted the bfestimator branch March 25, 2024 16:19
@lukeshingles lukeshingles mentioned this pull request Mar 26, 2024
lukeshingles added a commit that referenced this pull request Apr 26, 2024
- Add VPKT_WRITE_CONTRIBS option to save all virtual packets
contributions for later time and wavelength binning, with
emission/absorption types by artistools.
- Improve performance by enabling fast-math compiler options, constexpr
radfield bins, improving bound-free estimator accumulation, removing
storage for empty cells, and using node shared memory for
initmassfracstable, elem_meanweight.

- 3D AD2 (Shingles et al. 2023) 1e8pkt 1920 core JUWELS runtime
decreased from 210k core hrs to 134k core hrs (1.56x faster). Memory per
core decreased from 2.8GB to 1.5GB.
- W7 (Shingles et al. 2022) nebular 960 core 2e9pkt 150-410d JUWELS
runtime decreased from 13.6k core hrs (20230811 version) to 10.4k core
hrs (1.31x faster). Compared to the 2022 published version with 18.6k
core hrs, this is 1.79x faster.
- With a Simpson rule integrator in place of GSL's adaptive method
(producing identical observables) runtime decreases to 6.83k core hours,
1.99x faster than 20230811 and 2.72x faster than the 2022 published
version.
- subMch Shen+2018 nebular 960 core 1e9pkt 150-410d runtime decreased
from 13.14k core hrs (20210417 version) to 7.30 kCore hours (1.80x
faster). With Simpson rule integrator, runtime decreased to 6.07 k core
hrs (2.16x faster)
- sim2010 classic mode 2-120 days 2e7 960 core JUWELS runtime decreased
from 3.81k core hrs (20230526 version) to 1.87k core hrs (2.04x faster).
Compared to ARTIS classic with 11.45k core hrs, this version is 6.13x
faster.

- Reduce memory per core to enable future GPU support. All threads now
share a single cell cache and operate on packets within the same cell.
This may later be expanded to multiple cell caches to increase thread
occupancy as memory allows.
- Fix virtual packets for models with internal empty cells. Passing into
an empty cell previously triggered instant escape.
- Fix a 1e5 factor in free-free opacity for opacity case 5 (Tanaka
Ye-dependent grey mode and line-by-line non-grey mode for kilonovae).
This increased the 3D AD2 (Shingles et al. 2023) early luminosity by
about 20%. The spectra are almost unchanged (except for being brighter)
- Reduce noise in particle deposition rate estimators by contributing
partial deposition of packet energy prior to Monte Carlo transition
event (similar to gamma deposition rates from path estimators).
- Add support for calculating and using expansion opacities.
- Add option for #52 by Gerrit Leck
- #47 by Josh Pollin
- Add some fixes to hybrid LTE-NLTE mode by Christine Collins

---------

Co-authored-by: gleck97 <86471143+gleck97@users.noreply.github.com>
Co-authored-by: Gerrit Leck <gleck@lxbuild07.gsi.de>
Co-authored-by: Josh Pollin <joshuapollin222@gmail.com>
Co-authored-by: Christine Collins <ccollins22@qub.ac.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants