
Introduce utils.simulate() #348

Open · wants to merge 4 commits into dev
Conversation

@vallis (Collaborator) commented May 17, 2023

`simulate` will draw random residuals for all pulsars in `pta` by sampling all white-noise and GP objects at the parameter values `params`.

  • Requires GPs to have `combine=False`, and will run faster with GP ECORR.
  • If `pta` includes a `TimingModel`, it should be created with a small `prior_variance`. Note that the variance applies to the entire design-matrix vector, and should be multiplied by `len(psr.toas)` if it is understood as the variance per individual residual.
  • This function can be used with `utils.set_residuals` to replace the residuals in a `Pulsar` object.
  • It can also be used with `parameter.sample(pta.params)` to sample from the prior.
  • Note that any PTA built from that `Pulsar` may nevertheless cache residuals internally, so it is safer to rebuild the PTA from the modified `Pulsar`.
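As a rough standalone sketch of what such a simulation entails (plain numpy with made-up sizes; here `F` stands in for a GP basis, `phi` for its diagonal prior variances, and `nvec` for the per-TOA white-noise variances — all names and values are illustrative, not the actual `simulate` implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

ntoa, nbasis = 100, 10
F = rng.standard_normal((ntoa, nbasis))  # stand-in GP basis (e.g. Fourier modes)
phi = np.full(nbasis, 1e-2)              # diagonal GP prior variances
nvec = np.full(ntoa, 1e-1)               # white-noise variances per TOA

# draw GP coefficients a ~ N(0, diag(phi)) and white noise n ~ N(0, diag(nvec))
a = np.sqrt(phi) * rng.standard_normal(nbasis)
n = np.sqrt(nvec) * rng.standard_normal(ntoa)

# simulated residuals: GP realization plus white noise
residuals = F @ a + n
```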

@codecov bot commented May 17, 2023

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (c49646c) 88.37% compared to head (ab7ef2b) 87.97%.
Report is 54 commits behind head on dev.

❗ Current head ab7ef2b differs from pull request most recent head a98e6c7. Consider uploading reports for the commit a98e6c7 to get more accurate results


```
@@            Coverage Diff             @@
##              dev     #348      +/-   ##
==========================================
- Coverage   88.37%   87.97%   -0.41%     
==========================================
  Files          13       13              
  Lines        3012     3051      +39     
==========================================
+ Hits         2662     2684      +22     
- Misses        350      367      +17     
```

| Files | Coverage Δ |
| --- | --- |
| enterprise/signals/gp_signals.py | 90.68% <100.00%> (ø) |
| enterprise/signals/signal_base.py | 89.84% <86.44%> (-0.27%) ⬇️ |
| enterprise/signals/utils.py | 83.94% <59.52%> (-2.52%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c49646c...a98e6c7.

@vhaasteren (Member) commented May 18, 2023

I have a semi-related question, because you are introducing `prior_variance` in `TimingModel`. I have also run into the need to set values for the `prior_variance` of the timing-model parameters, both for simulating data and for analysis purposes. The problem is that the prior on timing-model parameters is highly specific to each parameter, and we have different parameters per pulsar. I ran into this challenge as well when writing code to solve pulsars.

What I don't like about using `prior_variance` in this particular way is that it is not flexible enough (I think; it's hard to see what `BasisGP` can really accept for `priorFunction`) to tailor the prior per timing-model parameter per pulsar. I have an application in mind that requires that, so I have an interest in getting it right from the start.

In my previous work, my solution was to use the information in the par files. The uncertainty column in the par file is currently only used as output, even though libstempo and PINT do read it in. (Actually, I'm not sure whether PINT can use it for anything else.) But it is a perfect place in the par file to write the prior width as input. The parameter values in the par file are, in reality, also just the means of the prior in the analysis; people just haven't been thinking about it that way. I'd also love to have a discussion about this with the timers: ideally the timing group(s) would create data releases with physically motivated priors on all parameters.

So my suggestion is to use the uncertainty column in the par file to specify the prior width.
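As a rough illustration of that suggestion (a hypothetical, highly simplified par-file parser; real par files have more columns and flag conventions, and libstempo or PINT would be used in practice):

```python
# Hypothetical sketch: read the uncertainty column of a simplified par file
# and use it as the prior width for each timing-model parameter.
# Values below are made up for illustration only.
par_text = """\
F0    29.946923    1    1.2e-10
F1    -3.77e-10    1    5.0e-18
DM    56.791       1    3.0e-3
"""

prior_widths = {}
for line in par_text.splitlines():
    fields = line.split()
    if len(fields) == 4:                 # name, value, fit flag, uncertainty
        name, value, fit, unc = fields
        prior_widths[name] = float(unc)  # uncertainty column as prior width
```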

@vhaasteren (Member) commented

Two more questions:

  1. Regarding caching of residuals: when doing non-linear timing models, you can't cache the residuals as usual either, since they come from the timing package whenever parameters change. How (and where) is that implemented? Somehow we should be able to tell the caching descriptor that the residuals have changed, so that related quantities like `TNr` are recalculated.

  2. Will it still be faster with GP ECORR if you use my `FastShermanMorrison` code? I saw somewhere in here that it depends on the `rNr` solve of `ShermanMorrison`, which is super slow.
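For context, the Sherman-Morrison identity that such a kernel solver relies on can be checked against a dense solve at toy sizes (a self-contained numpy sketch with a single rank-one ECORR-like update; all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
nvec = rng.uniform(1.0, 2.0, n)  # diagonal white-noise variances
u = np.ones(n)                   # epoch indicator vector (one ECORR block)
j = 0.5                          # ECORR variance for that block
r = rng.standard_normal(n)

# Sherman-Morrison:
# (N + j u u^T)^{-1} r = N^{-1} r - j (u^T N^{-1} r) / (1 + j u^T N^{-1} u) * N^{-1} u
Ninv_r = r / nvec
Ninv_u = u / nvec
x = Ninv_r - j * (u @ Ninv_r) / (1.0 + j * (u @ Ninv_u)) * Ninv_u

# check against the dense solve
C = np.diag(nvec) + j * np.outer(u, u)
assert np.allclose(x, np.linalg.solve(C, r))
```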

@vhaasteren vhaasteren changed the base branch from master to dev November 7, 2023 09:00
@vhaasteren vhaasteren self-requested a review November 7, 2023 09:07
@vhaasteren (Member) left a comment
In general this looks OK. I provided minor code feedback.

Question: in the unit tests, the returned residuals are not actually checked right now, which makes the tests incomplete. Ideally we would generate an ensemble of realizations and compare it to the covariance matrix; that would be the most realistic check, though it may not be very feasible.

Also, my question regarding the timing-model parameter priors still stands. IMO those should actually come from the timing package. I don't want to make that a show-stopper, though.

Any thoughts?

```python
    inv = sl.cho_solve(cf, np.identity(cf[0].shape[0]))
    if logdet:
        ld = 2.0 * np.sum(np.log(np.diag(cf[0])))
except np.linalg.LinAlgError:
```
@vhaasteren (Member):
Add a # pragma: no cover here?

```python
    gpresiduals.append(0)
elif phi.ndim == 1:
    gpresiduals.append(np.dot(fmat, np.sqrt(phi) * np.random.randn(phi.shape[0])))
else:
```
@vhaasteren (Member):
Add a # pragma: no cover here

```python
for delay, ndiag in zip(delays, ndiags):
    if ndiag is None:
        whiteresiduals.append(0)
    elif isinstance(ndiag, ShermanMorrison):
```
@vhaasteren (Member):
It is not a good idea to check for an instance of `ShermanMorrison`: when `fastshermanmorrison` is used, this will be a different type. Instead, perhaps duck typing can be used, like:

```python
if all(hasattr(ndiag, attr) for attr in ['_nvec', '_jvec', '_slices']):
```
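A toy illustration of that duck-typing approach (hypothetical minimal classes; the attribute list mirrors the reviewer's suggestion, not the real kernel-class internals):

```python
class ShermanMorrison:
    """Stand-in for the slow kernel class."""
    def __init__(self):
        self._nvec, self._jvec, self._slices = [], [], []

class FastShermanMorrison:
    """Different type, same interface."""
    def __init__(self):
        self._nvec, self._jvec, self._slices = [], [], []

def is_sherman_morrison_like(ndiag):
    # duck typing: accept any object exposing the expected attributes,
    # regardless of its concrete class
    return all(hasattr(ndiag, attr) for attr in ["_nvec", "_jvec", "_slices"])
```

With this check, both implementations take the same code path, while unrelated objects do not.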

```python
    cf = cholesky(sps.csc_matrix(phis))
    gp = np.zeros(phis.shape[0])
    gp[cf.P()] = np.dot(cf.L().toarray(), np.random.randn(phis.shape[0]))
else:
```
@vhaasteren (Member):
Add a `# pragma: no cover` here
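The sparse-Cholesky draw above (which must undo the fill-reducing permutation `cf.P()`) has a simple dense analogue; a minimal sketch, assuming the GP prior matrix `phis` is symmetric positive definite (the matrix below is a made-up stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

# build a small SPD matrix standing in for the full GP prior `phis`
A = rng.standard_normal((5, 5))
phis = A @ A.T + 5.0 * np.eye(5)

# draw gp ~ N(0, phis): factor phis = L L^T, then gp = L z with z ~ N(0, I)
L = np.linalg.cholesky(phis)
gp = L @ rng.standard_normal(5)

# the factorization reproduces the covariance exactly
assert np.allclose(L @ L.T, phis)
```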
