Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-207: simplified output format #211

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

gh-207: simplified output format #211

wants to merge 6 commits into from

Conversation

ntessore
Copy link
Contributor

@ntessore ntessore commented Oct 24, 2024

Simplifies the data format for Heracles' results.

Multiple results are now produced as a dictionary with keys corresponding to the input fields:

{
    ("POS", "POS", 1, 1): ...,
    ("POS", "SHE", 1, 1): ...,
    ("SHE", "SHE", 1, 1): ...,
    ...
}

For the output from angular_power_spectra(), the individual entries are

  • 1D arrays with shape (lmax + 1,) for TT,
  • 2D arrays with shape (2, lmax + 1) for TE and TB,
  • 2D arrays with shape (3, lmax + 1) for EE, BB, EB of an auto-correlation (where EB = BE), and
  • 2D arrays with shape (4, lmax + 1) for EE, BB, EB, BE of a cross-correlation (where EB ≠ BE).

For the output from mixing_matrices(), the individual entries are (in the notation of Brown, Castro & Taylor 2005):

  • the $M^{TT,TT}$ mixing matrix for scalar x scalar,

  • the $M^{TE,TE} = M^{TB,TB}$ mixing matrix for scalar x spin,

  • the stack of mixing matrices

    • $M^{EE,EE} = M^{BB,BB}$,
    • $M^{EE,BB} = M^{BB,EE}$,
    • $M^{EB,EB}$

    for spin x spin.

In practice, this more compact storage of results, with everything for one combination of fields in one array, is generally more convenient (see example notebook).

Internally, individual results are stored as the new heracles.Result class, with is a subclass of numpy's ndarray, and decays into a plain numpy array under almost all operations. Arrays of Result type have optional properties:

  • result.ell -- None or the array of angular mode numbers for the result.
  • result.axis -- None or the angular axis corresponding to ell.
  • result.lower -- None or the lower bound of ell for binned results.
  • result.upper -- None or the upper bound of ell for binned results.
  • result.weight -- None or the weight in each bin for binned results.

This means that angular power spectra, mixing matrices, and their binned versions all use the exact same data type -- no more case distinction for structured arrays.

Consequently, it is no longer necessary to have separate functions for binning. Both spectra and mixing matrices can use heracles.binned().

The same is true for reading and writing results to file: both spectra and mixing matrices can be read using heracles.read() and written using heracles.write().

Finally, the new data type is flexible enough to support multiple ell axes for covariance matrices, as well as future use cases for bispectra and trispectra.

Closes: #207

@ntessore ntessore marked this pull request as ready for review November 25, 2024 13:37
@ntessore ntessore requested review from JaimeRZP and ucapbba November 25, 2024 13:37
Copy link
Contributor

@JaimeRZP JaimeRZP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I am going to try to plug this into the covariance code to see how this works in practice.

# twopoint
"angular_power_spectra",
"debias_cls",
"mixing_matrices",
"bin2pt",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like merging all of these functions into a single one: binned, write, read

# get lower bounds or create default
lower = getattr(result, "lower", None)
if lower is None:
lower = ell
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not similar to how upper is defined below?
Does it make sense for this to be just ell?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that in the absence of information, the ell bins are just the half-open intervals between the given ells: $[\ell_i, \ell_{i+1})$. The upper bound then needs an invented maximum upper bound.

arr = np.asanyarray(arr)
def _write_result(fits, ext, key, result):
"""
Write a result array to FITS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define fits, ext , key and result. It is for example not clear what ext does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we need to fix the documentation at some point

If the output file exists, the new estimates will be appended, unless the
``clobber`` parameter is set to ``True``.

def write(path, results, *, clobber=False):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would again define the inputs, specially for public functions like write or read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, needs to happen soon!

out.upper = bins[1:]

# set the binned weight
out.weight = wb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is ok for this class to overwrite the value of weight?
At the moment a user might provide a str and get an array back.
I understand that this is the array created from the string but it seems dangerous.
I would honestly just remove the srt option.

Returns true if *alm* has a non-zero spin weight and a leading axis of size
2, false otherwise.
"""
md = alm.dtype.metadata or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does or {} do here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no metadata was set, dtype.metadata is None, not an empty dictionary. a or b returns a if a is truthy, or b if not. So if dtype.metadata is None, it's not truthy, in which case md is set to an empty dictionary. A more explicit way to make sure md is a dictionary would be:

md = alm.dtype.metadata if alm.dtype.metadata is not None else {}


if weight is None:
w = np.ones_like(ell)
elif isinstance(weight, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I find this string for weights very elegant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very elegant, presumably?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update output formats
2 participants