gh-207: simplified output format #211

ntessore · 2024-10-24T21:39:04Z

Simplifies the data format for Heracles' results.

Multiple results are now produced as a dictionary with keys corresponding to the input fields:

{
    ("POS", "POS", 1, 1): ...,
    ("POS", "SHE", 1, 1): ...,
    ("SHE", "SHE", 1, 1): ...,
    ...
}

For the output from angular_power_spectra(), the individual entries are

1D arrays with shape (lmax + 1,) for TT,
2D arrays with shape (2, lmax + 1) for TE and TB,
2D arrays with shape (3, lmax + 1) for EE, BB, EB of an auto-correlation (where EB = BE), and
2D arrays with shape (4, lmax + 1) for EE, BB, EB, BE of a cross-correlation (where EB ≠ BE).
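
As a rough illustration of the layout described above, the following sketch builds such a dictionary by hand with placeholder arrays and unpacks the components. It does not call Heracles. The reading of the last two key entries as bin indices, and the `("SHE", "SHE", 1, 2)` cross-correlation key, are assumptions made for the example.

```python
import numpy as np

lmax = 10

# Placeholder results, keyed like the dictionary above.
# Assumption: the trailing integers label the bins of the two fields.
results = {
    ("POS", "POS", 1, 1): np.zeros(lmax + 1),       # TT
    ("POS", "SHE", 1, 1): np.zeros((2, lmax + 1)),  # TE, TB
    ("SHE", "SHE", 1, 1): np.zeros((3, lmax + 1)),  # EE, BB, EB (auto)
    ("SHE", "SHE", 1, 2): np.zeros((4, lmax + 1)),  # EE, BB, EB, BE (cross)
}

# Components of one field combination unpack along the leading axis.
te, tb = results["POS", "SHE", 1, 1]
ee, bb, eb = results["SHE", "SHE", 1, 1]
ee12, bb12, eb12, be12 = results["SHE", "SHE", 1, 2]

# Every entry shares the same trailing ell axis.
shapes = {key: value.shape for key, value in results.items()}
```

Because the ell axis is always the last one, code that loops over the dictionary can treat all entries uniformly, whatever the number of components.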

For the output from mixing_matrices(), the individual entries are (in the notation of Brown, Castro & Taylor 2005):

the $M^{TT,TT}$ mixing matrix for scalar x scalar,
the $M^{TE,TE} = M^{TB,TB}$ mixing matrix for scalar x spin,
the stack of mixing matrices
- $M^{EE,EE} = M^{BB,BB}$,
- $M^{EE,BB} = M^{BB,EE}$,
- $M^{EB,EB}$
for spin x spin.

In practice, this more compact storage of results, with everything for one combination of fields in one array, is generally more convenient (see example notebook).

Internally, individual results are stored as the new heracles.Result class, which is a subclass of numpy's ndarray and decays into a plain numpy array under almost all operations. Arrays of Result type have optional properties:

result.ell -- None or the array of angular mode numbers for the result.
result.axis -- None or the angular axis corresponding to ell.
result.lower -- None or the lower bound of ell for binned results.
result.upper -- None or the upper bound of ell for binned results.
result.weight -- None or the weight in each bin for binned results.
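
For readers unfamiliar with ndarray subclasses, here is a minimal sketch of the general pattern, assuming only the property names listed above. `MiniResult` is a hypothetical stand-in and not the actual heracles.Result implementation. In particular, it does not reproduce the decay into plain arrays: derived arrays keep the subclass type and simply lose their metadata.

```python
import numpy as np

_PROPS = ("ell", "axis", "lower", "upper", "weight")


class MiniResult(np.ndarray):
    """Hypothetical ndarray subclass carrying optional ell metadata."""

    def __new__(cls, arr, ell=None, axis=None, lower=None, upper=None, weight=None):
        # View the input data as this subclass, then attach the metadata.
        obj = np.asarray(arr).view(cls)
        obj.ell = ell
        obj.axis = axis
        obj.lower = lower
        obj.upper = upper
        obj.weight = weight
        return obj

    def __array_finalize__(self, obj):
        # Called for every new instance, including views and ufunc outputs.
        # In this sketch, metadata is never inherited and defaults to None.
        for name in _PROPS:
            setattr(self, name, None)


r = MiniResult(np.zeros((2, 5)), ell=np.arange(5), axis=-1)
doubled = r * 2  # still a MiniResult here, but without metadata
```

Since the instance is still an ndarray, it can be passed to any numpy function without conversion.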

This means that angular power spectra, mixing matrices, and their binned versions all use the exact same data type -- no more case distinction for structured arrays.

Consequently, it is no longer necessary to have separate functions for binning. Both spectra and mixing matrices can use heracles.binned().
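
To make the idea concrete, here is a small sketch of binning along a trailing ell axis with plain numpy. It is not the code behind heracles.binned(), whose signature is not shown here. The helper name `bin_last_axis`, the weighted-average definition, and the handling of empty bins are assumptions for illustration.

```python
import numpy as np


def bin_last_axis(values, ell, bins, weight=None):
    """Weighted average of values over half-open ell bins (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    ell = np.asarray(ell)
    bins = np.asarray(bins)
    w = np.ones(ell.shape) if weight is None else np.asarray(weight, dtype=float)

    # Index of the bin [bins[i], bins[i+1]) that each ell falls into.
    idx = np.digitize(ell, bins) - 1
    nbin = len(bins) - 1

    out = np.zeros(values.shape[:-1] + (nbin,))
    wb = np.zeros(nbin)
    for i in range(nbin):
        sel = idx == i
        wb[i] = w[sel].sum()
        if wb[i] > 0:  # empty bins are left at zero in this sketch
            out[..., i] = (values[..., sel] * w[sel]).sum(axis=-1) / wb[i]

    # Return binned values, lower bounds, upper bounds, and total bin weights.
    return out, bins[:-1], bins[1:], wb


ell = np.arange(6)
cls = np.stack([ell * 1.0, ell * 2.0])  # shape (2, 6), like a TE/TB pair
binned, lower, upper, wb = bin_last_axis(cls, ell, bins=[0, 2, 4, 6])
```

The same function works unchanged for 1D spectra and for stacks with any number of leading components, which is the point of keeping ell on the last axis.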

The same is true for reading and writing results to file: both spectra and mixing matrices can be read using heracles.read() and written using heracles.write().

Finally, the new data type is flexible enough to support multiple ell axes for covariance matrices, as well as future use cases for bispectra and trispectra.

Closes: #207

JaimeRZP

LGTM! I am going to try to plug this into the covariance code to see how this works in practice.

JaimeRZP · 2024-12-16T11:31:58Z

heracles/__init__.py

    # twopoint
    "angular_power_spectra",
    "debias_cls",
    "mixing_matrices",
-    "bin2pt",


I like merging all of these functions into a single one: binned, write, read

JaimeRZP · 2024-12-16T11:34:53Z

heracles/io.py

+    # get lower bounds or create default
+    lower = getattr(result, "lower", None)
+    if lower is None:
+        lower = ell


Why is this not similar to how upper is defined below?
Does it make sense for this to be just ell?

My idea was that in the absence of information, the ell bins are just the half-open intervals between the given ells: $[\ell_i, \ell_{i+1})$. The upper bound then needs an invented maximum upper bound.
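
A minimal sketch of that default, for illustration only: the lower bounds are the given ells, and each upper bound is the next ell. The helper name `default_bounds` and the choice of `ell[-1] + 1` as the invented last upper bound are assumptions, not necessarily what io.py does.

```python
import numpy as np


def default_bounds(ell):
    """Half-open bins [ell_i, ell_{i+1}) from the given ells (hypothetical helper)."""
    ell = np.asarray(ell)
    lower = ell
    # Assumption: close the last bin one mode above the last ell.
    upper = np.append(ell[1:], ell[-1] + 1)
    return lower, upper


lower, upper = default_bounds([2, 10, 50])
# lower is [2, 10, 50] and upper is [10, 50, 51]
```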

JaimeRZP · 2024-12-16T11:36:57Z

heracles/io.py

-    arr = np.asanyarray(arr)
+def _write_result(fits, ext, key, result):
+    """
+    Write a result array to FITS.


Please define fits, ext, key, and result. For example, it is not clear what ext does.

Agreed that we need to fix the documentation at some point

JaimeRZP · 2024-12-16T11:42:20Z

heracles/io.py

-    If the output file exists, the new estimates will be appended, unless the
-    ``clobber`` parameter is set to ``True``.
-
+def write(path, results, *, clobber=False):
    """


I would again define the inputs, especially for public functions like write or read.

Yes, needs to happen soon!

JaimeRZP · 2024-12-16T11:53:18Z

heracles/result.py

+    out.upper = bins[1:]
+
+    # set the binned weight
+    out.weight = wb


Is it OK for this class to overwrite the value of weight?
At the moment a user might provide a str and get an array back.
I understand that this is the array created from the string, but it seems dangerous.
I would honestly just remove the str option.

JaimeRZP · 2024-12-16T11:55:23Z

heracles/twopoint.py

+    Returns true if *alm* has a non-zero spin weight and a leading axis of size
+    2, false otherwise.
+    """
+    md = alm.dtype.metadata or {}


What does or {} do here?

If no metadata was set, dtype.metadata is None, not an empty dictionary. a or b returns a if a is truthy, or b if not. So if dtype.metadata is None, it's not truthy, in which case md is set to an empty dictionary. A more explicit way to make sure md is a dictionary would be:

md = alm.dtype.metadata if alm.dtype.metadata is not None else {}
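
The behaviour can be checked with a short self-contained example. The helper `spin_of` and the metadata key "spin" are made up for illustration and are not taken from twopoint.py.

```python
import numpy as np


def spin_of(arr):
    """Return the spin stored in the dtype metadata, or 0 (hypothetical helper)."""
    # dtype.metadata is None when no metadata was set, so fall back to {}.
    md = arr.dtype.metadata or {}
    return md.get("spin", 0)


# An ordinary array has no dtype metadata at all.
plain = np.zeros(4, dtype=complex)

# A dtype created with metadata exposes it as a read-only mapping.
tagged_dtype = np.dtype(complex, metadata={"spin": 2})
tagged = np.array([0, 0, 0, 0], dtype=tagged_dtype)

spins = (spin_of(plain), spin_of(tagged))
```

Without the fallback, `md.get(...)` would raise an AttributeError for the plain array, because None has no get method.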

JaimeRZP · 2024-12-16T12:19:09Z

tests/test_result.py

+
+    if weight is None:
+        w = np.ones_like(ell)
+    elif isinstance(weight, str):


Again I find this string for weights very elegant

Not very elegant, presumably?

ntessore mentioned this pull request Oct 24, 2024

Create functions for mixing matrix application #109

Open

simplified format for results

96436fe

ntessore force-pushed the nt/gh-207 branch from 0ffb8c1 to 96436fe Compare November 22, 2024 21:12

ntessore added 5 commits November 22, 2024 21:26

fix syntax error for Python 3.9 and 3.10

1f94b11

fix another syntax error for Python 3.9 and 3.10

8a2b477

restore include/exclude parameters for spectra

7769902

Merge branch 'main' into nt/gh-207

89a5b4f

some readiness for multi-ell results

31b0758

ntessore marked this pull request as ready for review November 25, 2024 13:37

ntessore requested review from JaimeRZP and ucapbba November 25, 2024 13:37

JaimeRZP reviewed Dec 16, 2024
