Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_samples_weighted does not reset_index which breaks sample compression #189

Closed
Stefan-Heimersheim opened this issue May 31, 2022 · 5 comments · Fixed by #191
Closed
Labels
bug Something isn't working

Comments

@Stefan-Heimersheim
Copy link
Collaborator

Describe the bug
merge_samples_weighted 2 data sets with N samples each produces indices 0,1,2,...,N,0,1,2,...N rather than 0,1,2,...,2N.

To Reproduce

import anesthetic
import numpy as np

prior_samples = []
for i in range(51):
	d = {"x": np.random.uniform(size=1000),
	     "y": np.random.uniform(size=1000)}
	tmp = anesthetic.MCMCSamples(d)
	prior_samples.append(tmp)

merge_prior = anesthetic.samples.merge_samples_weighted(prior_samples, weights=np.ones(51))
merge_prior.plot_2d(["x", "y"])

gives IndexError: index 5330 is out of bounds for axis 0 with size 1000 in triangular_sample_compression_2d:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/core/data/phd-data/python_environments/hera-system/lib/python3.10/site-packages/anesthetic/samples.py", line 344, in plot_2d
    self.plot(ax_, x, y, plot_type=plot_type, *args, **lkwargs)
  File "/home/stefan/core/data/phd-data/python_environments/hera-system/lib/python3.10/site-packages/anesthetic/samples.py", line 213, in plot
    return kde_contour_plot_2d(ax, x, y, weights=self.weights,
  File "/home/stefan/core/data/phd-data/python_environments/hera-system/lib/python3.10/site-packages/anesthetic/plot.py", line 818, in kde_contour_plot_2d
    tri, w = triangular_sample_compression_2d(data_x, data_y, cov,
  File "/home/stefan/core/data/phd-data/python_environments/hera-system/lib/python3.10/site-packages/anesthetic/utils.py", line 359, in triangular_sample_compression_2d
    np.add.at(w_, k[:, i], w[j != -1]/3)
IndexError: index 23211 is out of bounds for axis 0 with size 1000

This can be solved by adding a reset_index() to the merge, re-indexing the samples correctly

merge_prior = anesthetic.samples.merge_samples_weighted(prior_samples, weights=np.ones(51)).reset_index()

As soon as I get time I will make a PR to fix this bug.

Note: We use df.append in merge_samples_weighted which is now deprecated and we should replace that function as well.

@Stefan-Heimersheim Stefan-Heimersheim added the bug Something isn't working label May 31, 2022
@Stefan-Heimersheim
Copy link
Collaborator Author

There's another problem: reset_index() resets merge_prior.weights to 1, while merge_prior["weights"] is still correct. As quick fix I am using merge_prior.weights = merge_prior["weights"] now.

@williamjameshandley
Copy link
Collaborator

williamjameshandley commented May 31, 2022

Hi @Stefan-Heimersheim. The weights problem looks like you are using quite an old version of anesthetic -- do you get this problem if you use the master branch? You can get this by installing:

pip install anesthetic==2.0.0b11
#or
pip install pip install git+https://github.com/williamjameshandley/anesthetic@master

(We plan to properly release anesthetic 2.0.0 over the summer once we've finalised all the interface-breaking changes).

I agree that we should replace the df.append functions

@Stefan-Heimersheim
Copy link
Collaborator Author

I appear to have the latest version:

pip freeze | grep anesthetic
anesthetic @ git+https://github.com/williamjameshandley/anesthetic@5430098069d2fd4d33e0a0689a55b42563ad8f71

I just did pip3 install anesthetic==2.0.0b11 on a different machine (calx) and ran my MWe and still got an IndexError

@williamjameshandley
Copy link
Collaborator

williamjameshandley commented Jun 7, 2022

I've had a chance to look into this properly -- you are right that it does occur on the latest version. The "weights" column from the reindexing threw me a little (since weights used to be stored as a column, rather than as an index).

If we want to go with reindexing it's a little bit more involved to ensure that the weights are passed on correctly. Here is the correct code:

merge_prior = anesthetic.samples.merge_samples_weighted(prior_samples, weights=np.ones(51))
merge_prior = merge_prior.reset_index().set_index("weights", append=True).drop(columns="#")
merge_prior.index.set_names(["#", "weights"], inplace=True)

However, I don't think this is the right thing to do. One might want to preserve the original index labelling, and the plotting commands shouldn't be failing here.

The correct thing to do is to put

x = np.array(x)
y = np.array(y)

at the start of anesthetic.utils.triangular_sample_compression_2d, analogous to the guard in anesthetic.utils.triangular_sample_compression_1d.

If you agree, please go ahead and submit a PR with this minor change. Here is an even more minimal test which reproduces this error, which passes after the fix:

samples = anesthetic.MCMCSamples(np.random.rand(1000,2),columns=["x","y"],index=np.random.randint(0,100,1000))
samples.plot_2d(["x", "y"])

Ideally we would run the entire test suite through with the assumption that the "#" label is not unique/ordered, but It's not obvious to me how we could do this easily.

@Stefan-Heimersheim
Copy link
Collaborator Author

If we want to go with reindexing it's a little bit more involved to ensure that the weights are passed on correctly.

Ah right, I had just added an merge_prior.weights = merge_prior["weights"].

If you agree, please go ahead and submit a PR with this minor change

Will do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants