add reader for dnest4 #391
base: master
@williamjameshandley Hey Will, I've added the complete output from DNest4 for a 2D Gaussian with zero mean and unit variance. The prior is uniform on a box from -10 to 10. For the visualization, I think the raw DNest4 files (levels.txt, sample.txt, sample_info.txt) plus posterior_sample.txt should be enough, because then it would be possible to show "live" points or the posterior.
As a first pass, here is one (non-dynamic) way to visualise a dnest run:

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from anesthetic.plot import basic_cmap

# Raw DNest4 output files
levels_file = 'levels.txt'
sample_file = 'sample.txt'
sample_info_file = 'sample_info.txt'
weights_file = 'weights.txt'
root = 'tests/example_data/dnest4/'

levels = np.loadtxt(os.path.join(root, levels_file), dtype=float, delimiter=' ', comments='#')
samples = np.genfromtxt(os.path.join(root, sample_file), dtype=float, delimiter=' ', comments='#', skip_header=1)
sample_info = np.loadtxt(os.path.join(root, sample_info_file), dtype=float, delimiter=' ', comments='#')
weights = np.loadtxt(os.path.join(root, weights_file), dtype=float, delimiter=' ', comments='#')
n_params = samples.shape[1]

# One row per sample: parameters followed by the per-sample bookkeeping columns
df = pd.DataFrame(np.concatenate([samples, sample_info], axis=1),
                  columns=['x0', 'x1', 'level', 'log likelihood', 'tiebreaker', 'ID'])
df.ID = df.ID.astype(int)
df.level = df.level.astype(int)
level_ids = np.sort(df.level.unique())  # level indices that actually hold samples

cmap = basic_cmap('C0')
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
for j, ax in enumerate(axes.ravel()):
    # Panel j shows the samples of the first j+1 levels, later levels in stronger colour
    ls = level_ids[:j+1]
    for i, l in enumerate(ls):
        color = cmap((i+1)/len(ls))
        ax.plot(*df[df.level == l][['x0', 'x1']].to_numpy().T, '.', color=color)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(-10, 10, f'{j+1}', va='top')
# Axes are shared, so setting the limits once applies to every panel
ax.set_xlim(-10, 10)
ax.set_ylim(-10, 10)
fig.tight_layout()
fig.set_size_inches(7, 7)
fig.savefig('dnest.png')
That example code is very useful. With it, replaying the samples with a colormap is almost done. For the LX over log(X) curve, the easiest way right now is to add a function that stores the posterior weights and log(X) from DNest4.
Hi, I've made progress and the PR is ready for review. Since NestedSamples and DiffusiveNestedSamples support different types of plots, I decided that these classes should each provide their own methods that return the supported plot types and the corresponding points. |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##            master      #391   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           36        37     +1
  Lines         3058      3142    +84
=========================================
+ Hits          3058      3142    +84
@williamjameshandley This PR is ready for review :) |
Hi @qacwnfq, thanks for contributing to anesthetic!
I'll leave a more detailed review of the diffusive nested sampling stuff to @williamjameshandley, but I've left some inline comments about integration into the anesthetic API.
def n_live(self, i):
    """
    Get live points at iteration i.

    Parameters
    ----------
    i: int
        nested sampling iteration

    Returns
    -------
    live points at iteration i

    """
    return self.nlive.iloc[i]
Why the need for this function? Why not directly use self.nlive.iloc[i]?
I don't like how similarly self.n_live is spelled compared to self.nlive without any indication why/how they behave differently.
If a function is indeed necessary, then we should think about something along the lines of a more general get_nlive() method with an optional kwarg iteration (not i) or item (might be good to take a brief look at the standard naming for similar things used in numpy and/or pandas).
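A minimal sketch of what such a getter could look like, purely for illustration. The class `SamplesSketch` is a hypothetical stand-in that only holds an `nlive` pandas Series; the method name `get_nlive` and the kwarg `iteration` follow the suggestion above and are not an existing anesthetic API.

```python
import pandas as pd


class SamplesSketch:
    """Hypothetical stand-in for a samples class that stores ``nlive``."""

    def __init__(self, nlive):
        # Number of live points per iteration, stored as a pandas Series
        self.nlive = pd.Series(nlive)

    def get_nlive(self, iteration=None):
        """Return all live-point counts, or the count at one iteration."""
        if iteration is None:
            return self.nlive
        # Positional lookup, identical to self.nlive.iloc[iteration]
        return self.nlive.iloc[iteration]


samples = SamplesSketch([5, 5, 4, 3, 2, 1])
print(samples.get_nlive(iteration=2))
```

With the default `iteration=None` the method simply hands back the full Series, so the getter and the attribute cannot silently disagree.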
def LX(self, beta, logX):
    """
    Get LX, e.g., for Higson plot.

    Parameters
    ----------
    beta: float
        temperature
    logX: np.ndarray
        prior volumes

    Returns
    -------
    LX: np.ndarray
    """
    LX = self.logL*beta + logX
    return LX
I don't like the naming of this function, it's misleading. Makes me think that it returns L * X, which is not the case.
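To make the naming point concrete, here is a small sketch of the same expression as a standalone function. The name `log_tempered_mass` is hypothetical and only chosen to signal that the result is in log space; the arrays are made-up illustration values, not DNest4 output.

```python
import numpy as np


def log_tempered_mass(logL, beta, logX):
    """Return log(L**beta * X) = beta*logL + logX (hypothetical name)."""
    return beta * np.asarray(logL) + np.asarray(logX)


# Made-up illustration values
logL = np.array([-3.0, -1.0, -0.5])
logX = np.array([0.0, -1.0, -2.0])
beta = 0.5
result = log_tempered_mass(logL, beta, logX)

# The result lives in log space: exponentiating recovers L**beta * X,
# which equals L * X only when beta == 1.
assert np.allclose(np.exp(result), np.exp(logL)**beta * np.exp(logX))
```

The quantity is therefore the logarithm of L^beta * X rather than the product L * X itself, which is the mismatch the comment above objects to.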
def plot_types(self):
    """
    Get types of plots supported by this class.

    Returns
    -------
    tuple[str]
    """
    return 'live', 'posterior'
In the bigger scheme of anesthetic, this is misleading. We have used the kwargs types and plot_type in the past to indicate KDE plots, histograms, scatter plots, etc. These kwargs have been renamed to kind, to unify with naming conventions in pandas and matplotlib.
I don't think this should be a method of NestedSamples. This seems to be a more GUI-specific thing, so a simple list there probably makes more sense.
def points_to_plot(self, plot_type, label, evolution, beta, base_color):
    """
    Get samples for plotting.

    Parameters
    ----------
    plot_type: str
        see plot_types() for supported types.
    label: str
        column to plot
    evolution: int
        iteration to plot
    beta: float
        temperature
    base_color:
        base_color used to create color palette

    Returns
    -------
    List[array-like]: list of points to plot
    List[tuple[float]]: colors to use
    """
    if plot_type == 'posterior':
        return [self.posterior_points(beta)[label]], [base_color]
    elif plot_type == 'live':
        logL = self.logL.iloc[evolution]
        return [self.live_points(logL)[label]], [base_color]
    else:
        raise ValueError("plot_type not supported")
This, too, does not feel like it should be a method of NestedSamples. Too GUI specific.
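One possible shape for a GUI-side alternative, sketched under assumptions: the constant `PLOT_KINDS` and the free function `points_to_plot` are hypothetical and would live in the GUI code rather than on the samples class. The function only relies on `logL`, `posterior_points(beta)` and `live_points(logL)`, the same members the reviewed method calls. `StubSamples` is a toy stand-in so the sketch runs on its own; it does not reproduce anesthetic's behaviour.

```python
import pandas as pd

# Assumed GUI-side constant replacing a plot_types() method
PLOT_KINDS = ('live', 'posterior')


def points_to_plot(samples, kind, label, evolution, beta):
    """Select column `label` of the points to draw (hypothetical helper)."""
    if kind == 'posterior':
        return samples.posterior_points(beta)[label]
    if kind == 'live':
        logL = samples.logL.iloc[evolution]
        return samples.live_points(logL)[label]
    raise ValueError(f"kind must be one of {PLOT_KINDS}, got {kind!r}")


class StubSamples:
    """Toy stand-in exposing only what the helper needs."""

    def __init__(self):
        self.data = pd.DataFrame({'x0': [0.1, 0.2, 0.3],
                                  'logL': [-3.0, -2.0, -1.0]})
        self.logL = self.data['logL']

    def posterior_points(self, beta):
        return self.data  # stub: ignores beta

    def live_points(self, logL):
        # stub: treat every point at or above the threshold as live
        return self.data[self.data['logL'] >= logL]


stub = StubSamples()
live_x0 = points_to_plot(stub, 'live', 'x0', evolution=1, beta=1)
```

Because the helper takes the samples object as an argument, a different samples class could be passed in without carrying any plotting vocabulary itself.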
I think it would be cleaner to have a single subfolder dnest4 in the example_data folder. Can we merge everything from dnest4_no_column_names into dnest4? dnest4 itself can have subfolders...
plotter.type.buttons.set_active(1)
assert plotter.type() == 'posterior'
plotter.type.buttons.set_active(0)
assert plotter.type() == 'live'
for i, plot_type in enumerate(samples.plot_types()):
    plotter.type.buttons.set_active(i)
    assert plotter.type() == plot_type
This change, although one line shorter, is not actually better. For unit tests it is better to repeat and make sure any issue can be pinned to a specific line. Loops leave it unclear at which iteration an issue occurred. So better to repeat in tests.
That said, excessive repeats are of course annoying to maintain. But the better way of handling that is with pytest's parametrize options.
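As a rough illustration of the parametrize approach, the sketch below checks a selector once per case, so a failure is reported against the specific parameter set rather than an anonymous loop iteration. `StubSelector` and `PLOT_KINDS` are hypothetical stand-ins for the GUI widget and its list of kinds; only `pytest.mark.parametrize` is real pytest API.

```python
import pytest

# Assumed list of kinds, in the order the buttons appear
PLOT_KINDS = ('live', 'posterior')


class StubSelector:
    """Hypothetical stand-in for the GUI radio-button widget."""

    def __init__(self, labels):
        self.labels = tuple(labels)
        self.active = 0

    def set_active(self, index):
        self.active = index

    def __call__(self):
        # Report the label of the currently active button
        return self.labels[self.active]


@pytest.mark.parametrize('index, expected', [(0, 'live'), (1, 'posterior')])
def test_selector_reports_active_kind(index, expected):
    selector = StubSelector(PLOT_KINDS)
    selector.set_active(index)
    assert selector() == expected
```

pytest collects each tuple as its own test case, so the report names the failing `index`/`expected` pair directly.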
ns.points_to_plot('visited points',
                  label='x1',
                  evolution=0,
                  beta=1,
                  base_color='C0')
test_reader.py should be about testing the reading of files, so this and the following plotting calls are a bit out of place.
ns.points_to_plot('visited points',
                  label='x1',
                  evolution=0,
                  beta=1,
                  base_color='C0')
Does not belong in test_reader.py.
Description
This is a draft and first attempt at adding a reader for diffusive nested samples.
I've opened the PR to facilitate easier discussion of the required changes.
While making this, I was contemplating what would be required to replay a diffusive nested sampling run.
I've added example output to tests/example_data/dnest4.
From the dnest4 output we can easily get the likelihood levels and replay which sample led to the construction of which level at which iteration.
Additionally, we also already have the prior compression X available and the likelihoods.
Because we don't really track dead and live particles, I think the best approach would be a specialized class DiffusiveNestedSamples.
Hopefully, we could still achieve a similar interface to NestedSamples, in order to reuse the gui.
The only issue I think we cannot solve within anesthetic is that diffusive nested sampling is allowed to correct the level spacing in the second phase.
Therefore, we would not be able to "exactly" replay what the algorithm did.
Here, the solution would be to store the phases of diffusive nested sampling separately. I'm not sure this is absolutely required.
Fixes # (issue)
Checklist:
flake8 anesthetic tests
pydocstyle --convention=numpy anesthetic
python -m pytest