
Features/exact #712

Merged: 10 commits merged into main, Jun 3, 2023

Conversation

matthewghgriffiths (Collaborator)
Updates the graphical API to allow better exact EP updates.

Adds graphical/regression/test_exact.py to test exact EP updates: an exact linear regression update and an exact probit update are combined to build a linear classification model.
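For context, a moment-matched ("exact") EP update against a probit likelihood has a closed form. The sketch below uses the standard probit-EP formulas in plain Python; it is illustrative only and is not autofit's actual implementation.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_exact_update(m, v, y):
    """Exact (moment-matched) update of a Gaussian cavity N(m, v)
    against a probit likelihood Phi(y * f), with label y in {-1, +1}."""
    z = y * m / math.sqrt(1.0 + v)
    ratio = norm_pdf(z) / norm_cdf(z)
    m_new = m + y * v * ratio / math.sqrt(1.0 + v)
    v_new = v - v * v * ratio * (z + ratio) / (1.0 + v)
    return m_new, v_new

m_new, v_new = probit_exact_update(0.0, 1.0, 1)
assert m_new > 0.0          # a positive label pulls the mean up
assert 0.0 < v_new < 1.0    # observing a label shrinks the variance
```

Because the tilted moments are available analytically, no sampling or quadrature is needed for this factor.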

@@ -215,7 +215,6 @@ def factor_approximation(self, factor: Factor) -> FactorApproximation:
for v in factor_dist.all_variables
}).prod(
*factor_mean_field.values(),
default=factor_dist,
Collaborator Author

This causes incorrect initialisation of the cavity dist if a variable is used by only one factor.

Owner

It's quite common in our usage for variables to be used by only a single factor. Many factors have 'dangling' variables which are not shared.

Owner

Do you mean you've fixed that issue or newly introduced it?

Collaborator Author

I think I fixed it.

The way that you solved the dangling variable problem was causing variance shrinkage in the case of dangling variables.

Collaborator Author

This way the dangling variable has no cavity distribution associated with it in the FactorApproximation which is fine for a lot of factor optimisations.

If you require access to the cavity distribution as a prior then you shouldn't have a dangling variable. (The prior should be present as a second factor connected to the variable).

Owner

Actually, yeah, I guess every variable has at least a PriorFactor attached and so would be shared.

Collaborator

I guess this may have also been partly responsible for the variance shrinkage in my tests, as each 1D Gaussian dataset had 2 parameters which were tied to only one factor?

Collaborator Author

A test for this would be to run a factor optimisation multiple times in a row for the same factor; if the variance shrinks at each step, then you're doing something like this.
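A toy version of that check, assuming Gaussian messages stored as precisions (natural parameters): a correct EP-style update that divides the site out of the posterior before re-multiplying is idempotent, while re-multiplying the full posterior by the factor shrinks the variance every step.

```python
prior_prec = 1.0   # precision of a prior factor's message
site_prec = 2.0    # precision of the likelihood "site" message

# Correct: form the cavity by removing the site, then re-multiply it in.
post_prec = prior_prec + site_prec
for _ in range(5):
    cavity_prec = post_prec - site_prec
    post_prec = cavity_prec + site_prec    # fixed point: unchanged

# Buggy: multiply the full posterior by the site again each step.
buggy_prec = prior_prec + site_prec
for _ in range(5):
    buggy_prec += site_prec                # precision grows each step...

assert post_prec == 3.0                    # variance 1/3, stable
assert 1.0 / buggy_prec < 1.0 / post_prec  # ...so the variance shrinks
```

Running the correct update repeatedly on the same factor leaves the approximation at its fixed point, which is exactly the property the suggested test probes.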

@codecov
codecov bot commented Jun 1, 2023

Codecov Report

Merging #712 (a53e3bb) into main (560ecca) will increase coverage by 0.22%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #712      +/-   ##
==========================================
+ Coverage   82.00%   82.23%   +0.22%     
==========================================
  Files         180      180              
  Lines       13264    13305      +41     
==========================================
+ Hits        10877    10941      +64     
+ Misses       2387     2364      -23     

see 9 files with indirect coverage changes


@rhayes777 rhayes777 (Owner) left a comment

Looks like it only failed because of a drop in code coverage. Is there a test or two you could write? I think the paths handling is probably not working properly, but it would be a bit trickier to write a decent test for that, as you would have to mock up the optimisers.

Also, some docs would be nice.

Otherwise looks good.

self.paths = paths
if self.paths:
for optimiser in self.factor_optimisers.values():
optimiser.paths = self.paths
Owner

We probably want each factor_optimiser to have a different paths to ensure that they do not clash.

If multiple DynestyStatic had the same paths, for example, we might see them loading each other's optimisation state.

You could use the create_child method and pass the name of each factor.
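A minimal sketch of that suggestion, with a stand-in Paths class (autofit's real Paths/create_child API may differ in detail):

```python
class Paths:
    """Stand-in for autofit's output-path handler."""
    def __init__(self, name):
        self.name = name

    def create_child(self, name):
        # Each child gets its own distinct output location.
        return Paths(f"{self.name}/{name}")

parent = Paths("output/ep_run")
factor_names = ["linear_factor", "probit_factor"]

# One distinct paths object per factor optimiser, so optimisers
# never load each other's saved state.
child_paths = {name: parent.create_child(name) for name in factor_names}
assert len({p.name for p in child_paths.values()}) == len(factor_names)
```

Keying the child on the factor name makes the output layout deterministic across resumed runs, which matters if optimisers load state by path.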

Collaborator Author

I have no idea what paths are supposed to do, but it was breaking some of my code, so I added code paths to make it optional. In a later commit I set the paths attribute to be optionally set if it doesn't exist.

Collaborator Author

I actually had to revert the change: without this line, test_autofit/graphical/info/test_output.py fails.

Owner

It's basically an object that handles saving and loading data for optimisations. We used to have path-manipulation code littered all over the optimiser classes, so we put it all in one place. It can also conveniently be swapped out for a database client so that data is persisted directly into a SQL database rather than being saved in a directory structure.
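A sketch of that design, using illustrative class names rather than autofit's actual API: one save/load interface, with the directory backend swappable for a database-backed one.

```python
class DirectoryPaths:
    """Illustrative stand-in: persists optimiser data to a directory."""
    def __init__(self):
        self._store = {}          # stands in for files on disk

    def save(self, key, value):
        self._store[key] = value

    def load(self, key):
        return self._store[key]

class DatabasePaths(DirectoryPaths):
    """Same interface; would persist into a SQL database instead."""

# Optimisers only see the shared interface, so the backend can be
# swapped without touching the optimiser classes.
for paths in (DirectoryPaths(), DatabasePaths()):
    paths.save("samples", [0.1, 0.2])
    assert paths.load("samples") == [0.1, 0.2]
```

Centralising persistence behind one object is what makes the directory-vs-database swap a one-line change for callers.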

@@ -89,6 +90,9 @@ def __init__(
self.log_norm = log_norm
self._plates = self.sorted_plates if plates is None else plates

def copy(self) -> "MeanField":
return type(self)(self)
Owner

A type of self of self!
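The idiom works because MeanField (assumed here to subclass dict) accepts another mapping in its constructor, so type(self)(self) yields a shallow copy that preserves the concrete subclass:

```python
class MeanField(dict):
    def copy(self):
        # "A type of self of self": rebuild via the runtime class,
        # so subclasses get copies of their own type.
        return type(self)(self)

class SpecialMeanField(MeanField):
    pass

mf = SpecialMeanField(x=1.0)
cp = mf.copy()
assert cp == mf and cp is not mf and type(cp) is SpecialMeanField
```

A plain dict.copy() would return a dict and silently drop the subclass, which is why the explicit override is useful.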

Comment on lines -218 to +244
-    return VariableData({v: dist.mean for v, dist in self.items()})
+    return self.attr("mean")

 @property
 def variance(self):
-    return VariableData({v: dist.variance for v, dist in self.items()})
+    return self.attr("variance")

 @property
 def std(self):
-    return VariableData({v: dist.std for v, dist in self.items()})
+    return self.attr("std")

 @property
 def scale(self):
-    return VariableData({v: dist.scale for v, dist in self.items()})
+    return self.attr("scale")
Owner

Nice
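The refactor presumably delegates to a small attr helper along these lines (a sketch; the real MeanField/VariableData types are more involved):

```python
class VariableData(dict):
    def attr(self, name):
        # Gather one attribute from every variable's distribution.
        return VariableData(
            {v: getattr(dist, name) for v, dist in self.items()}
        )

class Dist:
    def __init__(self, mean, variance):
        self.mean = mean
        self.variance = variance

data = VariableData(x=Dist(0.0, 1.0), y=Dist(2.0, 0.5))
assert data.attr("mean") == {"x": 0.0, "y": 2.0}
assert data.attr("variance") == {"x": 1.0, "y": 0.5}
```

One helper replaces four near-identical dict comprehensions, so adding a new per-distribution property becomes a one-liner.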

-    cavity_dist = factor_approx.cavity_dist
+    cavity_dist = factor_approx.cavity_dist.copy()
+    for v in factor_approx.mean_field.keys() - cavity_dist:
+        cavity_dist[v] = factor_approx.mean_field[v].zeros_like()
Collaborator Author

@rhayes777

For exact_fits of FactorApproximations we need the full cavity_dist, so in this case we create zeros_like versions of the missing cavity_dists. For the Normal distribution, for example, the zeros_like dist has infinite variance; note that its logpdf will be -inf, so we don't want to evaluate it in general.
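In natural parameters the point is easy to see: the zeros_like Normal has zero precision (infinite variance), so multiplying by it is a no-op, but it is not a normalisable density on its own. A sketch, assuming Gaussians stored as (precision, precision * mean):

```python
import math

def multiply(g1, g2):
    # Product of Gaussians: natural parameters simply add.
    return (g1[0] + g2[0], g1[1] + g2[1])

message = (2.0, 1.0)      # N(mean=0.5, var=0.5)
zeros_like = (0.0, 0.0)   # zero precision => infinite variance

# Acts as the multiplicative identity in the cavity product...
assert multiply(message, zeros_like) == message

# ...but on its own it has infinite variance, so its logpdf
# would be -inf and should never be evaluated directly.
precision = zeros_like[0]
variance = math.inf if precision == 0.0 else 1.0 / precision
assert math.isinf(variance)
```

This is why the placeholder is safe for exact fits that only ever multiply it into a product, yet dangerous as a stand-alone prior.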

Owner

Interesting. Could you write something along those lines in the code docs?

@matthewghgriffiths (Collaborator Author)

I don't know why it failed. It works on my machine...

@matthewghgriffiths matthewghgriffiths merged commit cd81ab3 into main Jun 3, 2023
@Jammy2211 Jammy2211 deleted the features/exact branch August 6, 2023 16:15