Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix size-zero mass matrix #3071

Merged
merged 2 commits into from
Jul 25, 2024
Merged

Conversation

nhuurre
Copy link
Contributor

@nhuurre nhuurre commented Oct 20, 2021

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

Fixes a couple of problems with size-zero dense or diagonal mass matrix in (nonadaptive) HMC samplers. (The unit_e metric was already fine.)
Does not fix step size adaptation so zero-parameter models still cannot use adaptive samplers. Adaptive samplers also work. Step size adaptation simply sets the step size to NaN; the step size has no effect anyway.

Intended Effect

Allow HMC to run even if a model has no parameters.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Niko Huurre

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.58 3.54 1.01 1.09% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -2.71% slower
eight_schools/eight_schools.stan 0.09 0.09 0.99 -1.29% slower
gp_regr/gp_regr.stan 0.15 0.14 1.04 3.53% faster
irt_2pl/irt_2pl.stan 5.09 5.15 0.99 -1.24% slower
performance.compilation 91.7 90.53 1.01 1.27% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.7 8.05 1.08 7.45% faster
pkpd/one_comp_mm_elim_abs.stan 30.59 29.83 1.03 2.5% faster
sir/sir.stan 119.65 118.55 1.01 0.92% faster
gp_regr/gen_gp_data.stan 0.03 0.03 1.04 4.13% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 2.97 1.0 0.19% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.37 1.02 1.73% faster
arK/arK.stan 2.8 2.09 1.34 25.33% faster
arma/arma.stan 0.27 0.23 1.21 17.05% faster
garch/garch.stan 0.65 0.57 1.13 11.33% faster
Mean result: 1.05732252726

Jenkins Console Log
Blue Ocean
Commit hash: e138b71


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@betanalpha
Copy link
Contributor

As discussed in stan-dev/cmdstan#1054 the zero-dimensional boundary is not well-defined. The changes here are fine if assuming an empty-set boundary behavior but not a single element or multiple finite element set boundary behavior (in which case there would be one element to the metric in each case with a somewhat arbitrary value). My strong preference is to avoid this ambiguity by defining the HMC implementations only for N > 0.

@nhuurre
Copy link
Contributor Author

nhuurre commented Nov 1, 2021

I think you mean "as discussed in stan-dev/cmdstan#1030 (comment)"

Anyway, the only assumption you need to make is what is manifest in the code: the point in the unconstrained parameter space is represented as Eigen::VectorXd. That means the only possible zero-dimensional space is the one with a single point. And the current sampler code handles that correctly (i.e. an expensive no-op).

@rok-cesnovar
Copy link
Member

How do we resolve this one? To me, the changes look good, but I am not confident in my knowledge of the MCMC code.

Reversing the no-parameters hack in CmdStan and without this PR, models like this one https://github.com/stan-dev/stat_comp_benchmarks/blob/master/benchmarks/gp_regr/gen_gp_data.stan error in an ugly way if run with the default CmdStan arguments (which is a model we also run in out test suite). The error is:

https://github.com/stan-dev/stat_comp_benchmarks/blob/master/benchmarks/gp_regr/gen_gp_data.stan
Iteration:  300 / 2000 [ 15%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  500 / 2000 [ 25%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  700 / 2000 [ 35%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration:  900 / 2000 [ 45%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)

Assertion failed: (index >= 0 && index < size()), function operator(), file stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h, line 425.

We all agree the current Cmdstan auto switch to fixed param is not great, this error is also bad.

I guess the two options are:
a) this PR
b) stop CmdStan in case of no parameters and HMC samplers?

@nhuurre
Copy link
Contributor Author

nhuurre commented Nov 9, 2021

How do we resolve this one?

Ideally, we'll wait for @betanalpha to figure it out. There's still plenty of time before the next release, I'm sure he can do it.

b) stop CmdStan in case of no parameters and HMC samplers?

That's what the behaviour used to be. It was annoying stan-dev/cmdstan#953 for IMHO no good reason.

We also have the option
c) Change the services to invoke the fixed_param sampler through services::util::run_sampler(), like it does all the other samplers. Then the output should be uniform enough for the interfaces to parse even if the sampler info is technically wrong.

@betanalpha
Copy link
Contributor

betanalpha commented Nov 10, 2021 via email

@betanalpha
Copy link
Contributor

betanalpha commented Nov 10, 2021 via email

@nhuurre
Copy link
Contributor Author

nhuurre commented Nov 10, 2021

As mentioned in the other thread there are multiple ways to try to define R^{0}, for example as any countable subset of R^{1}

As mentioned in the other thread R^N is defined (up to isomorphism) as the N-dimensional real vector space.
General N-dimensional topological manifolds are usually defined to be topological sets that are locally homeomorphic to R^N so you need to know what R^N is (up to homeomorphism) before you can even talk about arbitrary N-manifolds.

Finite subsets of R^1 (or R^2, or R^10) certainly are 0-dimensional spaces but arbitrary countable subsets are not. For instance, Q is a countable subset of R that, while totally disconnected, is not discrete.
Or if you're going to ignore the induced subspace topology and just impose the discrete topology then why limit it to countable subsets?
Or maybe your idea of "zero dimensional" means Hausdorff dimension? Not a topological invariant but ok...

Regardless all of those definitions would be represented by an Eigen::VectorXd with one element not zero elements.

It feels like you're arguing against not the actual code changes here but some convoluted misfeature you've invented in your head. There is no attempt to add one-element representation for empty models.

The unconstrained parameters are realized as an Eigen::VectorXd with model.num_params_r() elements.

base_hmc(const Model& model, BaseRNG& rng)
: base_mcmc(),
z_(model.num_params_r()),

class ps_point {
public:
explicit ps_point(int n) : q(n), p(n), g(n) {}
Eigen::VectorXd q;

The sampler has always worked correctly when model.num_params_r() >= 1 and no changes are proposed for that case.
But the Stan language allows you to define a model with model.num_params_r() == 0.
Examples:

parameters {
}
parameters {
  vector[0] x;
}
parameters {
  simplex[1] x;
}

These models require fixed_param sampler because HMC code assumes that the unconstrained parameter vector has at least one element.
This PR only removes that assumption. The change affects only models with model.num_params_r() == 0. These models still have size zero unconstrained parameters. The first two models also produce size-zero output, i.e. only internal sampler parameters are printed. The third model produces a single output parameter x[1] whose value is defined by the simplex_constrain transform.


Another "zero-dimensional" model you may want to consider is

parameters {
  unit_vector[1] x;
}

The constrained parameter space is zero-dimensional but, because the unit_vector transform is many-to-one, this model actually has num_params_r() == 1 and consequently the PR makes no difference for this model. Nevertheless, it may be reassuring to know that sampling fails:

Exception: Found dimension size one in unit vector declaration. One-dimensional unit vector is discrete
but the target distribution must be continuous. variable=x; dimension size expression=1
(in 'unit_vec.stan', line 2, column 14 to column 15)

@nhuurre
Copy link
Contributor Author

nhuurre commented Dec 16, 2021

@betanalpha
I'd like to get this resolved before the next release, January 17th. If you don't have time to review this can you recommend someone else?

Also, if you insist that stan-dev/cmdstan#992 should be reverted instead then I have another question for you.
That PR removed an error message that said

Must use algorithm=fixed_param for model that has no parameters.

Do you think this message is adequate? As far as I can tell, your argument about alternative topologies for R^0 is a problem not only for HMC but also for sampling with fixed_param. It then seems to me that unconditional advice to use fixed_param is not appropriate. How should the user verify that fixed_param sampler works as desired for their model?

Copy link
Member

@mitzimorris mitzimorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@betanalpha
Copy link
Contributor

betanalpha commented Dec 21, 2021 via email

@nhuurre
Copy link
Contributor Author

nhuurre commented Dec 22, 2021

“Hamiltonian Monte Carlo has not been implemented for programs with no parameters.
The ‘fixed_param’ sampler will accept this program, evaluating all other blocks.”

There's a potentially confusing difference between "no constrained parameters" and "no unconstrained parameters". A model with simplex[1] in the parameters block arguably has a parameter (and fixed_param sampler does evaluate the block) but has no unconstrained parameters and therefore would not implement HMC.

For example what is the inverse metric for R^{0}?

It's a 0-by-0 matrix, ie. the matrix with no elements. Indeed, this PR is titled "Fix size-zero mass matrix" because originally the only change was to fix reading and printing a metric with no elements; and currently the only other change is that (in zero dimensions) stepsize initialization blows up the stepsize immediately without erroring. (The sampler still tries to adapt the stepsize as usual but that of course has no effect anymore. In positive dimensions, undefined stepsize would make every transition an instant divergence but in zero dimensions the integrator doesn't even notice.)
No changes to the inference algorithm are necessary; the expected behaviour follows naturally from Eigen's definition of zero-dimensional vector arithmetic and I take that to mean this extension to zero-dimensional HMC is reasonably "canonical."

since there is still a state (an arbitrary point in R) it should be output after every iteration; in order to implement this a default definition/parameterization/name of R^{0} would be required.

I think this is the crux of your confusion. You're thinking of R^0 as a subspace of R^1 and that does lead to ambiguity. But a similar ambiguity afflicts R^1 if you think of it only as an arbitrary subspace of R^2, and yet obviously that ambiguity is irrelevant. I'm saying that R^0, like R^1, can just be its own thing without the need to be embedded in some larger space.
The state/output after every iteration is empty but ... see the next point:

In my opinion having no metric or state is compatible not with the modeling configuration space being R^{0} but rather the empty set. Indeed I think that most people interpret a program with an empty parameter block as defining an empty model configuration space.

There's an important difference between

  1. the configuration space is the empty set (ie contains nothing)
  2. the configuration space contains the empty set (as a member, not a subset)

The configuration space is the set of all possible draws; a single draw from a model is a finite "set" of numbers. (a singleton for R^1, a pair for R^2, a triple for R^3, and, yes, the empty set for R^0)

In case (1) the configuration space has no members and so no draw is allowed. If we try to write a prior sampling model it must be

generated quantities {
    reject("No valid sample can exist.");
}

In case (2) the empty set is a valid draw and sampling from the prior is possible:

generated quantities {
    row_vector[0] x = []; // a sample exists, but is empty
}

The situation with an empty block is identical to (2), not (1). (If it was (1) then any model without a generated quantities block would be ill-formed.)

Alternatively, let's note that concatenating two models creates a configuration space that is the Cartesian product the of configuration spaces of the (sub)models.
The product of empty set with any set is always empty; and every set is isomorphic to its product with a singleton set. As theorems about configuration spaces these translate to observations about models: a single reject() anywhere in the model rejects the whole model; but adding an empty list of statements does not change the model.

@betanalpha
Copy link
Contributor

betanalpha commented Dec 24, 2021 via email

@nhuurre
Copy link
Contributor Author

nhuurre commented Dec 24, 2021

Yeah, although the simplex and correlation matrix types are the only ones with this property, right?

Yes. The other weird one is unit_vector[1] but that fails in a different way.

I don’t think that ambiguity is irrelevant for N >= 1 at all; it’s the main source of confusion people have with the real numbers.

Irrelevant in the sense that we don't argue about whether HMC can be extended to R^1 in a canonical way.

My issues is that multiple interpretations for R^{0} have been thrown around

Thrown around only by you. As I think I've already said this PR (and stan-dev/cmdstan#1054) only touches the case where the unconstrained parameter vector has no elements i.e. the model configuration space contains the empty vector (or the "empty set") and nothing else.
Also, there's no change to how Stan models are transpiled; a model without a parameter block has always resulted in the same configuration space.

@betanalpha
Copy link
Contributor

betanalpha commented Jan 12, 2022 via email

@nhuurre
Copy link
Contributor Author

nhuurre commented Jan 15, 2022

Interpretation: R^{0} is a singleton

Interpretation: R^{0} is the empty set
Complication: The empty set is a singleton,

If the second interpretation is that R^{0} is a singleton then maybe the first interpretation should be described as "R^{0} contains a single real number." I would also rather say that "R^{0} is a singleton set that contains the empty set" instead of "R^{0} is the empty set". Programming language terminology and set theory disagree when is comes to the exact meaning of "singleton": set theorist say that a set is singleton if it contains a single element, programmers say that an object is a singleton if it is the unique object of its type; in other words, to a mathematician "singleton" is the type but to a programmer "singleton" is the instance of that type. I've already discussed the importance of "configuration space is the empty set" vs "configuration space contains the empty set".


I’m being very careful about the difference between a real space and its associated Euclidean vector space because that difference is fundamental to differential geometry on which the most general Hamiltonian Monte Carlo algorithm is built.

I'm not entirely sure what you mean by a "real space" here.
The most basic notion of an N-dimensional "real space" I can think of is the Cartesian product of N copies of the real numbers. Each member of this set is an N-tuple, an ordered list of N real numbers. (A C++ programmer might call these "length-N vectors" but they're not vectors in the mathematical sense.)
Let's call this basic notion the "N-tuple space." The other spaces are constructed from this N-tuple space by associating it with a group of structure-preserving automorphisms. The most common examples:

  • the real vector space, i.e. the tuples augmented with vector addition and scalar multiplication. The automorphism group is the general linear group GL(N)
  • the real affine space, basically a vector space without origin. The automorphisms form the affine group Aff(N)
  • the Euclidean metric space. The automorphism group is the group of Euclidean isometries E(N)
  • R^N as a smooth manifold. The automorphism group is the diffeomorphism group Diff(R^N)

For N=0 the tuple space is a singleton set that contains only the empty tuple, and all these automorphism groups are trivial.


They represent both components of a vector (the momenta being interpreted as elements of the local cotangent space) and the coordinates of a space (the momenta being interpreted as points on a cotangent fiber).

I don't see the distinction you're making. A cotangent fiber is the inverse image of the projection map, and thus a subspace of the cotangent bundle, which in turn is the (disjoint) union of all local cotangent spaces (with an appropriate topology). The fiber at a point is the local vector space. You could take an isomorphic vector space that's not part of the bundle but then what makes it the cotangent space?
"Components of a vector" is a coordinate system on the vector space, no?

In the zero-dimensional limit the cotangent spaces reduce to zero vectors, and empty vectors are appropriate, but the cotangent fibers would reduce to single points with an infinite number of possible labels.

So, your objection to this pull req is that by allowing an empty parameter vector it promotes the "cotangent vector" interpretation and rejects the "fiber coordinate" interpretation? And the only practical difference between these interpretations is that the latter requires a redundant coordinate in zero dimensional space?


For N>=1, a coordinate chart on N-dimensional manifold is an injective function from the N-tuple space to the manifold.
I take it you're saying that a coordinate chart on a zero-dimensional manifold is an injective partial function from the real numbers (or the 1-tuple space) to the manifold, with all the ambiguity that entails.
I'd say the natural notion of a coordinate chart in zero dimensions is a function from the 0-tuple space to the manifold. When the manifold is connected there's only one such function--the function that maps the empty tuple to the single point on the manifold.


Given the model

parameters {
  vector[2] x;
}

the diagnostic output CSV contains, in addition to the internal sampler parameters, the following columns

x.1, x.2, p_x.1, p_x.2, g_x.1, g_x.2

If I understand

the cotangent fibers would reduce to single points with an infinite number of possible labels.

In the zero-dimensional limit we would still need a non-empty Eigen::VectorXd to hold the lone coordinate label

For example in the diagnostic output empty vectors will read out nothing where as constant, single-element vectors would read out that arbitrary limiting coordinate value.

then you think the model

parameters {
  vector[0] x;
}

should generate diagnostic output with the column

p_x.1

and this column records the coordinate label on the cotangent fiber. (it's not clear to me if you expect the columns x.1 and g_x.1 to also be present.)

This does not make sense to me; compare

parameters {
  vector[0] x;
  vector[1] y;
}

which does not need p_x.1. But why does the presence of y matter here? You said

This is the same ambiguity that arises when selecting a R^{N - 1} from a R^{N}; resolve the ambiguity requires specifying a parameterization for both spaces.

and if you interpret this model as selecting R^1 subspace from R^2 then you must specify the coordinate p_x.1, no?

@betanalpha
Copy link
Contributor

betanalpha commented Mar 1, 2022 via email

@mitzimorris mitzimorris merged commit cd8b2e0 into stan-dev:develop Jul 25, 2024
@mitzimorris
Copy link
Member

mitzimorris commented Jul 25, 2024

merge failed because unit test is using deprecated io.dump
investigating merge failure.

this be the error message:

src/test/unit/services/util/inv_metric_test.cpp:125:18: error: no viable conversion from 'stan::io::array_var_context' to 'stan::io::dump'
  stan::io::dump dmp = stan::services::util::create_unit_e_dense_inv_metric(0);
                 ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/stan/io/dump.hpp:596:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'stan::io::array_var_context' to 'const stan::io::dump &' for 1st argument
class dump : public stan::io::var_context {
      ^
src/stan/io/dump.hpp:596:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'stan::io::array_var_context' to 'stan::io::dump &&' for 1st argument
1 error generated.

@mitzimorris
Copy link
Member

mitzimorris commented Jul 25, 2024

fixed unit test on new branch: https://github.com/stan-dev/stan/tree/bugfix/3071-fix-unit-test

but the unit test doesn't really test much - it shows that yes, it's possible to create a 0-size mass matrix, but that's always been possible - this PR doesn't touch stan/services/util/read_dense_inv_metric.cpp

the necessary unit test will run the adaptive sampler on a model with 0 params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants