
[BUG] Running example code for Ensemble SINDy gives different median or mean from the model.coefficients() or optimizer.coef_? #295

Closed
linguo4 opened this issue Mar 27, 2023 · 2 comments

Comments


linguo4 commented Mar 27, 2023

When using the example code in the documentation, the mean or median calculated from optimizer.coef_list differs from the value returned by optimizer.coef_ or model.coefficients(). Is this a bug?

Reproducing code example:


import numpy as np
from scipy.integrate import solve_ivp

from pysindy.utils import lorenz
import pysindy as ps

# bad code but allows us to ignore warnings
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Seed the random number generators for reproducibility
np.random.seed(100)

# Initialize integrator keywords for solve_ivp to replicate the odeint defaults
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

# Generate measurement data
dt = .002

t_train = np.arange(0, 10, dt)
x0_train = [-8, 8, 27]
t_train_span = (t_train[0], t_train[-1])
x_train = solve_ivp(lorenz, t_train_span, x0_train,
                    t_eval=t_train, **integrator_keywords).y.T

# Default is to sample the entire time series with replacement, generating 10 models,
# each fit on roughly 60% of the distinct data points (plus duplicates); see the
# quick check after this snippet.

# Custom feature names
np.random.seed(100)
feature_names = ['x', 'y', 'z']

ensemble_optimizer = ps.EnsembleOptimizer(
    ps.STLSQ(threshold=1e-3, normalize_columns=False),
    bagging=True,
    n_subset=int(1.0 * x_train.shape[0]),
    replace=True,
)

model = ps.SINDy(optimizer=ensemble_optimizer, feature_names=feature_names)
model.fit(x_train, t=dt)
ensemble_coefs = ensemble_optimizer.coef_list
mean_ensemble = np.mean(ensemble_coefs, axis=0)
std_ensemble = np.std(ensemble_coefs, axis=0)

# None of these comparisons hold element-wise: the aggregated ensemble
# coefficients do not match what the fitted model reports.
print(ensemble_optimizer.coef_ == model.coefficients())

print(np.abs(np.median(ensemble_coefs, axis=0) - model.coefficients()) < 1e-10)

print(np.abs(np.mean(ensemble_coefs, axis=0) - model.coefficients()) < 1e-10)
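As an aside on the "roughly 60%" remark in the comment above: sampling N points with replacement leaves about 1 - (1 - 1/N)^N ≈ 63% of them unique. A quick numpy check (an illustrative addition, not part of the original report) confirms this:

# Illustrative check (not part of the original report): bootstrapping
# N points with replacement keeps about 1 - (1 - 1/N)**N ~ 63% unique.
N = x_train.shape[0]
unique_frac = np.unique(np.random.choice(N, size=N, replace=True)).size / N
print(unique_frac)  # approaches 1 - 1/e ~ 0.632 for large N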


@Jacob-Stevens-Haas (Member) commented Apr 8, 2023

Very interesting - thank you for finding a difficult numerical issue, and thank you for submitting a complete working example! I can confirm this in master.

tl;dr: yes, this is by design. The final coefficients are "unbiased" in a refit step, regardless of the choice of optimizer.

My first thought was that it's in LinearRegression._set_intercept().

Nope: self.fit_intercept is False in this example.

Next thought: it's something to do with wrapping every optimizer in a SINDyOptimizer and thereby running self._unbias().

That was correct, and SINDyOptimizer's unbiasing is desired: it fits the coefficients one more time, omitting all terms the (regularized) optimizer chose to omit, but using un-regularized linear regression. It defaults to True in SINDy.__init__(). A minimal sketch of the idea follows.
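For readers unfamiliar with the step, here is a minimal sketch of what unbiasing does, assuming a library matrix theta of shape (n_samples, n_features), derivative targets x_dot of shape (n_samples, n_targets), and a coefficient matrix coef from a regularized fit. This illustrates the idea only; it is not pysindy's actual implementation.

import numpy as np

def unbias_sketch(theta, x_dot, coef):
    # Keep the support (nonzero pattern) found by the regularized
    # optimizer, but re-estimate those coefficients with plain,
    # un-regularized least squares.
    new_coef = np.zeros_like(coef)
    for i in range(coef.shape[0]):
        support = np.flatnonzero(coef[i])
        if support.size > 0:
            new_coef[i, support] = np.linalg.lstsq(
                theta[:, support], x_dot[:, i], rcond=None
            )[0]
    return new_coef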

There are some things to think about design-wise here, however. @akaptano, @znicolaou, thoughts?

  • Currently, BaseOptimizer has a stale docstring mentioning an unbias parameter it doesn't actually use; that should be removed if nothing else is done.
  • Currently, SINDy.fit wraps all optimizers in a SINDyOptimizer, to which it passes its own unbias initialization parameter.
  • SINDyOptimizer, as a class, seems to exist solely to run _unbias() (plus maybe some simple reshaping of 1-d coefficient matrices).
  • It appears that SINDyOptimizer is never used in the examples, except in a markdown comment suggesting unbiasing.
  • Is there a reason to keep SINDyOptimizer? We could move _unbias() back into the BaseOptimizer class and combine their fit() code...
  • ...and deprecate/remove the unbias parameter from SINDy objects? Rationale: (a) it avoids SINDy becoming a God Object antipattern, and (b) it encapsulates optimization parameters within optimization objects. On the other hand, if we treat unbiasing as a step after optimization, the alternative is to remove it from SINDyOptimizer and make it a function in pysindy.py (a hypothetical sketch of that option follows this list).
  • Regardless, I'm motivated by the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."
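To make the free-function option concrete, here is a hypothetical flow in which the regularized optimizer only selects the support and an explicit function refits it afterwards. The flow is invented for illustration and is not the library's current API; unbias_sketch is the sketch above, and x_train and dt are reused from the reproduction code.

# Hypothetical post-optimization flow (invented for illustration):
# the regularized optimizer selects the support, then a standalone
# function refits it, instead of SINDyOptimizer doing so implicitly.
theta = ps.PolynomialLibrary(degree=2).fit_transform(x_train)
x_dot = np.gradient(x_train, dt, axis=0)  # crude derivative estimate
opt = ps.STLSQ(threshold=1e-3)
opt.fit(theta, x_dot)                                # regularized fit
coef = unbias_sketch(theta, x_dot, opt.coef_)        # explicit unbias step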

@Jacob-Stevens-Haas (Member)

Closing & posting a new issue for a 🧵 about the unbias param.
