
[BUG] Running example code for Ensemble SINDy gives different median or mean from the model.coefficients() or optimizer.coef_? #295

Closed
linguo4 opened this issue Mar 27, 2023 · 2 comments

Comments


linguo4 commented Mar 27, 2023

When using the example code in the documentation, the mean or median calculated from optimizer.coef_list differs from the value returned by optimizer.coef_ or model.coefficients(). Is this a bug?

Reproducing code example:


import numpy as np
from scipy.integrate import solve_ivp

from pysindy.utils import lorenz
import pysindy as ps

# bad code but allows us to ignore warnings
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Seed the random number generators for reproducibility
np.random.seed(100)

# Initialize integrator keywords for solve_ivp to replicate the odeint defaults
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

# Generate measurement data
dt = .002

t_train = np.arange(0, 10, dt)
x0_train = [-8, 8, 27]
t_train_span = (t_train[0], t_train[-1])
x_train = solve_ivp(lorenz, t_train_span, x0_train,
                    t_eval=t_train, **integrator_keywords).y.T

# Default is to sample the entire time series with replacement, generating 10 models,
# each fit on roughly 60% of the distinct data points (plus duplicates); see the
# quick check after this snippet.

# Custom feature names
np.random.seed(100)
feature_names = ['x', 'y', 'z']

ensemble_optimizer = ps.EnsembleOptimizer(
    ps.STLSQ(threshold=1e-3, normalize_columns=False),
    bagging=True,
    n_subset=int(1.0 * x_train.shape[0]),
    replace=True,
)

model = ps.SINDy(optimizer=ensemble_optimizer, feature_names=feature_names)
model.fit(x_train, t=dt)
ensemble_coefs = ensemble_optimizer.coef_list
mean_ensemble = np.mean(ensemble_coefs, axis=0)
std_ensemble = np.std(ensemble_coefs, axis=0)

# None of these comparisons hold element-wise: the aggregated ensemble
# coefficients do not match what the fitted model reports.
print(ensemble_optimizer.coef_ == model.coefficients())

print(np.abs(np.median(ensemble_coefs, axis=0) - model.coefficients()) < 1e-10)

print(np.abs(np.mean(ensemble_coefs, axis=0) - model.coefficients()) < 1e-10)
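As an aside on the "roughly 60%" remark in the comment above: sampling N points with replacement leaves about 1 - (1 - 1/N)^N ≈ 63% of them unique. A quick numpy check (an illustrative addition, not part of the original report) confirms this:

# Illustrative check (not part of the original report): bootstrapping
# N points with replacement keeps about 1 - (1 - 1/N)**N ~ 63% unique.
N = x_train.shape[0]
unique_frac = np.unique(np.random.choice(N, size=N, replace=True)).size / N
print(unique_frac)  # approaches 1 - 1/e ~ 0.632 for large N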


@Jacob-Stevens-Haas (Member) commented Apr 8, 2023

Very interesting - thank you for finding a difficult numerical issue, and thank you for submitting a complete working example! I can confirm this in master.

tl;dr: yes, this is by design. The final coefficients are "unbiased" in a refit step, regardless of the choice of optimizer.

My first thought was that it's in LinearRegression._set_intercept().

Nope: self.fit_intercept is False in this example.

Next thought: it's something to do with wrapping every optimizer in a SINDyOptimizer and thereby running self._unbias().

That was correct, and SINDyOptimizer's unbiasing is desired: it fits the coefficients one more time, omitting all terms the (regularized) optimizer chose to omit, but using un-regularized linear regression. It defaults to True in SINDy.__init__(). A minimal sketch of the idea follows.
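For readers unfamiliar with the step, here is a minimal sketch of what unbiasing does, assuming a library matrix theta of shape (n_samples, n_features), derivative targets x_dot of shape (n_samples, n_targets), and a coefficient matrix coef from a regularized fit. This illustrates the idea only; it is not pysindy's actual implementation.

import numpy as np

def unbias_sketch(theta, x_dot, coef):
    # Keep the support (nonzero pattern) found by the regularized
    # optimizer, but re-estimate those coefficients with plain,
    # un-regularized least squares.
    new_coef = np.zeros_like(coef)
    for i in range(coef.shape[0]):
        support = np.flatnonzero(coef[i])
        if support.size > 0:
            new_coef[i, support] = np.linalg.lstsq(
                theta[:, support], x_dot[:, i], rcond=None
            )[0]
    return new_coef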

There are some things to think about design-wise here, however. @akaptano, @znicolaou, thoughts?

  • Currently, BaseOptimizer has a stale docstring mentioning an unbias parameter it doesn't actually use; that should be removed if nothing else is done.
  • Currently, SINDy.fit wraps all optimizers in a SINDyOptimizer, to which it passes its own unbias initialization parameter.
  • SINDyOptimizer, as a class, seems to exist solely to run _unbias() (plus maybe some simple reshaping of 1-d coefficient matrices).
  • It appears that SINDyOptimizer is never used in the examples, except in a markdown comment suggesting unbiasing.
  • Is there a reason to keep SINDyOptimizer? We could move _unbias() back into the BaseOptimizer class and combine their fit() code...
  • ...and deprecate/remove the unbias parameter from SINDy objects? Rationale: (a) it avoids SINDy becoming a God Object antipattern, and (b) it encapsulates optimization parameters within optimization objects. On the other hand, if we treat unbiasing as a step after optimization, the alternative is to remove it from SINDyOptimizer and make it a function in pysindy.py (a hypothetical sketch of that option follows this list).
  • Regardless, I'm motivated by the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."
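To make the free-function option concrete, here is a hypothetical flow in which the regularized optimizer only selects the support and an explicit function refits it afterwards. The flow is invented for illustration and is not the library's current API; unbias_sketch is the sketch above, and x_train and dt are reused from the reproduction code.

# Hypothetical post-optimization flow (invented for illustration):
# the regularized optimizer selects the support, then a standalone
# function refits it, instead of SINDyOptimizer doing so implicitly.
theta = ps.PolynomialLibrary(degree=2).fit_transform(x_train)
x_dot = np.gradient(x_train, dt, axis=0)  # crude derivative estimate
opt = ps.STLSQ(threshold=1e-3)
opt.fit(theta, x_dot)                                # regularized fit
coef = unbias_sketch(theta, x_dot, opt.coef_)        # explicit unbias step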

@Jacob-Stevens-Haas (Member)

Closing & posting a new issue for a 🧵 about the unbias param.
