added SINDy library input option to WeakPDELibrary #433
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##           master     #433      +/-   ##
==========================================
+ Coverage   93.87%   93.90%   +0.02%
==========================================
  Files          37       37
  Lines        3626     3608      -18
==========================================
- Hits         3404     3388      -16
+ Misses        222      220       -2

View full report in Codecov by Sentry.
Thank you @znicolaou, this is a great feature! I'll review the code today or tomorrow. Do you have a simple test you could add for this? I would defer #222 to a later PR, as it would help keep this PR simple.
Thanks, @Jacob-Stevens-Haas! Yes, I will revisit the tests and the example notebooks in the next day or two!
This seems like such a great feature for a lot of reasons. It expands the ability to use WeakPDELibrary for nontrivial problems, it avoids using special dunder attributes like __code__, and it simplifies the logic considerably.
Let's go ahead and deprecate the old behavior! See the comment for a suggestion.
if self.library is not None:
    lib_names = self.library.get_feature_names()
else:
    for i, f in enumerate(self.functions):
        for c in self._combinations(
            n_features, f.__code__.co_argcount, self.interaction_only
        ):
            lib_names.append(
                self.function_names[i](*[input_features[j] for j in c])
            )
feature_names = feature_names + lib_names
What if, in initialization, we adapt the old way into the new way? Then we could get rid of all the self.library is not None conditionals and deprecate the old functionality. E.g.:

if library is None:
    self.library = CustomLibrary(
        library_functions=library_functions, function_names=function_names
    )
    warnings.warn(
        "Use the `library` argument instead of `library_functions` and `function_names`. "
        "Pass a `CustomLibrary` if you truly want custom functions.",
        DeprecationWarning,
    )
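As an illustration of the adapter pattern suggested above, here is a minimal self-contained sketch. The class names `OldLibrary` and `WeakLibrary` are hypothetical stand-ins, not the pysindy API:

```python
import warnings


class OldLibrary:
    """Hypothetical stand-in for CustomLibrary, for illustration only."""

    def __init__(self, functions):
        self.functions = functions


class WeakLibrary:
    """Sketch of the adapter: when only the legacy argument is given,
    wrap it in the new-style object once and emit a DeprecationWarning,
    so the rest of the class only ever touches self.library."""

    def __init__(self, library=None, library_functions=None):
        if library is None:
            warnings.warn(
                "Use the `library` argument instead of `library_functions`.",
                DeprecationWarning,
            )
            library = OldLibrary(library_functions or [])
        self.library = library
```

This keeps the old call signature working for one release cycle while the internals deal with only a single code path.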
I think that's a good idea. It'll take a bit to make the example notebooks reproduce the old behavior, which I think is necessary before we merge, but I think it's worth it.
Wait, is this why you added TrimmedLibrary?
Last point about deprecating--I think we should probably implement the same change in
…er from unnecessarily copying Theta to conserve memory for PDEs
…edLibrary; merged update from master
Hey, thanks for spreading this change to PDELibrary! But this adds a lot of new changes as well. I'm OK with skipping deprecation and going right to removal of library_functions - after all, it does remove code and tests, and that's a good thing. But why add a TrimmedLibrary class?
@@ -1,12 +1,12 @@
-# %% [markdown]
+#!/usr/bin/env python
Did you modify the notebooks, then save as .py? The preferred way is to modify the .py, then run publish_notebook.py. This keeps the metadata and formatting the same between different notebook users, which minimizes diffs, e.g. # %% [markdown] vs # # coding: utf-8. It also keeps the nbdiff smaller by limiting changes to the cell and notebook metadata.
Ah, good to know about the publish_notebook script, I didn't keep track of that before. I'll do it in the future.
These are minor enough changes that I'm happy to do it on my end, if you don't mind?
for x in x_full:
    xp = self.library.transform([x])[0]
    xp = np.delete(xp, self.drop_inds, axis=xp.ax_coord)
    xp = AxesArray(xp, comprehend_axes(xp))
Would it work to create a new AxesArray with xp = AxesArray(xp, axes=xp.axes), so we don't rely on the conventions in comprehend_axes?
Good question: I copied the format from the ConcatLibrary and made minimal modifications. I think it would work fine that way. As a side note, there's a lot of old code using AxesArray in inconsistent ways. It would be great to go through and ensure that we don't rely on the old conventions of axis=-1 for ax_coord and axis=-2 for time, but I haven't had time to look through it.
Agree it would be nice, but it's probably not something anyone needs to put time into until it causes an error.
@@ -103,12 +85,13 @@ class PDELibrary(BaseFeatureLibrary):
     def __init__(
         self,
-        library_functions=[],
+        function_library: BaseFeatureLibrary = PolynomialLibrary(
Mutable default values should be set in __init__(). Since PolynomialLibrary is mutable, this has surprising side effects. Check this out:

from pysindy import PDELibrary, SINDy
import numpy as np

data_1 = np.zeros((10, 5, 1))
data_2 = np.zeros((10, 5, 2))
model_1 = SINDy(feature_library=PDELibrary()).fit(data_1)
model_2 = SINDy(feature_library=PDELibrary()).fit(data_2)
assert model_2.feature_library.function_library is model_1.feature_library.function_library
# is true
model_2.print()
model_1.print()
# IndexError: the inner function library is shared, and has been resized to the second input

This is because the call to create default inputs like PolynomialLibrary() occurs only once, when the __init__() function signature is read.
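The pitfall generalizes beyond pysindy. A minimal self-contained sketch of the bug and the usual fix, using hypothetical `Pipeline` classes (not a real API):

```python
class Pipeline:
    """Buggy version: the default list is built once, at class-definition
    time, and then shared by every instance that relies on it."""

    def __init__(self, steps=[]):  # BUG: shared mutable default
        self.steps = steps


class SafePipeline:
    """The usual fix: default to None and build the object inside
    __init__ so each instance gets a fresh copy."""

    def __init__(self, steps=None):
        self.steps = [] if steps is None else steps


a, b = Pipeline(), Pipeline()
a.steps.append("scale")
shared = b.steps          # ["scale"] -- b sees a's mutation

c, d = SafePipeline(), SafePipeline()
c.steps.append("scale")
independent = d.steps     # [] -- d is unaffected
```

The same `None`-sentinel pattern is what the review suggests for the `function_library` default.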
Makes sense, thanks!
@@ -157,11 +135,11 @@ class WeakPDELibrary(BaseFeatureLibrary):
     def __init__(
         self,
-        library_functions=[],
-        library=None,
+        function_library: BaseFeatureLibrary = PolynomialLibrary(
Same as above - create it in __init__.

# This calls check_X_y
NIT: This is the kind of comment that is perhaps better in a commit message explaining the change from check_X_y to _preprocess_data.
Yeah, you're right, sorry about that.
@@ -93,7 +92,7 @@ def __init__(
     max_iter=20,
     normalize_columns=False,
     initial_guess=None,
-    copy_X=True,
+    copy_X=False,
Why change? False means that the optimizer mutates the arguments that get passed, which feels like surprising default behavior. E.g. if someone compares two models using the same numpy array, this will silently update the data in between fitting the two models. This is why the superclass sets the default to True.
The matrix being copied here is the output of the library.transform function (after passing through the SampleConcatter transform), which I don't think will ever be shared between two fits unless you hack it. For PDE data, this matrix can easily be more than 10GB in size, and copying it is very costly, especially when the copy does nothing. Looking at sklearn/linear_model/_base.py and __init__ here, this copy_X is only used in _preprocess_data. When fit_intercept=False, as is the default here, _preprocess_data does no centering and no rescaling, so the copying seems to do nothing. Nevertheless, this was causing spikes in memory usage that can crash examples when memory is low. If there's a good reason to copy the data here, I'm happy to note instead that you can set copy_X=False to conserve resources, but I'm pretty sure it's doing nothing.
I agree that it's helpful for copy_X to be false when fitting large data, but I think the docstring is the place to note that option. Changing the code would apply to all subclasses. Making the default False would silently change the behavior of any estimator that modifies the arguments to _reduce, moving away from the convention not just in scikit-learn, but across all of PyData (e.g. numpy functions where inplace=False is the default). It's general practice that public functions should not mutate arguments.
Our sparse estimators have a public API, and I don't think using one outside of a pysindy.SINDy model is really a hack. It's designed behavior.
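The trade-off under discussion can be shown with a toy preprocessing step. The `center_columns` helper below is hypothetical, not the sklearn or pysindy function: with copy=True the caller's array is untouched, while copy=False saves memory at the cost of mutating the caller's data in place.

```python
import numpy as np


def center_columns(X, copy=True):
    """Subtract each column's mean. With copy=True the input array is
    preserved; with copy=False the input is modified in place, which
    avoids a duplicate of a potentially huge matrix."""
    if copy:
        X = X.copy()
    X -= X.mean(axis=0)
    return X


data = np.array([[1.0, 2.0], [3.0, 4.0]])
safe = center_columns(data, copy=True)     # data is unchanged
unsafe = center_columns(data, copy=False)  # data is now centered in place
```

This is the surprise the reviewer describes: after the copy=False call, any other code holding a reference to `data` silently sees centered values.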
-    ) -> Tuple[cp.Variable, cp.Expression]:
+    ) -> Tuple["cp.Variable", "cp.Expression"]:
from __future__ import annotations should take care of this.
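For context, a minimal sketch of the string-annotation approach used in the diff, assuming `cp` refers to an optional dependency such as cvxpy that only needs to exist for static type checkers:

```python
from typing import TYPE_CHECKING, Tuple

if TYPE_CHECKING:
    import cvxpy as cp  # resolved by static analyzers only, never imported at runtime


def build_problem() -> "Tuple[cp.Variable, cp.Expression]":
    """The quoted annotation is a forward reference: Python stores it as
    a plain string instead of evaluating it, so cvxpy need not be
    installed for this module to import."""
    raise NotImplementedError
```

Adding `from __future__ import annotations` at the top of the module achieves the same effect for every annotation without the explicit quotes (PEP 563).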
I just merged @akaptano's fix that he pushed in master. Can do this instead.
Ah, makes sense. No worries - either way is fine, as SCA mostly reads quotes as forward references.
@@ -506,6 +506,117 @@ def calc_trajectory(self, diff_method, x, t):
     return self.libraries[0].calc_trajectory(diff_method, x, t)

+class TrimmedLibrary(BaseFeatureLibrary):
What is the use case for adding a TrimmedLibrary class?
When we made the EnsembleOptimizer, we lost some functionality in notebook 13, which I noticed when I went through the example notebooks now. It's also useful for implicit SINDy to be able to mask a feature library term if you want to use it in place of x_dot, as in SINDyPI. I added it here just to get the example notebook 13 running again, but it is useful to be able to construct a library and then remove specific terms from it. Maybe it would be better to make another pull request, but I only had a week where I could focus on pysindy, so I snuck it in here. I have to turn to other work now.
The quick fix, if you don't have time to focus on pysindy, is to either let example 13 be broken, or remove TrimmedLibrary from here and put it in notebook 13. The enterprise-level fix is to remove it from this PR, add an issue for what was broken in 13 and why a TrimmedLibrary is a useful solution, and handle it separately with a new test.
Broadly speaking, I don't think we need to maintain the notebooks unless they implement a change for fast testing in #203 (see README.rst). It's too much of a burden otherwise, since they take a huge amount of time and memory to run. When someone runs into an error, they can make an issue and we can build a test, a fix, and, if someone has the time, implement the maintainability fixes from #203.
The broken notebooks serve as an example of what pysindy can do, even if they don't show the modern syntax.
I think it is a terrible practice to break the example notebooks. That compromises recently published work that other groups may be trying to reproduce and can ruin the reputation of recent graduates.
I agree that reproducibility and supporting recent graduates are important. But "how to keep one piece of software perfectly reproducible while working on another version" is a solved problem: add a requirements file for each of the notebooks, and use a separate environment to satisfy those dependencies. If you don't have pinned requirements, you don't actually have something reproducible. At the bare minimum, just put the version of pysindy that the experiments require at the top of the file: 1.7.4 or 1.7.5.
I didn't think you or Alan were that attached to the notebooks all being able to run, since when I started contributing a couple years ago, some of them were already broken. Nevertheless, I do care about supporting your and Alan's research longevity, and that includes across major releases. So I built #203 to make it easier for you and Alan to protect the functionality of your notebooks across major versions (in particular, see the readme part).
I'd be happy to have a discussion about examples vs full research papers as notebooks in the repo, as well as whether we should always maintain backwards compatibility of a notebook across major versions even if they could have fast tests. When does reading a bit of updated documentation or troubleshooting an error become negligible compared to reading the notebook & paper in the first place? It's not necessarily germane to this PR - we can move that to discussions if it's better there.
Nevertheless, TrimmedLibrary doesn't make any code faster or cleaner inside the repository, and its only use case is in Example 13. As I said, the quick fix for this PR is to move it to examples/13_ensembling/utils.py.
PS What functionality had been lost from example 13?
I'm trying to be respectful, but discussions like this are not a good use of resources. I can spend time explaining decisions to you that are important, but that's not really my job.
You can open the notebooks, read the papers, understand what they are trying to do, and figure out why I made the changes if you want. You'll see that the features in the TrimmedLibrary are exactly those that were lost in notebook 13, but getting the notebook running is not the only reason to include it. I explained the use case with SINDyPI and ensemble SINDy already above, and I wanted to be able to test those things on research questions.
I know you want me to make another pull request for it and you will weigh in on its value, but I don't want to waste a bunch of time writing about trivial changes and discussing them. If you disagree with my decisions, then I will hear out your rationale, but the costs of contributing here are outweighing the benefits for me.
This repository is not an enterprise endeavor, and good scientific practices (particularly, not wasting time and computing resources) are at least as important as enterprise coding practices here in my opinion. We can have a difference of values, but we shouldn't let that become a barrier to collaboration.
I think this is done now, pending any other issues. I can take a look once more, but I have other work that I need to focus on now. If things look okay, I'm happy to merge any time.
If you need to move onto other things, would you like me to take this PR over the finish line?
With that PR and issue, are you ok if we close this, Zach? (But let's keep the branch around so that we can pull in
As a partial response to issue #351 and discussion #429 here, I'm adding an option to replace the library_functions and function_names input of WeakPDELibrary with library, which is an existing instance of BaseFeatureLibrary. I've only tested so far on the first example in the example notebook 12, which reproduces the previous result.
I'd also like to take some time to look through the PDE example notebooks to clean them up and make sure deprecated input has been removed, so I'm making this a draft for now. If anyone has other suggestions on modifications to the weak/pde input or functionality, this is a good chance. I may also look into #222 if it isn't a huge headache.