
added SINDy library input option to WeakPDELibrary #433

Closed
znicolaou wants to merge 12 commits

Conversation

@znicolaou (Collaborator) commented Nov 29, 2023

As a partial response to issue #351 and discussion #429, I'm adding an option to replace the library_functions and function_names inputs of WeakPDELibrary with library, which is an existing BaseFeatureLibrary instance. So far I've only tested the first example in example notebook 12, which reproduces the previous result:

import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp
from pysindy.utils import lorenz

# Generate measurement data
dt = 0.002
t_train = np.arange(0, 10, dt)
t_train_span = (t_train[0], t_train[-1])
u0_train = [-8, 8, 27]
u_train = solve_ivp(
    lorenz, t_train_span, u0_train, t_eval=t_train
).y.T

# Define weak form ODE library
# derivative_order defaults to 0 if not specified, and spatial_grid
# defaults to None, which allows weak-form ODEs.
poly_lib = ps.PolynomialLibrary(
    degree=2,
    include_bias=False
)

ode_lib2 = ps.WeakPDELibrary(
    library=poly_lib,
    spatiotemporal_grid=t_train,
    is_uniform=True,
    K=100,
)


# Instantiate and fit the SINDy model with the integral of u_dot
model = ps.SINDy(feature_library=ode_lib2)  # default optimizer
model.fit(u_train)
model.print()
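
For comparison, the equivalent library under the existing interface is built from explicit function lists, roughly as in the current notebook (a sketch; the lambdas shown are illustrative):

# Existing interface, for comparison (sketch; function lists are illustrative)
library_functions = [lambda x: x, lambda x, y: x * y, lambda x: x ** 2]
library_function_names = [lambda x: x, lambda x, y: x + y, lambda x: x + x]
ode_lib = ps.WeakPDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    spatiotemporal_grid=t_train,
    is_uniform=True,
    K=100,
)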

I'd also like to take some time to look through the PDE example notebooks to clean them up and make sure deprecated inputs have been removed, so I'm making this a draft for now. If anyone has other suggestions for modifications to the weak/PDE input or functionality, this is a good chance to raise them. I may also look into #222 if it isn't a huge headache.


codecov bot commented Nov 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2633ee0) 93.87% compared to head (6642ca5) 93.90%.
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #433      +/-   ##
==========================================
+ Coverage   93.87%   93.90%   +0.02%     
==========================================
  Files          37       37              
  Lines        3626     3608      -18     
==========================================
- Hits         3404     3388      -16     
+ Misses        222      220       -2     


@Jacob-Stevens-Haas (Member)

Thank you @znicolaou, this is a great feature! I'll review the code today or tomorrow. Do you have a simple test you could add for this?

I would defer #222 to a later PR, as it would help keep this PR simple.

@znicolaou (Collaborator Author)

Thanks, @Jacob-Stevens-Haas ! Yes, will revisit the tests and the example notebooks in the next day or two!

@Jacob-Stevens-Haas (Member) left a comment

This seems like such a great feature for a lot of reasons. It expands the ability to use WeakPDELibrary for nontrivial problems, it avoids using special dunder attributes like __code__, and it simplifies the logic considerably.

Let's go ahead and deprecate the old behavior! See the comment for a suggestion.

test/test_feature_library.py (resolved, outdated)
pysindy/feature_library/weak_pde_library.py (resolved, outdated)
Comment on lines 731 to 741
if self.library is not None:
    lib_names = self.library.get_feature_names()
else:
    for i, f in enumerate(self.functions):
        for c in self._combinations(
            n_features, f.__code__.co_argcount, self.interaction_only
        ):
            lib_names.append(
                self.function_names[i](*[input_features[j] for j in c])
            )
feature_names = feature_names + lib_names
Member

What if, in initialization, we adapt the old way into the new way? Then we could get rid of all the self.library is not None conditionals and deprecate the old functionality.
e.g.

if library is None:
    self.library = CustomLibrary(library_functions=library_functions, function_names=function_names)
    warnings.warn("Use the `library` argument instead of `library_functions` and `function_names`. "
        "Pass a `CustomLibrary` if you truly want custom functions.", DeprecationWarning)

Collaborator Author

I think that's a good idea. It'll take a bit of time to make the example notebooks reproduce the old behavior, which I think is necessary before we merge, but I think it's worth it.

Member

Wait, is this why you added TrimmedLibrary?

pysindy/feature_library/weak_pde_library.py (resolved, outdated)
@znicolaou (Collaborator Author)

Last point about deprecating: I think we should implement the same change in PDELibrary as well, since it uses the same library_functions convention, which essentially duplicates CustomLibrary. Again, it will take just a few small changes to the code but a bit of time to avoid breaking the examples...
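
For concreteness, a PDELibrary call under the proposed change might look roughly like the sketch below (the function_library parameter name follows the diff discussed later; x_grid is a placeholder spatial grid, and the exact signature may differ in the final PR):

import numpy as np
import pysindy as ps

x_grid = np.linspace(0, 10, 128)  # placeholder 1D spatial grid
pde_lib = ps.PDELibrary(
    function_library=ps.PolynomialLibrary(degree=2, include_bias=False),
    derivative_order=2,
    spatial_grid=x_grid,
)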

@Jacob-Stevens-Haas (Member) left a comment

Hey, thanks for spreading this change to PDELibrary! But this adds a lot of new changes as well. I'm OK with skipping deprecation and going right to removal of library_functions - after all, it does remove code and tests, and that's a good thing. But why add a TrimmedLibrary class?

@@ -1,12 +1,12 @@
# %% [markdown]
#!/usr/bin/env python
@Jacob-Stevens-Haas (Member) commented Dec 5, 2023

Did you modify the notebooks, then save as .py? The preferred way is to modify the .py, then run publish_notebook.py. This keeps the metadata and formatting the same between different notebook users, which minimizes diffs, e.g. # %% [markdown] vs # # coding: utf-8. It also keeps the nbdiff smaller by limiting changes to the cell and notebook metadata.

Collaborator Author

Ah, good to know about the publish_notebook script; I wasn't aware of it before. I'll use it in the future.

@Jacob-Stevens-Haas (Member) commented Dec 6, 2023

These are minor enough changes that I'm happy to make them on my end, if you don't mind?

for x in x_full:
    xp = self.library.transform([x])[0]
    xp = np.delete(xp, self.drop_inds, axis=xp.ax_coord)
    xp = AxesArray(xp, comprehend_axes(xp))
@Jacob-Stevens-Haas (Member) commented Dec 5, 2023

Would it work to create a new AxesArray with xp = AxesArray(xp, axes=xp.axes) so we don't rely on the conventions in comprehend_axes?

Collaborator Author

Good question: I copied the format from the ConcatLibrary and made minimal modifications. I think it would work fine that way. As a side note, there's a lot of old code using AxesArray in inconsistent ways. It would be great to go through and ensure that we don't rely on the old conventions of axis=-1 for ax_coord and axis=-2 for time, but I haven't had time to look through it.

Member

Agree it would be nice, but it's probably not something anyone needs to put time into until it causes an error.

@@ -103,12 +85,13 @@ class PDELibrary(BaseFeatureLibrary):

    def __init__(
        self,
        library_functions=[],
        function_library: BaseFeatureLibrary = PolynomialLibrary(
@Jacob-Stevens-Haas (Member) commented Dec 5, 2023

Mutable default values should be created inside __init__(), not in the function signature. Since PolynomialLibrary is mutable, this has surprising side effects. Check this out:

import numpy as np
from pysindy import PDELibrary, SINDy

data_1 = np.zeros((10, 5, 1))
data_2 = np.zeros((10, 5, 2))

model_1 = SINDy(feature_library=PDELibrary()).fit(data_1)
model_2 = SINDy(feature_library=PDELibrary()).fit(data_2)

# True: both models share the single default library instance
assert model_2.feature_library.function_library is model_1.feature_library.function_library

model_2.print()
model_1.print()
# IndexError because the inner function library is shared and has been resized to the second input

This is because a default argument like PolynomialLibrary() is created only once, when the __init__() signature is evaluated at class definition time.
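
A minimal sketch of the usual fix, using a None sentinel and constructing the default inside __init__ (other arguments omitted for brevity):

from pysindy.feature_library import BaseFeatureLibrary, PolynomialLibrary

class PDELibrary(BaseFeatureLibrary):  # sketch only; other arguments omitted
    def __init__(self, function_library: BaseFeatureLibrary = None):
        # Build the default inside __init__ so every instance gets its own library
        if function_library is None:
            function_library = PolynomialLibrary()
        self.function_library = function_library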

Collaborator Author

Makes sense, thanks!

@@ -157,11 +135,11 @@ class WeakPDELibrary(BaseFeatureLibrary):

    def __init__(
        self,
        library_functions=[],
        library=None,
        function_library: BaseFeatureLibrary = PolynomialLibrary(
Member

Same as above - create it in __init__


# This calls check_X_y
Member

NIT: This is the kind of comment that is perhaps better in a commit message explaining the change from check_X_y to _preprocess_data.

Collaborator Author

Yeah, you're right, sorry about that.

@@ -93,7 +92,7 @@ def __init__(
        max_iter=20,
        normalize_columns=False,
        initial_guess=None,
-       copy_X=True,
+       copy_X=False,
Member

Why change? False means that the optimizer mutates the arguments that get passed in, which feels like surprising default behavior. E.g., if someone compares two models using the same numpy array, this will silently update the data in between fitting the two models. This is why the superclass sets the default to True.

Collaborator Author

The matrix being copied here is the output of the library's transform function (after passing through the SampleConcatter transform), which I don't think will ever be shared between two fits unless you hack it. For PDE data, this matrix can easily be more than 10 GB, and copying it is very costly, especially when the copy does nothing. Looking at sklearn/linear_model/_base.py and the __init__ here, copy_X is only used in _preprocess_data. When fit_intercept=False, as is the default here, _preprocess_data does no centering and no rescaling, so the copy seems to do nothing. Nevertheless, it was causing spikes in memory usage that can crash examples when memory is low. If there's a good reason to copy the data here, I'm happy to note instead that you can set copy_X=False to conserve resources, but I'm pretty sure the copy is doing nothing.

Member

I agree that it's helpful for copy_X to be False when fitting large data, but I think the docstring is the place to note that option. Changing the code would apply to all subclasses, and making the default False would silently change the behavior of any estimator that modifies the arguments to _reduce, moving away from the convention not just in scikit-learn but across all of PyData (e.g. numpy functions, where inplace=False is the default). It's general practice that public functions should not mutate their arguments.

Our sparse estimators have a public API, and I don't think using one outside of a pysindy.SINDy model is really a hack. It's designed behavior.

Comment on lines -208 to +210
-    ) -> Tuple[cp.Variable, cp.Expression]:
+    ) -> Tuple["cp.Variable", "cp.Expression"]:
Member

from __future__ import annotations should take care of this.
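
A minimal sketch of that approach, assuming cvxpy remains an optional import guarded by TYPE_CHECKING (the helper name below is purely illustrative):

from __future__ import annotations  # annotations are no longer evaluated at runtime

from typing import TYPE_CHECKING, Tuple

if TYPE_CHECKING:
    import cvxpy as cp  # only needed by type checkers

def _make_variable(n: int) -> Tuple[cp.Variable, cp.Expression]:  # illustrative helper
    import cvxpy as cp  # real import deferred to call time

    var = cp.Variable(n)
    return var, cp.sum(var)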

Collaborator Author

I just merged @akaptano's fix that he pushed to master. I can do this instead.

Member

Ah, makes sense. No worries - either way is fine, as SCA mostly reads quotes as forward references.

@@ -506,6 +506,117 @@ def calc_trajectory(self, diff_method, x, t):
        return self.libraries[0].calc_trajectory(diff_method, x, t)


class TrimmedLibrary(BaseFeatureLibrary):
Member

What is the use case for adding a TrimmedLibrary class?

Collaborator Author

When we made the EnsembleOptimizer, we lost some functionality in notebook 13, which I noticed when I went through the example notebooks just now. It's also useful for implicit SINDy to be able to mask a feature library term if you want to use it in place of x_dot, as in SINDyPI. I added it here just to get example notebook 13 running again, but it is useful to be able to construct a library and then remove specific terms from it. Maybe it would be better to make another pull request, but I only had a week when I could focus on pysindy, so I snuck it in here. I have to turn to other work now.
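
A rough usage sketch (the constructor arguments here are inferred from the transform snippet above and may not match the final API):

from pysindy.feature_library import PolynomialLibrary
from pysindy.feature_library import TrimmedLibrary  # import path assumed from this PR

full_lib = PolynomialLibrary(degree=3, include_bias=False)
# Keep the full library but drop selected feature columns, e.g. the term that
# plays the role of x_dot in an implicit / SINDyPI-style fit
trimmed_lib = TrimmedLibrary(library=full_lib, drop_inds=[0, 4])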

@Jacob-Stevens-Haas (Member) commented Dec 6, 2023

The quick fix, if you don't have time to focus on pysindy, is to either let example 13 be broken, or remove TrimmedLibrary from here and put it in notebook 13. The enterprise-level fix is to remove it from this PR, add an issue for what was broken in 13 and why a TrimmedLibrary is a useful solution, and handle it separately with a new test.

Broadly speaking, I don't think we need to maintain the notebooks unless they implement a change for fast testing in #203 (see README.rst). It's too much of a burden otherwise, since they take a huge amount of time and memory to run. When someone runs into an error, they can make an issue and we can build a test and a fix and, if someone has the time, implement the maintainability fixes from #203.

The broken notebooks serve as an example of what pysindy can do, even if they don't show the modern syntax.

Collaborator Author

I think it is a terrible practice to break the example notebooks. That compromises recently published work that other groups may be trying to reproduce and can ruin the reputation of recent graduates.

@Jacob-Stevens-Haas (Member) commented Dec 7, 2023

I agree that reproducibility and supporting recent graduates is important. But "how to keep one piece of software perfectly reproducible while working on another version" is a solved problem: add a requirements file for each of the notebooks, and use a separate environment to satisfy those dependencies. If you don't have pinned requirements, you don't actually have something reproducible. At the bare minimum, just put the version of pysindy that the experiments require at the top of the file: 1.7.4 or 1.7.5.
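
For example, a per-notebook pin can be as small as the following (file path and exact version are illustrative):

# examples/13_ensembling/requirements.txt (illustrative path)
pysindy==1.7.5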

I didn't think you or Alan were that attached to the notebooks all being able to run, since when I started contributing a couple years ago, some of them were already broken. Nevertheless, I do care about supporting your and Alan's research longevity, and that includes across major releases. So I built #203 to make it easier for you and Alan to protect the functionality of your notebooks across major versions (in particular, see the readme part).

I'd be happy to have a discussion about examples vs. full research papers as notebooks in the repo, as well as whether we should always maintain backwards compatibility of the notebooks across major versions even if they could have fast tests. When does reading a bit of updated documentation or troubleshooting an error become negligible compared to reading the notebook and paper in the first place? It's not necessarily germane to this PR - we can move this to Discussions if it's better there.

Nevertheless, TrimmedLibrary doesn't make any code faster or cleaner inside the repository, and its only use case is in example 13. As I said, the quick fix for this PR is to move it to examples/13_ensembling/utils.py.

Member

PS What functionality had been lost from example 13?

Collaborator Author

I'm trying to be respectful, but discussions like this are not a good use of resources. I can spend time explaining decisions to you that are important, but that's not really my job.

You can open the notebooks, read the papers, understand what they are trying to do, and figure out why I made the changes if you want. You'll see that the features in the TrimmedLibrary are exactly those that were lost in notebook 13, but getting the notebook running is not the only reason to include it. I explained the use case with SINDyPI and ensemble SINDy already above, and I wanted to be able to test those things on research questions.

I know you want me to make another pull request for it and you will weigh in on its value, but I don't want to waste a bunch of time writing about trivial changes and discussing them. If you disagree with my decisions, then I will hear out your rationale, but the costs of contributing here are outweighing the benefits for me.

This repository is not an enterprise endeavor, and good scientific practices (particularly, not wasting time and computing resources) are at least as important as enterprise coding practices here in my opinion. We can have a difference of values, but we shouldn't let that become a barrier to collaboration.

@znicolaou (Collaborator Author)

I think this is done now, pending any other issues. Can take a look once more, but I have other work that I need to focus on now. If things look okay, happy to merge any time.

@Jacob-Stevens-Haas (Member) commented Dec 6, 2023

> I think this is done now, pending any other issues. Can take a look once more, but I have other work that I need to focus on now. If things look okay, happy to merge any time.

If you need to move onto other things, would you like me to take this PR over the finish line?

@Jacob-Stevens-Haas (Member)

With that PR and issue, are you ok if we close this, Zach? (But let's keep the branch around so that we can pull in TrimmedLibrary)
