Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced subdomain integration for the weak form library #153

Merged
merged 28 commits into from
Apr 30, 2022

Conversation

znicolaou
Copy link
Collaborator

I implemented significant improvements to the weak form library! The performance increase depends on the number of spatial dimensions and the space-time resolution, but the example notebook 12_weakform_SINDy_examples.ipynb achieves better results than the old version in about 1/10th the runtime.

The old implementation interpolated the input onto a regular grid of points within each domain cell, as specified by num_pts_per_domain. These interpolated values were multiplied by corresponding derivatives of the polynomial test functions and integrated using the trapezoid rule to produce the output features.

The crucial observation here is that the linear interpolation and trapezoid rule approximates an integral of a piecewise polynomial in the large num_pts_per_domain limit. There is no need to interpolate the data--the integral of the polynomials can be performed exactly on each interval separating space-time points. By deriving the exact expressions for the integrals of the derivatives of the test functions at the space-time points of the input data, we can convert the integral into a weighted sum over the input data. This reduces the number of evaluation to the number of space-time points in the domain cells while achieving the accuracy of the num_pts_per_domain to infinity limit. It also allows for vectorized evaluation of each domain-cell integral.

In the new code, we find the indices self.inds_k[k] of the spatiotemporal grid that lie within the domain cell k (which was recentered and resized from self.H_xt to self.H_xt_k[k], as determined by the first a last points along each axis). The weights weights[k] are determined from the derivative orders of each term and self.xtilde_k[k], which is a rescaled version of the coordinates XT_k[k]=self.spatiotemporal_grid[np.ix_(*self.inds[k])]. The values of x_k[k]=x_shaped[np.ix_(*self.inds[k])] are weighted and summed to evaluate the integrals np.sum(x_k[k]*weights[k]).

I am pretty happy with the code now, but there are opportunities for improvement still. Since the size of self.inds_k[k] varies with k, it's challenging to further vectorize, but it's probably possible to do so. If we performed cumsum on a flattened version of the ragged array x_k*weights, then the sum within each cell could be determined by differences with some clever indexing. This is a little challenging to write, so I haven't seen if it would help. Also, the current implementation uses randomly placed domain cells, but it may be worth looking into slicing the whole spatiotemporal grid into a regular grid of cells.

@znicolaou znicolaou added the enhancement New feature or request label Jan 20, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2022

Codecov Report

Merging #153 (0748ec5) into master (9792beb) will decrease coverage by 1.16%.
The diff coverage is 86.79%.

@@            Coverage Diff             @@
##           master     #153      +/-   ##
==========================================
- Coverage   94.96%   93.79%   -1.17%     
==========================================
  Files          33       33              
  Lines        3178     3322     +144     
==========================================
+ Hits         3018     3116      +98     
- Misses        160      206      +46     
Impacted Files Coverage Δ
pysindy/utils/__init__.py 100.00% <ø> (ø)
pysindy/utils/odes.py 98.93% <0.00%> (ø)
pysindy/feature_library/base.py 92.51% <44.44%> (-3.17%) ⬇️
pysindy/pysindy.py 88.07% <65.06%> (-6.10%) ⬇️
pysindy/utils/base.py 89.76% <76.92%> (-3.22%) ⬇️
pysindy/feature_library/weak_pde_library.py 96.37% <97.98%> (-0.60%) ⬇️
pysindy/feature_library/generalized_library.py 96.99% <100.00%> (+0.05%) ⬆️
pysindy/optimizers/constrained_sr3.py 91.94% <100.00%> (ø)
pysindy/optimizers/frols.py 97.61% <100.00%> (ø)
pysindy/optimizers/sr3.py 94.20% <100.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9792beb...0748ec5. Read the comment docs.

@akaptano
Copy link
Collaborator

Hey Zach, are you free for a quick zoom call today? I'm free all day and think it will be easier for me to understand exactly what you did here. Looks amazing though!

@znicolaou
Copy link
Collaborator Author

Sure, happy to chat after lunch today-how about we say 2?

@akaptano
Copy link
Collaborator

sounds good! I'll shoot you a zoom link on the Slack if that works

@znicolaou
Copy link
Collaborator Author

FYI, after benchmarking a little, at least half of the runtime is spent calculating the weights now, which is done in _set_up_grids when you create the WeakPDELibrary object. This is currently accomplished by multiplying the weights along each axis with python loops, and I expect there is a lot of opportunity for vectorization there, but those ragged arrays make it a bit hard.

@akaptano
Copy link
Collaborator

sounds good. Heads up other work has picked up substantially, so it might take me a few weeks to get to this. At least you can use it in the meantime for your coarse-grain models. Let me know if there is some aspect that is time-sensitive and I
l'll try to get to that sooner.

@znicolaou
Copy link
Collaborator Author

Ok, sounds good! I will keep using it and may make a few changes to this branch as I go.

@znicolaou
Copy link
Collaborator Author

@akaptano I saw all those issues with the generalized and tensor libraries on the issues -- looks like there's some organizing to do! Also, noticed the branch on the temporal derivatives enhancement, which seems like a good idea. This last week got busy, but hopefully I can get back to finishing up this enhancement soon. I'll try to sort out the places in the tensor/generalized libraries where n_samples needs to be set, and make sure it works with multiple_trajectories too!

@akaptano
Copy link
Collaborator

akaptano commented Feb 5, 2022

Thanks a lot, and I will try to help in a week or two. Just thought we might as well combine those changes with this pull request, and make a new release when we are confident we have resolved those errors + the new weak form stuff is working well.

@akaptano
Copy link
Collaborator

akaptano commented Apr 24, 2022

First of all, amazing job and thanks very much for these additions!!!

Pushed some minor changes to the notebook and the weak library.

Some notes and things to do for both of us (although I will leave better documenting the new techniques for you):

  1. set_up_grids has gotten much more substantial. Could you add some documentation in set_up_grids and elsewhere, adding comments and describing the high-level view of what you are doing with the gridding?
  2. Add descriptions for _left_weights and _right_weights. Add comments above nested for loops around line 868 (over dx_k_j) describing what is done here.
  3. WeakPDELibrary class description should be extended significantly to detail exactly how you are doing the weak form now.
  4. Fix Issue model.score does not work with WeakPDELibrary #155
  5. Fix Issue [BUG] Iteration problem when using ensemble SINDy and GenerlizedLibrary #158 (looks like this was already done?)
  6. Double check addressed the problem brought in Issue Weak SINDY for higher order ODEs #159 "Just a heads up for the weak formulation update... currently the TensoredLibrary does not allow for weak terms (and therefore neither does the GeneralizedLibrary if any of the libraries are tensored together) and this would be a good fix. The fix is straightforward... basically just need to make sure "n_samples" is correct, since this number changes in the weak form. I made a hack-ish fix to this branch so that it works for Bhumika's application."
  7. Maybe save this for a different pull request... get rid of SINDy-PI and incorporate a flag that decides whether or not to include the temporal derivatives in the PDELibrary and WeakPDELibrary (and deals with avoiding the trivial fit u_dot = d/dt u). This could also address Issue Extrapolating 1D PDE with discovered PDE and constraint of known timepoints #168.
  8. Double check this is compatible/fixed Issue Request for advice about shape of control_features when using PDELibrary #164
  9. Fix or example that addresses Issue Including derivatives of non-linear terms in PDELibrary #170

Please update the pull request as you check off items here

Alan Kaptanoglu added 2 commits April 24, 2022 11:19
Alan Kaptanoglu added 2 commits April 24, 2022 16:14
… now is more stringent about linting. This commit attempts to get the pysindy code up to the new standards.
@akaptano
Copy link
Collaborator

Number 8 on the list completed, although other issues with generalizedlibraries + ensembling + control inputs probably are lingering.

@znicolaou
Copy link
Collaborator Author

Thanks for all the organizing, Alan! I just finished up adding comments and descriptions for numbers 1-3, and pushing in a moment. I think I'll check out number 4 and 9 either later tonight or tomorrow!

@akaptano
Copy link
Collaborator

Awesome job, I'll be done with the changes getting 6 and 8 working in a moment. I would put items 7 and 9 on the low-priority list, although I think item 7 is big payoff for minimal extra work.

@znicolaou
Copy link
Collaborator Author

I took a first pass at 4 above, implementing the score and predict in the weak form. I've only done the weak form equivalents (not using the coefficients in the derivative form to make a score more comparable to the PDELibrary score, as discussed in #155 ). I'm sure some more work needs to be done for the GeneralizedLibrary case and the case with control inputs (I'm not sure if the WeakPDELibrary has been tested at all with control yet). The SINDy.simulate also needs updating.

@akaptano
Copy link
Collaborator

Cool I'll try to tie up some of these loose ends tomorrow night. Go team!

@znicolaou
Copy link
Collaborator Author

Okay, I did a pretty big push to organized points 1-4 above!

The score and predict functions are working adequately in the weak case and pass the unit tests, but pysindy.py is a bit disorganized in general. In particular, to evaluate the derivatives x_dot in the fit and score functions if it is not provided by the user, there are many checks and reshapings for weak and generalized libraries, discrete time, and multiple trajectories that are almost identical in various functions. It would probably be better to expand the validate_input and validate_control_variables functions to consolidate all these checks and reshapings. The cases where the input has only one input_feature and the user provides a 1d array of shape n_t rather than an array of shape (nt, 1) is also handled with various hacks now. Updating the validate input functions and performing them at the beginning of each function would make things much cleaner.

@akaptano
Copy link
Collaborator

akaptano commented Apr 26, 2022

Completely agree with what you said that at some point we will need to invest the work to get all the validation/reshaping fixed. Model.score looks good but model.predict is predicting only the weak form of x_dot (which is fine for model.predict but not helpful if someone wants to take their fitted model and predict a new trajectory. Same issue with model.simulate).

I think we could get more useful versions of model.predict and model.simulate (which relies on model.predict) working by building a non-weak model from weak form (defaulting to finite differences -- this is basically the "hack" I do in the example 12 notebook but we would be hiding everything in the backend). What is annoying about this is that the PySINDy predict function calls the sklearn predict function through self.model.predict. So somehow we have to fix this sklearn "Pipeline" object to return the right thing (or circumnavigate it if the weak form is being used). I am worried about this issue because we recommend users use this method for noisy data but it is quite unwieldy to take it and run with it. Of course if PDEs are being identified, users have to go elsewhere for numerical solvers anyways.

This might be more than we bargained for, so maybe let's fix item 5 if it is still an issue, and consider doing item 7 before the merge. But I think we are getting very close!

@znicolaou
Copy link
Collaborator Author

Yeah, sounds good to defer some reorganizing beyond this pull request. I think we've made good progress, and as long as we get things functional, we should move forward.

I also agree predict and simulate in the weak form should be expanded. Note that score in the weak form is currently relying on predict as well. In the score case, it may make sense to report the score based on the weak features, since it will reflect the "denoised" score that the weak form is designed for. But using the weak form for simulate would be confusing indeed. We may want flags to enable multiple behaviors in the weak case, but maybe we should deal with that in the future.

@akaptano
Copy link
Collaborator

This sounds good, although I'll give the SINDy-PI incorporation into the PDE libraries a go. The SINDy-PI part of the code is really underdeveloped and think I'll see how far I can push things in a few hours this week.

Anything else you would like to do/check before the merge? Looks like code coverage has decreased slightly so might be worth some more unit tests of all the validation/reshaping.

@znicolaou
Copy link
Collaborator Author

I think I am pretty much done for the moment-I just added a little more detail to the class description yesterday. Some more unit tests would probably be good for the reshaping in the weak and control cases, but not sure I have bandwidth in the next week or two for it. Maybe we organize a few pending tasks and start a new branch after merging to work in?

@akaptano akaptano merged commit d4e64c4 into master Apr 30, 2022
@akaptano akaptano deleted the weak_optimization branch April 30, 2022 20:14
jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request Apr 30, 2024
Enhanced subdomain integration for the weak form library
jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request May 9, 2024
* Fix metrics df function

* Fix docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants