Enhanced subdomain integration for the weak form library #153
Conversation
…ed sums and vectorized somewhat
Codecov Report
@@ Coverage Diff @@
## master #153 +/- ##
==========================================
- Coverage 94.96% 93.79% -1.17%
==========================================
Files 33 33
Lines 3178 3322 +144
==========================================
+ Hits 3018 3116 +98
- Misses 160 206 +46
Continue to review full report at Codecov.
Hey Zach, are you free for a quick Zoom call today? I'm free all day, and I think it will be easier for me to understand exactly what you did here. Looks amazing though!
Sure, happy to chat after lunch today. How about we say 2?
Sounds good! I'll shoot you a Zoom link on Slack if that works.
FYI, after benchmarking a little, at least half of the runtime is now spent calculating the weights, which is done in `_set_up_grids` when you create the `WeakPDELibrary` object. This is currently accomplished by multiplying the weights along each axis with Python loops, and I expect there is a lot of opportunity for vectorization there, but the ragged arrays make it a bit tricky.
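For example, if the full weight on a d-dimensional cell factors into per-axis 1-D weights, the Python loops could collapse into a single `einsum`. A toy sketch with made-up sizes (not the actual `_set_up_grids` routine):

```python
import numpy as np

# Hypothetical 1-D weight arrays along each of three axes of a domain cell.
w_axes = [np.random.rand(n) for n in (7, 5, 6)]

# Loop version, analogous to multiplying weights axis by axis in Python:
# weights[i, j, k] = w0[i] * w1[j] * w2[k]
loop = np.empty([w.size for w in w_axes])
for i in range(w_axes[0].size):
    for j in range(w_axes[1].size):
        for k in range(w_axes[2].size):
            loop[i, j, k] = w_axes[0][i] * w_axes[1][j] * w_axes[2][k]

# Vectorized tensor product: one call, no Python loops.
vectorized = np.einsum("i,j,k->ijk", *w_axes)
print(np.allclose(loop, vectorized))  # True
```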
Sounds good. Heads up: other work has picked up substantially, so it might take me a few weeks to get to this. At least you can use it in the meantime for your coarse-grained models. Let me know if there is some aspect that is time-sensitive and I'll try to prioritize it.
Okay, sounds good! I will keep using it and may make a few changes to this branch as I go.
@akaptano I saw all those issues with the generalized and tensor libraries on the issues page -- looks like there's some organizing to do! I also noticed the branch for the temporal-derivatives enhancement, which seems like a good idea. This last week got busy, but hopefully I can get back to finishing up this enhancement soon. I'll try to sort out the places in the tensor/generalized libraries where n_samples needs to be set, and make sure it works with multiple_trajectories too!
Thanks a lot, and I will try to help in a week or two. I just thought we might as well combine those changes with this pull request and make a new release once we are confident we have resolved those errors and the new weak form stuff is working well.
First of all, amazing job, and thanks very much for these additions!!! I pushed some minor changes to the notebook and the weak library. Some notes and to-do items for both of us (although I will leave the better documentation of the new techniques to you):
Please update the pull request as you check off items here.
…ed some of the timing and extras in the notebook.
…g issue with click package. See discussion here https://stackoverflow.com/questions/71673404/importerror-cannot-import-name-unicodefun-from-click
… now is more stringent about linting. This commit attempts to get the pysindy code up to the new standards.
Number 8 on the list is completed, although other issues with generalized libraries + ensembling + control inputs are probably lingering.
Thanks for all the organizing, Alan! I just finished adding comments and descriptions for numbers 1-3, and I'm pushing in a moment. I think I'll check out numbers 4 and 9 either later tonight or tomorrow!
Awesome job! I'll be done with the changes to get 6 and 8 working in a moment. I would put items 7 and 9 on the low-priority list, although I think item 7 is a big payoff for minimal extra work.
I took a first pass at 4 above, implementing score and predict in the weak form. I've only done the weak form equivalents (not using the coefficients in the derivative form to make a score more comparable to the PDELibrary score, as discussed in #155). I'm sure some more work needs to be done for the GeneralizedLibrary case and the case with control inputs (I'm not sure the WeakPDELibrary has been tested with control at all yet). SINDy.simulate also needs updating.
Cool, I'll try to tie up some of these loose ends tomorrow night. Go team!
Okay, I did a pretty big push to organize points 1-4 above! The score and predict functions are working adequately in the weak case and pass the unit tests, but pysindy.py is a bit disorganized in general. In particular, to evaluate the derivatives x_dot in the fit and score functions when they are not provided by the user, there are many checks and reshapings for weak and generalized libraries, discrete time, and multiple trajectories that are nearly identical across functions. It would probably be better to expand the validate_input and validate_control_variables functions to consolidate all of these checks and reshapings. The case where there is only one input feature and the user provides a 1-D array of shape (n_t,) rather than an array of shape (n_t, 1) is also handled with various hacks right now. Updating the validation functions and applying them at the beginning of each function would make things much cleaner.
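As a sketch of the kind of consolidation I mean (the helper name `_as_2d` is hypothetical, not something in pysindy):

```python
import numpy as np

def _as_2d(x):
    """Promote a single-feature time series of shape (n_t,) to (n_t, 1)."""
    x = np.asarray(x)
    return x.reshape(-1, 1) if x.ndim == 1 else x

x = np.linspace(0, 1, 100)   # user passes a 1-D array for one feature
print(_as_2d(x).shape)       # (100, 1), what the downstream code expects
```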
Completely agree that at some point we will need to invest the work to get all the validation/reshaping fixed. model.score looks good, but model.predict is predicting only the weak form of x_dot (which is fine for model.predict, but not helpful if someone wants to take their fitted model and predict a new trajectory; same issue with model.simulate). I think we could get more useful versions of model.predict and model.simulate (which relies on model.predict) working by building a non-weak model from the weak form (defaulting to finite differences) -- this is basically the "hack" I do in the example 12 notebook, but we would be hiding everything in the backend. What is annoying about this is that the PySINDy predict function calls the sklearn predict function through self.model.predict, so somehow we have to fix this sklearn Pipeline object to return the right thing (or circumvent it if the weak form is being used). I am worried about this issue because we recommend users use this method for noisy data, but it is quite unwieldy to take it and run with it. Of course, if PDEs are being identified, users have to go elsewhere for numerical solvers anyway. This might be more than we bargained for, so maybe let's fix item 5 if it is still an issue, and consider doing item 7 before the merge. But I think we are getting very close!
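For reference, a rough sketch of that notebook hack (hedged: `weak_lib`, `ode_lib`, `x_train`, `dt`, and `t_span` are placeholder names, the two libraries must define the same terms in the same order, and the transplant assumes the fitted coefficients are exposed via `optimizer.coef_`):

```python
import numpy as np
import pysindy as ps

# Fit in the weak form for robustness to noise.
weak_model = ps.SINDy(feature_library=weak_lib)   # weak_lib: a WeakPDELibrary
weak_model.fit(x_train, t=dt)

# Build a matching non-weak model, fit once just to initialize its shapes,
# then transplant the weak-form coefficients into it.
ode_model = ps.SINDy(feature_library=ode_lib)     # same terms, non-weak
ode_model.fit(x_train, t=dt)
ode_model.optimizer.coef_ = np.copy(weak_model.optimizer.coef_)

# Conventional predict/simulate now use the weak-form coefficients.
x_sim = ode_model.simulate(x_train[0], t_span)
```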
Yeah, it sounds good to defer some of the reorganizing beyond this pull request. I think we've made good progress, and as long as we get things functional, we should move forward. I also agree that predict and simulate in the weak form should be expanded. Note that score in the weak form currently relies on predict as well. In the score case, it may make sense to report the score based on the weak features, since it will reflect the "denoised" score that the weak form is designed for. But using the weak form for simulate would indeed be confusing. We may want flags to enable multiple behaviors in the weak case, but maybe we should deal with that in the future.
This sounds good, although I'll give the SINDy-PI incorporation into the PDE libraries a go. The SINDy-PI part of the code is really underdeveloped, and I think I'll see how far I can push things in a few hours this week. Anything else you would like to do/check before the merge? It looks like code coverage has decreased slightly, so it might be worth adding some more unit tests of all the validation/reshaping.
I think I am pretty much done for the moment -- I just added a little more detail to the class description yesterday. Some more unit tests would probably be good for the reshaping in the weak and control cases, but I'm not sure I have the bandwidth in the next week or two. Maybe we organize a few pending tasks and start a new branch to work in after merging?
Enhanced subdomain integration for the weak form library
* Fix metrics df function * Fix docstring
I implemented significant improvements to the weak form library! The performance increase depends on the number of spatial dimensions and the space-time resolution, but the example notebook 12_weakform_SINDy_examples.ipynb achieves better results than the old version in about 1/10th the runtime.
The old implementation interpolated the input onto a regular grid of points within each domain cell, as specified by `num_pts_per_domain`. These interpolated values were multiplied by the corresponding derivatives of the polynomial test functions and integrated using the trapezoid rule to produce the output features.

The crucial observation here is that the linear interpolation and trapezoid rule approximate the integral of a piecewise polynomial in the large-`num_pts_per_domain` limit. There is no need to interpolate the data: the integrals of the polynomials can be performed exactly on each interval separating space-time points. By deriving the exact expressions for the integrals of the derivatives of the test functions at the space-time points of the input data, we can convert the integral into a weighted sum over the input data. This reduces the number of evaluations to the number of space-time points in the domain cells while achieving the accuracy of the infinite-`num_pts_per_domain` limit. It also allows for vectorized evaluation of each domain-cell integral.
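To illustrate the idea in 1-D (a self-contained sketch, not the library code: `exact_weights` is a hypothetical helper, and the test function here is an arbitrary polynomial), the integral of the piecewise-linear interpolant against a polynomial test function reduces exactly to a weighted sum over the data:

```python
import numpy as np
from numpy.polynomial import Polynomial

def exact_weights(x, w):
    """Weights c such that integral of (piecewise-linear f) * w equals c @ f."""
    c = np.zeros_like(x)
    for i in range(len(x) - 1):
        a, b = x[i], x[i + 1]
        h = b - a
        # Hat-function pieces on [a, b]: (b - t)/h and (t - a)/h.
        left = w * Polynomial([b / h, -1 / h])
        right = w * Polynomial([-a / h, 1 / h])
        c[i] += left.integ()(b) - left.integ()(a)
        c[i + 1] += right.integ()(b) - right.integ()(a)
    return c

x = np.sort(np.random.rand(30))        # irregular space-time points
f = np.sin(2 * np.pi * x)              # data sampled at those points
w = Polynomial([0, 1, -3, 2])          # arbitrary polynomial test function

c = exact_weights(x, w)                # exact weights, no interpolation needed

# Compare with brute-force trapezoid quadrature of the linear interpolant.
t = np.linspace(x[0], x[-1], 200001)
y = np.interp(t, x, f) * w(t)
dense = np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0
print(np.isclose(c @ f, dense))        # True: the weighted sum is exact
```

Since the multidimensional weights are built by multiplying weights along each axis, the same construction extends axis by axis to higher dimensions.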
In the new code, we find the indices `self.inds_k[k]` of the spatiotemporal grid that lie within domain cell `k` (which was recentered and resized from `self.H_xt` to `self.H_xt_k[k]`, as determined by the first and last points along each axis). The weights `weights[k]` are determined from the derivative orders of each term and from `self.xtilde_k[k]`, which is a rescaled version of the coordinates `XT_k[k] = self.spatiotemporal_grid[np.ix_(*self.inds[k])]`. The values `x_k[k] = x_shaped[np.ix_(*self.inds[k])]` are weighted and summed to evaluate the integrals as `np.sum(x_k[k] * weights[k])`.
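Schematically, the per-cell evaluation looks like this (a toy sketch with made-up shapes and values; only the indexing pattern mirrors the description above):

```python
import numpy as np

x_shaped = np.random.rand(10, 12, 8)    # stand-in spatiotemporal data
inds_k = (np.arange(2, 6), np.arange(3, 9), np.arange(1, 5))  # per-axis indices of cell k

x_k = x_shaped[np.ix_(*inds_k)]         # (4, 6, 4) sub-block inside the cell
weights_k = np.random.rand(*x_k.shape)  # stand-in weights for this cell
integral_k = np.sum(x_k * weights_k)    # one weighted sum per domain cell
```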
I am pretty happy with the code now, but there are still opportunities for improvement. Since the size of `self.inds_k[k]` varies with `k`, it's challenging to vectorize further, but it's probably possible. If we performed a `cumsum` on a flattened version of the ragged array `x_k * weights`, then the sum within each cell could be determined by taking differences with some clever indexing. This is a little challenging to write, so I haven't checked whether it would help. Also, the current implementation uses randomly placed domain cells, but it may be worth looking into slicing the whole spatiotemporal grid into a regular grid of cells.
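For the record, the `cumsum` trick would look something like this on a toy ragged array (a sketch only; the cell sizes here are arbitrary):

```python
import numpy as np

vals = [np.random.rand(n) for n in (3, 5, 2)]   # ragged x_k * weights per cell
flat = np.concatenate(vals)
ends = np.cumsum([v.size for v in vals])        # one past the end of each cell
starts = np.concatenate(([0], ends[:-1]))

csum = np.concatenate(([0.0], np.cumsum(flat)))
cell_sums = csum[ends] - csum[starts]           # per-cell sums via differences
print(np.allclose(cell_sums, [v.sum() for v in vals]))  # True
```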