-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove research experiments from pysindy #461
Comments
@Jacob-Stevens-Haas Thanks for the very thoughtful post -- I agree with essentially everything here. I wonder if there is a way to have a pysindy repository that is stable (we can strip out all the researchy-topics like the trapping functionality) and a pysindy repository that is for development by researchers. My worry with this is that if every person who makes a new change to some SINDy method creates a new fork/repo of PySINDy, it will be hard to keep PySINDy up-to-date with the latest and best variants. But you're right that we also don't want every person with a half-baked idea and okay coding skills (talking about myself here :)) making updates that break a bunch of other things in the code. For a historical perspective, the state of the Kutz/Brunton lab before PySINDy was that every new grad student and post-doc wrote their own code full of bugs and conflicts with other codes, so it has been a real boon to getting these SINDy-based methods working without reinventing the wheel (and to me, the "wheel" nowadays should be using weak form SINDy with ensembling...). On a related noted, I have been wanting to get a version of PySINDy implemented that uses the multi-step prediction error as the loss term, but this requires totally circumnavigating the whole sklearn pipeline, so I have been considering writing a repo based on PySINDy but with all the sklearn stuff stripped out (as well as some of the other "extras" that I think are unnecessary... like multiple trajectory concatenation or non-PDE libraries since the PDELibrary does everything now anyways). This is anyways what @KKahirman and Joe Bakarji had to do for their projects with the multi-step prediction error, but their codes are not compatible currently with PySINDy functionalities. I would really like to do multi-step prediction error with ensembling and weak form and all the other advances. Happy to discuss these topics more here if anyone has more thoughts on this! |
+1 from me on the proposal. From my end, the main notebook I contributed was the cavity flow one. I'm not sure if a (stripped down) version of that would be good to keep as a "neato" fluid dynamics case (no fancy features or dependencies, just basic SINDy). Should I open a new ticket to discuss that? In brief, that notebook is not really serving any of the three main purposes you highlighted, so I'd propose either
|
Thanks for your comments guys. To clarify, @akaptano I do think the advanced research methods like Trapping SINDy have value in the repo... best case in point is that I only discovered its issues because it was an attractive method for me as a user in another project 😅 I think we can provide better stability by bumping versions more frequently and faithfully to SemVer. Personally, I'm wary of bumping and releasing a version because I feel uncomfortable with a version release that breaks notebooks, which is part of the reason for this post. As for multi-step prediction error: I also would love to have that too, and currently feel awkward writing about the heuristic score of "how the discovered system looks". We could have an issue dedicated to that so we can link to use cases and people who have rolled their own implementations. I couldn't see what @KKahirman did, if it's in his public repo. @jcallaham I think that's a great example in its full glory. It might have been confusing in the issue post, but I can handle a more robust version of (1) that still builds the notebook into the documentation, but as part of a separate repo that depends upon a pinned pysindy version. (2) has a simpler option: Save it as examples/14_cavity_flow/example.ipynb, save it again as a python file examples/14_cavity_flow/example.py. Then, have a cell up front that determines input data and important constants, e.g. integration tolerances, integration times, data source. if __name__ == "testing":
data = {
"t": np.arange(0, 1, .1),
"x": np.random.normal(size=(<small array of correct ndim>)
}
t_sim = np.arange(0, .5, .1)
...
else:
data = loadmat("data/cavityPOD.mat")
t_sim = np.arange(0, 300, dt_rom) The repo currently configures pytest to skip notebook scripts locally, but include them in CI. |
Hi, thanks for the pin. I agree with @Jacob-Stevens-Haas: complex examples should be move to a different repository and each example should have a set of pinned dependencies. |
An aside - for those who have multiple notebooks in a single directory, sphinx chooses a notebook to build into the documentation randomly (well, sphinx's sorting of filenames). |
I'm very late to the party, but I also agree with the solution that @Jacob-Stevens-Haas has proposed. |
Is your feature request related to a problem? Please describe.
Documenting the second part of @znicolaou and my discussion on #433, of which #449 was the first part.
This repository has served two purposes: one, as a central point for members of the research group to post jupyter notebooks of their published research papers, and two, to allow others to rapidly reuse innovations from SINDy research for comparison, method extension, or for actual system discovery. I refer to these desiderata as Extensibility, Reproducibility, and Reusability.
Ideally, there is no inherent conflict between these aspirations: as all packages from time to time need to be able to make breaking changes in order to keep the code maintainable, and one can achieve reproducibility for an actual research experiment by pinning all dependencies/using a lockfile. However, the experiments also serve as documentation about how to use the code in interesting ways (a mix of tutorials and how-to-guides). Further, they aspire to a more expansive definition of reproducibility, allowing someone to verify experimental results using the most up-to-date syntax. But because they are slow and not in CI, they're frequently broken by changes (most obviously by changing the contract between caller and callee - see #347, which removed the
quiet
argument fromSINDy.fit()
, but also occur more subtlely when a default value or return information changes (e.g. #458)).As a side note, examples/ is huge, which makes any improvements to the build process or dependencies pointing to this github repo very slow, e.g.
The situation is harder, as current notebooks do not have pinned, reproducible dependencies, so research isn't, strictly speaking, reproducible. It also becomes difficult to troubleshoot what change caused an error in a notebook as we do not CI the notebooks. Since running the notebooks takes ages, it's infeasible to promote small PRs and also run the notebooks every single PR. As a result, notebooks quietly break. That could hurt the careers of researchers who want reviewers and collaborators to be able to reproduce their research and don't expect version changes or are unfamiliar with software concepts like semantic versioning and refactoring.
Any long-term solution needs to be maintainable. It needs to allow researchers to keep their experiment up to date with pysindy major version bumps, if they want to put the work in. On the other hand, pysindy maintainers need to know whether a change breaks any calling conventions used in documentation.
Describe the solution you'd like
Notebooks that describe how to use a feature stay part of this repository, but under the README guidelines that ensure maintainability. Notebooks that are a complete research paper go to their own repo, but can still get linked in the documentation (or even built fully into the documentation). A good heuristic is whether an author is OK with maintainers changing the ODEs/PDEs, alternate methods, and parameters presented in the notebook. E.g. a demonstration of how to use a pysindy feature and where it should be used is a good how-to guide/tutorial that this package should maintain. An in-depth comparison of how different pysindy methods perform in bounding some property of some particular PDE system is a research notebook.
Implementation
Describe alternatives you've considered
Keep the notebooks but let them break. Checking compatibility of any patch to an ever-growing list of notebooks is infeasible, and maintaining the advertisement of someone else's research is the job of only the author. At the same time, setting up the infrastructure for researchers to plug into pysindy development is a bit of work as well.
Fix it fast. I am also ok jumping from 3 to 4 without any waiting period, leaving the addition of
how-to
notebooks for future documentation PRs.Additional context
CCing everyone who's been involved @akaptano @znicolaou @eigensteve @briandesilva @jcallaham @MPeng5 @kpchamp @OliviaZ0826 @yb6599 @Ohjeah @nathankutz
I'm not sure the propsed solution is correct, and I want to get people's input.
The text was updated successfully, but these errors were encountered: