-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-index issue: simulations batches and xarray recent versions #193
Comments
@benbovy, do you have a timeline for updating the dependencies to the latest versions of xarray and Python? @xaviernogueira and I are considering using this repo as a foundation for a model refactoring effort. We unfortunately have a short timeline on this first round so need immediate success, but we could see ourselves contributing to this repo. |
@aufdenkampe, I'm glad you are considering using xarray-simlab! I have no particular timeline but I'd like to fix this issue and release version 0.5.1 anyway. What is your timeline? I can try to find some time within the next two weeks. Contributions are welcome too! Regarding this specific issue, there is an easy workaround with the recent versions of Xarray (fastscape-lem/fastscape#26 (comment)) so it shouldn't be a blocker for using xarray-simlab. Apart from this issue, all the tests are passing in my local environment that I've just updated to python 3.11 and the last versions of Xarray, Dask, Zarr and Attrs. |
@benbovy, we have a month to pull off a pilot of the refactor, so we may need to use your work-around. Thanks! |
@benbovy, This repo really looks impressive and has great documentation, so I'm back to reconsidering whether to use My main hesitation at this point is that So my question is whether you see a future where could be actively maintained by you and a community of contributors and users (which might include our team)? I see that you and your team are using |
There were other active users back when I was active here and submitted my pull requests. In the meantime, I've been doing different things and haven't kept up, so I don't know... My PR's were rather complex with a lot of DFS's though, so I think that's why they are still open: Was this level of complexity really desired, since it reduces maintainability? |
@aufdenkampe, Indeed this repo has been very quiet for a while but is still actively used at least via Apart from this minor & non-blocking issue with mutli-indexes and @feefladder's open PRs there hasn't been (much) bug reports or feature requests so it seems to work mostly fine. Xarray-simlab actually tries not to implement too many things but instead leverage as much as possible the capabilities of xarray, dask and zarr, which are mature libraries. I have a few small-ish improvement ideas in mind (mainly focused around better & interactive repr and visualization for
Yes that's right. Those PRs tackle (at least partially) some important limitations that I still find annoying, i.e., the processes in a model cannot be executed in a user-defined order and foreign variables cannot have an "inout" intent. I'm not super happy with the hacks that I used "fastscape" to circumvent this... I think I've made some design mistakes when adding "inout" variables and the "run_step" / "finalize_step" simulation stages. Those PRs are complex indeed and really target the core of xarray-simlab. Although I really appreciate that @feefladder submitted them, it requires a in-depth review that I haven't done yet and also possibly some additional design iterations.
Yes sure! I cannot promise to spend myself a lot of time on maintaining this repository but I'd be happy to see contributors and maintainers getting onboard (including your team!), as long as we can keep xarray-simlab both maintainable and working for its current users (deprecation cycles are OK). |
@benbovy and @feefladder, thanks for your rapid and detailed responses! Together they increase my interest in adopting Our particular use case (i.e. simulating aquatic temperature and nutrient dynamics in a flooding river, using a 2D irregular grid of flows from hydrodynamic models) will likely need both of the limitations that are addressed by @feefladder's PRs ( Should I then perhaps fork and build from @feefladder's fork? Or can we work toward merging those PRs? Fortunately the first module for temperature is much simpler, and doesn't need those capabilities, so we can get started with the upstream master. Also, our long-term goal is to couple with many existing models (using BMI), and one of those models has been implemented in Landlab, which I know was one of your inspirations. That's going to lead some to ask us why we don't just implement these water nutrient processes in Landlab. My thinking is that |
It would be nice to eventually have this functionality available here.
I had in mind of eventually implementing a BMI class that can wrap an
So far I've only used numpy arrays inside xarray-simlab processes and I've used dask through xarray-simlab mainly for running multiple simulations in parallel. However, given how xarray, dask and zarr are tightly integrated between each other there is a good chance that using dask arrays within model processes and/or as model input variables will "just" work (or after small adjustments). That said, xarray-simlab provides a fairly high-level of abstraction that sometimes makes it also difficult to track down performance or other issues, which requires to be careful when setting-up and running simulations. For example, it happened that we were struggling with the very slow execution of some batches of simulations before figuring out that xarray-simlab was just writing too many (and large) model output snapshots at the same time via dask and zarr (IO bottleneck). Also, using dask for both computing process variables and running models in parallel will likely perform badly. Using xarray DataArrays inside processes is discussed in #141. (This discussion is drifting away from the original topic of this issue... shall we move it to a new |
@benbovy, thanks for all these very helpful responses! I agree that this discussion is probably distracting from the named issue. I should have created a new issue from the start! I like the idea of moving it to a new discussion topic, but I think you need to enable it for this repo. By the way, thanks for pointing me to #141. I'm very interested in keeping/leveraging the xarray metadata as much as possible, and trying to move toward a zero-copy approach as much as possible in all my processes. Our current implementation has lots of memory over-runs because we didn't keep track of that very well. |
xref: fastscape-lem/fastscape#26
The text was updated successfully, but these errors were encountered: