Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories #4827

ljwoods2 · 2024-12-11T20:39:50Z

Is your feature request related to a problem?

Streaming trajectories directly into MDAnalysis is one of the MDA 3.0 milestones. Specifically, streaming a trajectory directly from a simulation engine into MDA enables a few keys things that aren't otherwise possible:

Performing high time resolution analyses without generating massive trajectory files (for example, measuring ions crossing a membrane)
Writing only a sub-selection of the system into a trajectory file with an MDA writer (not a built-in feature in all simulation engines), removing an intermediate step
(In the future) Inputting forces calculated in Python with MDAnalysis back into the simulation engine to be applied to atoms in the simulation

The Interactive MD (IMD) version 3 protocol, which builds on the IMDv2 protocol already implemented in GROMACS, NAMD, and LAMMPS, is the most effective way to achieve this "in-situ" analysis since it requires only minimal changes from the working IMDv2 implementations made to work with VMD.

MDAnalysis can quickly gain the ability to read a trajectory via this protocol using the reader API by adopting the code in imdclient, an MDAKit built in a collaboration between @orbeckst and @HeydenLabASU.

Already, we have a working implementation in GROMACS, LAMMPS, and NAMD (private) which are integration tested against the IMDClient in the imdclient repo:

Describe the solution you'd like

MDAnalysis adopts the IMDReader, brings on imdclient as an optional dependency

I propose that the core MDAnalysis codebase adopt the IMDReader as a trajectory reader with imdclient (which contains the IMDClient) as an optional dependency for MDAnalysis exactly in the same way h5py is an optional dependency that allows using the H5MDReader

The IMDClient handles the protocol communication at a binary level, wrapping it with a simple API:

get_imdframe(): returns a singular IMDFrame (similar to a Timestep) of data from the simulation engine
get_imdsessioninfo(): returns the data the user can expect in each IMDFrame
stop(): ensures socket cleanup occurs

The IMDReader is just essentially a wrapper/adapter for the IMDClient which makes it compatible with the MDAnalysis reader API.

This way of structuring the code allows offloading the complicated simulation engine integration tests (which include docker containers and take quite a while to run) to the imdclient MDAKit dependency and allows other people outside of MDAnalysis to use the IMDv3 protocol without needing the IMDReader.

Currently, both the IMDReader and IMDClient are in the imdclient MDAKit for development purposes.

Protocol + integration tests go in the imdclient MDAKit, but reader API tests go in MDAnalysis

I also propose splitting up the tests into lightweight reader api tests that work with a "dummy" simulation engine (https://github.com/Becksteinlab/imdclient/blob/main/imdclient/tests/test_imdreader.py) and the more complicated protocol and simulation engine integration tests (protocol https://github.com/Becksteinlab/imdclient/blob/main/imdclient/tests/test_imdclient.py) (integration https://github.com/Becksteinlab/imdclient/blob/main/imdclient/tests/base.py).

MDAnalysis can adopt the lightweight tests that are relevant to MDAnalysis, and the imdclient MDAKit can keep everything else.

Summary

Things that move to MDAnalysis:

Stream reader base class, which makes MDAnalysis broadly compatible with one-way streams
IMDReader, which inherits from the Stream Reader base class
IMDReader tests, which use a dummy server to allow running the standard Reader API tests.

Things that stay in imdclient, the proposed dependency:

IMDClient, which handles the protocol
All other heavyweight testing

Describe alternatives you've considered

Keep the IMDReader in the imdclient repository, never adopting it in the main codebase.

The text was updated successfully, but these errors were encountered:

orbeckst · 2024-12-11T21:56:05Z

Thank you for the detailed proposal.

I am in favor of moving the light-weight IMDReader into MDAnalysis.coordinates as part of our effort to make MDAnalysis fully streaming capable.

What are additional dependencies that would come with IMDReader, i.e., which packages outside the standard dependencies that MDAnalysis already pulls in?

Do we already have a conda-forge package for IMDClient?

orbeckst · 2024-12-11T22:21:38Z

For quick overview, see @ljwoods2 's presentation 03-Streaming_MDAnalysis_Functionality-Woods.ipynb from the IMD Streaming Workshop.

ljwoods2 · 2024-12-12T00:13:15Z

@orbeckst We wouldn't be introducing any new dependencies- the imdclient package uses nothing except numpy arrays and the python standard library. In the future, we may want to rewrite parts of the IMDClient in cython (if needed after benchmarking), but that's nothing new.

@hcho38 is currently working on conda, will check in with him

hcho38 · 2024-12-12T20:50:33Z

Here is the link to the conda-forge recipe:

https://github.com/hcho38/imd-staged-recipes/tree/develop

Here is the link to the pull request:

conda-forge/staged-recipes#28554

@ljwoods2 @amruthesht can you guys leave a comment on the PR confirming you are willing to be listed as one of the maintainers?

orbeckst · 2024-12-18T18:41:34Z

Thank you for getting the conda-forge package up, @hcho38 !

jaclark5 · 2024-12-20T02:16:33Z

It looks like with the IMDReader you'll have to pull utils.parse_host_port along with it, but nothing else in the MDAKit is referencing it so it shouldn't be an issue, so relatively clean break! I expect you'll need this function in MDAnalysis for testing a dummy server (but I could be wrong). Although, since the idea is that IMDClient could be used independently from MDAnalysis, do you also need a copy in IMDClient for those users?

It seems like parse_host_port is specific to the client address for the IMDReader. Is it's worth considering how alternatives to IMDClient could leverage the IMDReader in MDAnalysis and what the requirements for that interface looks like? Or is making this aspect generalizable not a priority?

yuxuanzhuang · 2024-12-20T03:33:42Z

Just a side note, I love InThreadIMDServer from the tests! It’s making testing so much smoother for me on macOS without running simulations :) I think it’d be a good idea to add some documentation for other developers to ultilize it---or even rescuing it from tests?

ljwoods2 · 2024-12-20T20:35:51Z

@jaclark5 Good point with parse_host_port- I would suggest keeping it in the imdclient package and importing it into MDA, though since it is a fairly trivial method I'd be okay with just duplicating it as well.

As for generalizing the IMDReader to use another client other than IMDClient (if I'm understanding you right), I don't think that's flexibility I see us needing. Is that what you meant?

ljwoods2 · 2024-12-20T20:44:42Z

@yuxuanzhuang I like the idea of being able to serve any (even prewritten) trajectories via IMD, so I'm open to this idea. As-is, it's fairly test-specific (with methods like expect_packet) and a little janky with hardcoded placeholder data, so I'd ideally want to make a similar class for that use case.

Opened Becksteinlab/imdclient#48 for this

hmacdope · 2025-01-05T20:45:53Z

Hi all, sorry it has taken me so long to get to this, and thanks for the comments @jaclark5 and @yuxuanzhuang.

I agree with basically everything said so far. The rapid iteration part of developing the IMDReader is largely over and I think adopting a base class + IMDReader into core is the right move, allowing generalisation, tight integration and make streaming a first class citizen.

As already pointed out I think it seems to be a fairly clean break, with minimal deps introduced. I am happy to go ahead with this as the plan. I think the main work for core will be in deciding on a path forward for #4828

A few points

Once we merge into core we probably want to avoid thrashing on the interface between the two codes, as this will involve repeated PRs to core, which can be a bit slow. This means that we should sort out known issues or changes we would like to make before making the transition.
Compatibility should be marked with a release.
Aim to keep testing burden low, the core testsuite is already large.
Lets wait for a bit of discussion on Make the AnalysisBase compatible with live simulation streams #4828 first, as we may need to make some changes to the way reader works for streaming / AnalysisBase compatibility.

IAlibay · 2025-01-06T13:00:43Z

Apologies I'm just coming off the holiday break and have not had a lot of time to think this through yet.

I agree that handling trajectory streams is a v3.0 target for MDAnalysis, but I'm also starting to feel like this is exactly the use case for having MDAKits - nothing presented here seems to indicate that we need to have this in core, and everthing is somewhat optional.

If it's not too much to ask, could someone possibly lay out what advantages we gain from having this in core over just having this as an optional MDAKit that folks need to install alongisde the core library?

ljwoods2 mentioned this issue Dec 11, 2024

Make the AnalysisBase compatible with live simulation streams #4828

Open

orbeckst added enhancement Component-Readers streaming labels Dec 11, 2024

orbeckst added the decision needed requires input from developers before moving further label Dec 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories #4827

Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories #4827

ljwoods2 commented Dec 11, 2024 •

edited

Loading

orbeckst commented Dec 11, 2024

orbeckst commented Dec 11, 2024

ljwoods2 commented Dec 12, 2024 •

edited

Loading

hcho38 commented Dec 12, 2024 •

edited

Loading

orbeckst commented Dec 18, 2024

jaclark5 commented Dec 20, 2024

yuxuanzhuang commented Dec 20, 2024

ljwoods2 commented Dec 20, 2024

ljwoods2 commented Dec 20, 2024

hmacdope commented Jan 5, 2025

IAlibay commented Jan 6, 2025

Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories #4827

Adopt a reader for live-streamed Interactive MD (IMD) version 3 trajectories #4827

Comments

ljwoods2 commented Dec 11, 2024 • edited Loading

Is your feature request related to a problem?

Describe the solution you'd like

MDAnalysis adopts the IMDReader, brings on imdclient as an optional dependency

Protocol + integration tests go in the imdclient MDAKit, but reader API tests go in MDAnalysis

Summary

Describe alternatives you've considered

orbeckst commented Dec 11, 2024

orbeckst commented Dec 11, 2024

ljwoods2 commented Dec 12, 2024 • edited Loading

hcho38 commented Dec 12, 2024 • edited Loading

orbeckst commented Dec 18, 2024

jaclark5 commented Dec 20, 2024

yuxuanzhuang commented Dec 20, 2024

ljwoods2 commented Dec 20, 2024

ljwoods2 commented Dec 20, 2024

hmacdope commented Jan 5, 2025

IAlibay commented Jan 6, 2025

ljwoods2 commented Dec 11, 2024 •

edited

Loading

ljwoods2 commented Dec 12, 2024 •

edited

Loading

hcho38 commented Dec 12, 2024 •

edited

Loading