Real time analysis information file #561

ijpulidos · 2022-03-18T02:26:40Z

Description

The purpose of this PR is to have a way to export a human readable file with real time analysis information.

The main idea is to take the information straight from the reporter to avoid redundancy and synchronization problems.

So far I've settled on YAML for this file, but I am still open to suggestions. A YAML file is easy to append to or update without having to read the whole file's contents (as opposed to JSON).
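The append argument above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: the helper names `append_yaml_document` and `append_json_record` are hypothetical, the YAML is written by hand for flat mappings of simple scalars only, and a real implementation would presumably go through a YAML library.

```python
import json
import os
import tempfile


def append_yaml_document(path, data):
    """Append one flat mapping as a new YAML document, without reading the file.

    Hypothetical helper: only handles simple scalar values (ints, floats).
    """
    with open(path, "a") as f:
        f.write("---\n")  # YAML document start marker
        for key, value in data.items():
            f.write(f"{key}: {value}\n")


def append_json_record(path, data):
    """A JSON array cannot be extended in place: load, extend, rewrite."""
    records = []
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    records.append(data)
    with open(path, "w") as f:
        json.dump(records, f)


with tempfile.TemporaryDirectory() as tmp:
    yaml_path = os.path.join(tmp, "real_time_analysis.yaml")
    json_path = os.path.join(tmp, "real_time_analysis.json")
    for iteration in range(3):
        record = {"iteration": iteration, "percent_complete": iteration * 10.0}
        append_yaml_document(yaml_path, record)
        append_json_record(json_path, record)
    with open(yaml_path) as f:
        yaml_text = f.read()
    with open(json_path) as f:
        json_records = json.load(f)
```

The YAML writer only ever opens the file in append mode, while the JSON writer has to parse and rewrite everything on each update.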

Todos

Implement feature / fix bug
Add tests
Update documentation as needed
Update changelog

Status

Ready to go

ijpulidos · 2022-03-18T22:36:14Z

So far we have a basic, direct way of storing the information in the YAML file. I tried doing it from the reporter but couldn't find an easy way, so for now it's just a direct write to the YAML file. I wanted to go through the reporter machinery to avoid synchronization issues with it, and I think that's still desirable.

We have to double check if the variables we are storing are actually the ones we want, according to choderalab/perses#916.

I have to fix the tests: some serialization/deserialization things have to change there to handle the new _timing_info attribute of the MultiStateSampler.

mikemhenry · 2022-03-18T22:56:58Z

@ijpulidos Good timing! I've simulated this here https://github.com/choderalab/perses/pull/972/files#diff-2381ec1f5bdefb7f1bcd38a72d62f7e4d01cd05a04fde359f1c6d782058161c4R60-R72

mikemhenry · 2022-03-18T22:57:48Z

But I agree, we will want to make sure we don't have to handle file locking and race conditions, so we should figure out a way to use a reporter.

mikemhenry · 2022-03-23T19:45:33Z

@ijpulidos can you post an example of what the real time analysis file will look like? Then we can tag John in and see what he thinks. One thought I had is that we probably want to append to this file each iteration instead of overwriting it.

mikemhenry · 2022-03-23T19:47:17Z

Oh and we should also have a section of the yaml document that has the units of each value so we can programmatically read what the units are instead of assuming them

jchodera · 2022-03-23T22:28:57Z

> Oh and we should also have a section of the yaml document that has the units of each value so we can programmatically read what the units are instead of assuming them

Right now, the units are baked into the names of the fields. In a future update, it would be useful to use a scheme similar to YANK or what OpenFF is moving toward so that we can read/write units programmatically.
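As an illustration of reading units programmatically while they are still baked into field names, the sketch below splits a name such as `free_energy_in_kT` into a quantity name and a unit. The helpers `split_unit_suffix` and `with_explicit_units`, and the `{"value": ..., "unit": ...}` layout, are hypothetical; this is not the YANK or OpenFF scheme.

```python
def split_unit_suffix(field_name, separator="_in_"):
    """Split 'free_energy_in_kT' into ('free_energy', 'kT').

    Fields without the separator are returned with unit None.
    """
    name, sep, unit = field_name.rpartition(separator)
    if not sep:
        return field_name, None
    return name, unit


def with_explicit_units(flat_fields):
    """Turn {'x_in_u': v} into {'x': {'value': v, 'unit': 'u'}} (hypothetical layout)."""
    explicit = {}
    for field_name, value in flat_fields.items():
        name, unit = split_unit_suffix(field_name)
        explicit[name] = {"value": value, "unit": unit}
    return explicit


example = with_explicit_units(
    {"free_energy_in_kT": -3.2, "standard_error_in_kT": 0.1, "percent_complete": 40.0}
)
```

A reader of the file can then look up `example["free_energy"]["unit"]` instead of parsing the key.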

ijpulidos · 2022-03-23T22:50:57Z

@mikemhenry Sure, an example of it can be found here

There are a couple of issues with codeclimate that I might need to spend some time thinking about to fix, but this is ready for review and for comments/suggestions. @jchodera @mikemhenry can you review it? Thanks!

jchodera

Looks great! Just have some minor requests for changes, but good to merge after that.

jchodera · 2022-03-23T23:51:42Z

openmmtools/multistate/multistatesampler.py

+        )
+        estimated_finish_date = datetime.datetime.now() + estimated_timedelta_remaining
+        self._timing_data["estimated_time_remaining"] = str(estimated_timedelta_remaining)  # Putting it in dict as str
+        self._timing_data["estimated_iso_finish_date"] = estimated_finish_date.strftime("%Y-%b-%d-%H:%M:%S")


The time zone is not specified. This should be something like estimated_localtime_finish_date if you're using local time.
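One way to make the time zone explicit is to use a timezone-aware datetime and `isoformat()`, which appends the UTC offset. The sketch below is only an illustration: the function name `estimated_finish_date` and its `now` parameter are hypothetical. Note that the `%Y-%b-%d-%H:%M:%S` format in the diff uses an abbreviated month name, so it is not ISO 8601 either.

```python
import datetime


def estimated_finish_date(seconds_remaining, now=None):
    """Return an ISO 8601 finish timestamp that carries its UTC offset."""
    if now is None:
        # Aware "now" in UTC; use datetime.datetime.now().astimezone()
        # instead to get local time with the local offset attached.
        now = datetime.datetime.now(datetime.timezone.utc)
    finish = now + datetime.timedelta(seconds=seconds_remaining)
    return finish.isoformat(timespec="seconds")


fixed_now = datetime.datetime(2022, 3, 23, 23, 0, 0, tzinfo=datetime.timezone.utc)
finish_str = estimated_finish_date(3600, now=fixed_now)
```

With the fixed start time above, `finish_str` is `2022-03-24T00:00:00+00:00`.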

jchodera · 2022-03-23T23:51:55Z

openmmtools/multistate/multistatesampler.py

+        total_time_in_seconds = datetime.timedelta(
+            seconds=self._timing_data["average_seconds_per_iteration"] * iteration_limit
+        )
+        self._timing_data["estimated_total_iso_time"] = str(total_time_in_seconds)


I don't think you need iso in these tags.
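For context, `str()` of a `datetime.timedelta` gives a form like `0:33:20`, which is not an ISO 8601 duration (that would look like `PT33M20S`). A minimal sketch of the estimate, with a hypothetical function name and with key names that simply drop `iso` as suggested:

```python
import datetime


def estimate_timing(average_seconds_per_iteration, iteration, iteration_limit):
    """Estimate total and remaining run time from the average iteration time."""
    total = datetime.timedelta(
        seconds=average_seconds_per_iteration * iteration_limit
    )
    remaining = datetime.timedelta(
        seconds=average_seconds_per_iteration * (iteration_limit - iteration)
    )
    # Stored as strings so they serialize cleanly to YAML.
    return {
        "estimated_total_time": str(total),
        "estimated_time_remaining": str(remaining),
    }


timing = estimate_timing(2.0, iteration=250, iteration_limit=1000)
```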

jchodera · 2022-03-23T23:53:10Z

openmmtools/multistate/multistatesampler.py

+                     "percent_complete": self._iteration*100/self.number_of_iterations,
+                     "mbar_analysis": {"free_energy_in_kT": float(free_energy),
+                                       "standard_error_in_kT": float(self._last_err_free_energy),
+                                       "uncorrelated_samples": float(analysis._equilibration_data[-1])


I'd use number_of_uncorrelated_samples.

Is there anything else in _equilibration_data we could report too, like the initial production iteration or statistical inefficiency?
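A sketch of what reporting the extra fields could look like. It assumes `_equilibration_data` is a three-element sequence ordered as (number of equilibration iterations, statistical inefficiency, number of uncorrelated samples). Only the meaning of the last element is implied by the diff; the ordering of the other two, the function name, and the output keys are assumptions for illustration.

```python
def summarize_equilibration(equilibration_data):
    """Unpack the assumed (n_equilibration, inefficiency, n_uncorrelated) triple."""
    n_equilibration, statistical_inefficiency, n_uncorrelated = equilibration_data
    return {
        # Assumed: first production iteration follows the equilibration iterations.
        "number_of_equilibration_iterations": int(n_equilibration),
        "statistical_inefficiency": float(statistical_inefficiency),
        "number_of_uncorrelated_samples": float(n_uncorrelated),
    }


summary = summarize_equilibration((10, 2.5, 36.0))
```

Casting to built-in `int` and `float` keeps NumPy scalars out of the YAML output.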

jchodera · 2022-03-23T23:53:33Z

openmmtools/multistate/multistatesampler.py

+                                       "uncorrelated_samples": float(analysis._equilibration_data[-1])
+                                       },
+                     "timing_data": self._timing_data,
+                     "ns_per_day": performance


Shouldn't ns_per_day appear under the timing_data block?

jchodera · 2022-03-23T23:54:43Z

openmmtools/tests/test_sampling.py


        """
+        # We don't want to restore reporter and timing data attributes
+        __NON_RESTORABLE_ATTRIBUTES__ = ("_reporter", "_timing_data")


This is an elegant solution! Thanks for adding this!

jchodera · 2022-03-23T23:55:45Z

openmmtools/multistate/multistatereporter.py

+            Dictionary with the key, value pairs to store in YAML format.
+        """
+        reporter_dir, _ = os.path.split(self._storage_analysis_file_path)
+        output_filepath = f"{reporter_dir}/real_time_analysis.yaml"


Can you list the path of the YAML file that will be generated in the docstring as well?
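One edge case in this path construction: when the storage file has no parent directory, `os.path.split` returns an empty directory, and the f-string then produces `/real_time_analysis.yaml` at the filesystem root. A minimal sketch using `os.path.join`, which handles the empty case (the function name `real_time_analysis_path` is hypothetical):

```python
import os


def real_time_analysis_path(storage_analysis_file_path,
                            filename="real_time_analysis.yaml"):
    """Place the YAML file next to the analysis storage file."""
    reporter_dir = os.path.dirname(storage_analysis_file_path)
    # os.path.join("", filename) returns just filename, i.e. a path
    # relative to the current working directory.
    return os.path.join(reporter_dir, filename)


in_subdir = real_time_analysis_path(os.path.join("output", "sim_analysis.nc"))
in_cwd = real_time_analysis_path("sim_analysis.nc")
```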

mikemhenry · 2022-03-24T20:34:59Z

Don't worry about Code Climate @ijpulidos. I didn't get a ton of input on this question https://openforcefieldgroup.slack.com/archives/CM9BA4AVA/p1647459523274309 but I'm in favor of dropping things like Code Climate in favor of some CLI tool we can wire into CI, like pyflakes.

mikemhenry

This looks great! Maybe add a test that runs a few iterations and then opens the document to make sure that the number of iterations matches.

mikemhenry · 2022-03-24T20:58:12Z

@ijpulidos Oh and somewhere can you add an example yaml that this produces? Maybe somewhere here https://openmmtools.readthedocs.io/en/latest/devtutorial.html

ijpulidos · 2022-03-25T22:36:03Z

The windows test is failing but it seems like it has nothing to do with the changes in this PR, as far as I can tell. I'm merging this as discussed.

ijpulidos added 7 commits March 17, 2022 22:20

Storing timing information and transmiting to reporter.

1c30366

Merge branch 'main' into realtime-analysis-file-output

15eb81b

Attempt at serializing/deserializing timing data. Not working in tests.

8eea0e3

Writing offline analysis yaml file. WIP.

102a0da

Merge branch 'main' into realtime-analysis-file-output

fe53417

Including performance information in yaml file.

5c25a1b

Merge branch 'realtime-analysis-file-output' of github.com:choderalab…

febbb4e

…/openmmtools into realtime-analysis-file-output

ijpulidos added 4 commits March 18, 2022 19:23

Minor naming changes. Adding std error in free energy computation.

62492a4

Better timing information format.

0d5d602

Simplifying update analysis algorithm. Handling corner cases exceptions.

6991672

Output file is overwritten.

d1b9ee0

ijpulidos added 2 commits March 23, 2022 17:54

Not [de]serializing timing data. Fixing from storage test.

444a6b5

Merge branch 'main' into realtime-analysis-file-output

8dd2a71

ijpulidos requested review from mikemhenry and jchodera March 23, 2022 22:50

jchodera approved these changes Mar 23, 2022

mikemhenry approved these changes Mar 24, 2022

ijpulidos added 4 commits March 24, 2022 23:55

Performance in timing data. Minor changes based on review suggestions.

a5008ca

Basic testing for new real time analysis yaml file.

9a4e794

Removing useless/debug print.

9ba6c83

Documentation mentioning the real time yaml analysis output file and …

1837ed1

…its structure.

ijpulidos merged commit 91a4b98 into main Mar 25, 2022

ijpulidos deleted the realtime-analysis-file-output branch March 25, 2022 22:37

ijpulidos mentioned this pull request Aug 2, 2022

Realtime analysis output failing when output has no subdirectory #615

Closed
