Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Real time analysis information file #561

Merged
merged 17 commits into from
Mar 25, 2022
Merged

Conversation

ijpulidos
Copy link
Contributor

@ijpulidos ijpulidos commented Mar 18, 2022

Description

The purpose of this PR is to have a way to export a human readable file with real time analysis information.

CC choderalab/perses#916

The main idea is taking the information straight from the reporter to avoid redundancy and synchronizing problems.

So far I've settled into using YAML for this file but I am still open for suggestions. With a yaml file being easy to append/update without having to read the whole file contents (as opposed to JSON).

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelogNotable points that this PR has either accomplished or will accomplish.

Status

  • Ready to go

@ijpulidos
Copy link
Contributor Author

So far we have a basic/direct way of storing the information in the YAML file, I tried doing it from the reporter but I couldn't find an easy way to do it, so it's just a direct write to YAML file now. I wanted to go through the reporter machinery to avoid synchronization issues with it, I think that's still desired.

We have to double check if the variables we are storing are actually the ones we want, according to choderalab/perses#916.

I have to fix the tests, there are some serialization/deserialization things that have to be changed in the tests to handle the new _timing_info attribute of the MultiStateSampler.

@mikemhenry
Copy link
Contributor

@mikemhenry
Copy link
Contributor

But I agree, we will want to make sure we don't have to handel file locking and race conditions, so we should figure out a way to use a reporter

@mikemhenry
Copy link
Contributor

@ijpulidos can you post an example of what the real time analysis fie will look like? Then we can tag John in and see what he thinks. One thought I had is that we probably want to append to this file each iteration instead of overwriting it.

@mikemhenry
Copy link
Contributor

mikemhenry commented Mar 23, 2022

Oh and we should also have a section of the yaml document that has the units of each value so we can programmatically read what the units are instead of assuming them

@jchodera
Copy link
Member

Oh and we should also have a section of the yaml document that has the units of each value so we can programmatically read what the units are instead of assuming them

Right now, the units are baked into the names of the fields. In a future update, it would be useful to use a scheme similar to YANK or what OpenFF is moving toward so that we can read/write units programmatically.

@ijpulidos
Copy link
Contributor Author

@mikemhenry Sure, an example of it can be found here

There are a couple of issues with codeclimate that I might spend a bit of thinking to fix them, but this is good for review and for comments/suggestions. @jchodera @mikemhenry can you review it? Thanks!

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just have some minor requests for changes, but good to merge after that.

)
estimated_finish_date = datetime.datetime.now() + estimated_timedelta_remaining
self._timing_data["estimated_time_remaining"] = str(estimated_timedelta_remaining) # Putting it in dict as str
self._timing_data["estimated_iso_finish_date"] = estimated_finish_date.strftime("%Y-%b-%d-%H:%M:%S")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time zone is not specified. This should be something like estimated_localtime_finish_date if you're using local time.

total_time_in_seconds = datetime.timedelta(
seconds=self._timing_data["average_seconds_per_iteration"] * iteration_limit
)
self._timing_data["estimated_total_iso_time"] = str(total_time_in_seconds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need iso in these tags.

"percent_complete": self._iteration*100/self.number_of_iterations,
"mbar_analysis": {"free_energy_in_kT": float(free_energy),
"standard_error_in_kT": float(self._last_err_free_energy),
"uncorrelated_samples": float(analysis._equilibration_data[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use number_of_uncorrelated_samples.

Is there anything else in _equilibration_data we could report too, like the initial production iteration or statistical inefficiency?

"uncorrelated_samples": float(analysis._equilibration_data[-1])
},
"timing_data": self._timing_data,
"ns_per_day": performance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't ns_per_day appear under the timing_data block?


"""
# We don't want to restore reporter and timing data attributes
__NON_RESTORABLE_ATTRIBUTES__ = ("_reporter", "_timing_data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an elegant solution! Thanks for adding this!

Dictionary with the key, value pairs to store in YAML format.
"""
reporter_dir, _ = os.path.split(self._storage_analysis_file_path)
output_filepath = f"{reporter_dir}/real_time_analysis.yaml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you list the path of the YAML file that will be generated in the docstring as well?

@mikemhenry
Copy link
Contributor

Don't worry about code climate @ijpulidos, didn't get a ton of input on this question https://openforcefieldgroup.slack.com/archives/CM9BA4AVA/p1647459523274309 but I'm in favor dropping things like code climate in favor for some CLI tool we can wire into CI like pyflakes

Copy link
Contributor

@mikemhenry mikemhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Maybe add a test that runs a few iterations then opens the document to make sure that the number of iterations match

@mikemhenry
Copy link
Contributor

mikemhenry commented Mar 24, 2022

@ijpulidos Oh and somewhere can you add an example yaml that this produces? Maybe somewhere here https://openmmtools.readthedocs.io/en/latest/devtutorial.html

@ijpulidos
Copy link
Contributor Author

The windows test is failing but it seems like it has nothing to do with the changes in this PR, as far as I can tell. I'm merging this as discussed.

@ijpulidos ijpulidos merged commit 91a4b98 into main Mar 25, 2022
@ijpulidos ijpulidos deleted the realtime-analysis-file-output branch March 25, 2022 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants