-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Real time analysis information file #561
Conversation
…/openmmtools into realtime-analysis-file-output
So far we have a basic/direct way of storing the information in the YAML file, I tried doing it from the reporter but I couldn't find an easy way to do it, so it's just a direct write to YAML file now. I wanted to go through the reporter machinery to avoid synchronization issues with it, I think that's still desired. We have to double check if the variables we are storing are actually the ones we want, according to choderalab/perses#916. I have to fix the tests, there are some serialization/deserialization things that have to be changed in the tests to handle the new |
@ijpulidos Good timing! I've simulated this here https://github.com/choderalab/perses/pull/972/files#diff-2381ec1f5bdefb7f1bcd38a72d62f7e4d01cd05a04fde359f1c6d782058161c4R60-R72 |
But I agree, we will want to make sure we don't have to handel file locking and race conditions, so we should figure out a way to use a reporter |
@ijpulidos can you post an example of what the real time analysis fie will look like? Then we can tag John in and see what he thinks. One thought I had is that we probably want to append to this file each iteration instead of overwriting it. |
Oh and we should also have a section of the yaml document that has the units of each value so we can programmatically read what the units are instead of assuming them |
Right now, the units are baked into the names of the fields. In a future update, it would be useful to use a scheme similar to YANK or what OpenFF is moving toward so that we can read/write units programmatically. |
@mikemhenry Sure, an example of it can be found here There are a couple of issues with codeclimate that I might spend a bit of thinking to fix them, but this is good for review and for comments/suggestions. @jchodera @mikemhenry can you review it? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just have some minor requests for changes, but good to merge after that.
) | ||
estimated_finish_date = datetime.datetime.now() + estimated_timedelta_remaining | ||
self._timing_data["estimated_time_remaining"] = str(estimated_timedelta_remaining) # Putting it in dict as str | ||
self._timing_data["estimated_iso_finish_date"] = estimated_finish_date.strftime("%Y-%b-%d-%H:%M:%S") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time zone is not specified. This should be something like estimated_localtime_finish_date
if you're using local time.
total_time_in_seconds = datetime.timedelta( | ||
seconds=self._timing_data["average_seconds_per_iteration"] * iteration_limit | ||
) | ||
self._timing_data["estimated_total_iso_time"] = str(total_time_in_seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need iso
in these tags.
"percent_complete": self._iteration*100/self.number_of_iterations, | ||
"mbar_analysis": {"free_energy_in_kT": float(free_energy), | ||
"standard_error_in_kT": float(self._last_err_free_energy), | ||
"uncorrelated_samples": float(analysis._equilibration_data[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use number_of_uncorrelated_samples
.
Is there anything else in _equilibration_data
we could report too, like the initial production iteration or statistical inefficiency?
"uncorrelated_samples": float(analysis._equilibration_data[-1]) | ||
}, | ||
"timing_data": self._timing_data, | ||
"ns_per_day": performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't ns_per_day
appear under the timing_data
block?
|
||
""" | ||
# We don't want to restore reporter and timing data attributes | ||
__NON_RESTORABLE_ATTRIBUTES__ = ("_reporter", "_timing_data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an elegant solution! Thanks for adding this!
Dictionary with the key, value pairs to store in YAML format. | ||
""" | ||
reporter_dir, _ = os.path.split(self._storage_analysis_file_path) | ||
output_filepath = f"{reporter_dir}/real_time_analysis.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you list the path of the YAML file that will be generated in the docstring as well?
Don't worry about code climate @ijpulidos, didn't get a ton of input on this question https://openforcefieldgroup.slack.com/archives/CM9BA4AVA/p1647459523274309 but I'm in favor dropping things like code climate in favor for some CLI tool we can wire into CI like pyflakes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Maybe add a test that runs a few iterations then opens the document to make sure that the number of iterations match
@ijpulidos Oh and somewhere can you add an example yaml that this produces? Maybe somewhere here https://openmmtools.readthedocs.io/en/latest/devtutorial.html |
The windows test is failing but it seems like it has nothing to do with the changes in this PR, as far as I can tell. I'm merging this as discussed. |
Description
The purpose of this PR is to have a way to export a human readable file with real time analysis information.
CC choderalab/perses#916
The main idea is taking the information straight from the reporter to avoid redundancy and synchronizing problems.
So far I've settled into using YAML for this file but I am still open for suggestions. With a yaml file being easy to append/update without having to read the whole file contents (as opposed to JSON).
Todos
Status