Support multiple file writers writing at once. #1063
Just jotting down what we discussed while it's fresh in my head. We keep the essential structure of DirectoryWatcher - Load() spools events from the current _loader until it's exhausted, and then moves on to a new file. However, we change the algorithm for moving on to a new file to do the following:
We're proposing that MAX_INACTIVE_AGE be set to 24 hours (or perhaps slightly more, to allow for files written exactly once every 24 hours). OOO_WRITE_CHECK_COUNT is currently 20, but I think it could be lower (like 1 or 2, really).

The goal of this new algorithm is that we still proceed with loading files in timestamp order, but once we get to the end of that order, we loop back to the beginning and re-read files again. This handles the case where there are multiple active streams of data in different files. To keep from having to re-read every file, we mark files as "dead" in two different ways: 1) if a file hasn't been written to in MAX_INACTIVE_AGE (as determined by last event time), and 2) if a file is preempted by a newer-timestamped file with the same suffix. However, files that are dead of type 2 we still keep around to optionally check whether they have been modified, in case we want to detect out-of-order writes.

The end result should be that for a static set of files (e.g. a terminated experiment), we do read all the data, but once we've gone through the initial set of files, we only reload data from a set of active "head" files of the streams that might still be open, which is one stream per suffix for files written to in the last 24 hours. This keeps us from having to keep event file readers open and querying for new data for every single event file, even for very old logdirs.
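The bookkeeping described above could be sketched roughly like this. This is a minimal toy model of the proposal, not TensorBoard code; `FileState`, `pick_active_files`, and the constant values are hypothetical names chosen for illustration:

```python
# Toy sketch of the proposed re-reading algorithm (hypothetical names; not
# TensorBoard code). Files are visited in filename-timestamp order; at the
# end of the list we loop back, skipping files marked dead.
MAX_INACTIVE_AGE = 24 * 60 * 60   # proposed: 24 hours, in seconds
OOO_WRITE_CHECK_COUNT = 2         # extra passes to check for out-of-order writes

class FileState:
    def __init__(self, path, timestamp, suffix):
        self.path = path
        self.timestamp = timestamp      # timestamp embedded in the filename
        self.suffix = suffix            # e.g. the hostname part of the filename
        self.last_event_time = timestamp
        self.dead = False
        self.ooo_checks_left = 0        # >0: dead, but still checked for late writes

def pick_active_files(files, now):
    """Return the files to (re)load on this pass, marking dead ones.

    `files` must be sorted by filename timestamp. A file dies when (1) its
    last event is older than MAX_INACTIVE_AGE, or (2) a newer file with the
    same suffix preempts it; type-2 files are re-checked for a few more
    passes in case of out-of-order writes.
    """
    newest_by_suffix = {}
    for f in files:
        newest_by_suffix[f.suffix] = f  # sorted order, so the last one wins
    active = []
    for f in files:
        if f.dead and f.ooo_checks_left <= 0:
            continue                     # permanently dead: never re-read
        if now - f.last_event_time > MAX_INACTIVE_AGE:
            f.dead = True                # dead, type 1: inactive too long
            continue
        if newest_by_suffix[f.suffix] is not f:
            if not f.dead:
                f.dead = True            # dead, type 2: preempted by newer file
                f.ooo_checks_left = OOO_WRITE_CHECK_COUNT
            else:
                f.ooo_checks_left -= 1
            active.append(f)             # still re-check for out-of-order writes
        else:
            active.append(f)             # live "head" file of its stream
    return active
```

On each pass, a preempted file stays in the reload set for OOO_WRITE_CHECK_COUNT more passes before being dropped for good, while each suffix's newest file remains a live head until it goes inactive.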
@nfelt, your plan as outlined above, plus adding tests, sounds great. We probably want to merge this change soon to unblock the migration to tf.contrib summaries. Do you want to take over this effort, since I'm OOO?
@chihuahua oh shoot, I forgot you were out starting tomorrow... hmm, we probably should have started the code in a separate file so we could check it in and then keep working on tests, etc. I guess I'll just grab a copy of the code from your PR and make a new PR? Unless you have time to quickly change the current PR to target a separate file, in which case we could merge it as work-in-progress and I could take over from there.
Ah yes. Within #1064, I moved the changes to a new suffix_based_directory_watcher.py file (the name is definitely tentative). We can check it in and iterate. Or, you could merge the changes into your own new PR. Thanks for the idea.
Is this feature still planned to be supported? |
I've merged #1867, which introduces an experimental flag. If you've been waiting on this functionality, thanks for your patience, and please try out the new flag.
Just in time!
Since in tf-2.0 beta1 the tensorflow.keras.callbacks.TensorBoard() callback seems to rely on that feature, it might be a good idea to enable it by default or (if that's not an option) to mention the feature in the E0821 warning.
Tensorboard does not work well when multiple file writers write into the same directory; see [related issue](tensorflow/tensorboard#1063). The monitor creates a file writer with every summary. This pull request resolves the issue by creating a single writer per run and depending on it.
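The "single writer per run" fix can be illustrated with a toy sketch. The `FileWriter` class and `writer_for_run` helper below are hypothetical stand-ins, not TensorFlow's real API; the point is only that caching writers by run directory means repeated summary calls reuse one writer (and thus one events file) instead of creating a new writer per summary:

```python
import functools

# Hypothetical sketch: cache one writer per run directory so repeated
# summary calls append to the same events file. FileWriter is a stand-in
# counting how many events files would have been created.
class FileWriter:
    created = 0  # counts writer (i.e. events-file) creations

    def __init__(self, logdir):
        FileWriter.created += 1
        self.logdir = logdir
        self.events = []

    def add_summary(self, value):
        self.events.append(value)

@functools.lru_cache(maxsize=None)
def writer_for_run(logdir):
    # One writer per run directory, created lazily on first use.
    return FileWriter(logdir)

for step in range(5):
    writer_for_run("runs/exp1").add_summary(step)  # same writer every call
```

With a writer created per summary instead, five summaries would have produced five events files in the run directory, which is exactly the pattern the single-pass watcher cannot follow.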
Closing this issue since the PR is merged. Please feel free to reopen if this still exists. Thanks
This issue should stay open, since the flag is still experimental and not on by default yet.
Background
TensorBoard assigns one `DirectoryWatcher` per run. The `DirectoryWatcher` watches for new events being written to the directory and loads them into TensorBoard's multiplexer. Today, the `DirectoryWatcher` iterates through events files (for a run) in lexicographic order. After it has finished reading from an events file, it never reads from it again (and moves on to a subsequent events file).

This behavior means that TensorBoard does not support multiple file writers writing to the same run directory: TensorBoard would move on to a different events file B once it finishes reading events file A, and would never go back to reading A no matter how A is later updated.
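As a toy illustration of that single-pass behavior (a minimal sketch, not the actual `DirectoryWatcher` implementation):

```python
# Toy model of today's single-pass behavior (not the real DirectoryWatcher):
# events files are visited in sorted (lexicographic) order, and once the
# watcher has moved past a file it never reads it again, so events appended
# later by a second writer are silently dropped.
class SinglePassWatcher:
    def __init__(self):
        self.current = None   # name of the events file being read
        self.offset = 0       # events of `current` consumed so far

    def load(self, files):
        """files: dict of filename -> list of events. Yields unseen events."""
        for name in sorted(files):
            if self.current is not None and name < self.current:
                continue      # already finished this file; never look back
            if name != self.current:
                self.current, self.offset = name, 0
            events = files[name]
            while self.offset < len(events):
                yield events[self.offset]
                self.offset += 1
```

Once the watcher has advanced from file A to file B, any event a second writer appends to A is never yielded, which is precisely the failure mode this issue describes.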
In turn, that problem blocks TensorFlow+TensorBoard's transition towards migrating `Estimator`, tflearn hooks, and other high-level TensorFlow constructs to using `tf.contrib.summary`-style summaries. `tf.contrib.summary`-style summaries are written via a type of `SummaryWriter` different from the one that writes traditional-style (`tf.summary.*`) summaries, so users that intermingle the two will necessarily have to use two summary writers concurrently. Many users throughout google3 intermingle the summaries, and the transition must happen soon.