Description
The following event file (posted with permission) renders incorrectly in
the graphs dashboard:
graph_failure.tfevents.zip
The /info route thinks that a graph exists, but the /graph route
404s when trying to fetch it.
The salient property of this event file is that it has a graph_def
event followed by a session_log { status: START } event. The latter
event signals that all preceding events, across all tags, should be purged.
This docstring is twice wrong:
- it says “purge all previously seen events with larger steps”, but
the code actually purges events with larger-or-equal steps, which
matters because both the graph_def and the session_log are at
step 0; and
- it says “all” events, but actually only purges tensors.
The fact that this purge can happen at all means that it is possible for
a time series to have summary metadata but not actually have any data,
which is something that we assumed could not happen (since a normal
preemption event always leaves at least one point in the reservoir). And
the fact that it only purges tensors means that this only started
affecting graphs after #4470, which changed the read path from Graph()
to Tensors("__run_graph__"). These have equivalent input streams due
to dataclass_compat, but only Tensors(...) gets purged.
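
To make the failure mode concrete, here is a minimal toy model of the behavior described above. It is not TensorBoard's actual code: `ToyRun`, `add_tensor`, and `on_session_start` are hypothetical names, and the only things carried over from the real system are the two properties at issue (the purge uses larger-or-equal steps, and it touches tensor data but not metadata).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRun:
    """Hypothetical stand-in for one run's accumulated state."""

    # tag -> summary metadata; the purge in this model never touches it
    metadata: dict = field(default_factory=dict)
    # tag -> list of (step, value) points
    tensors: dict = field(default_factory=dict)

    def add_tensor(self, tag, step, value):
        # Metadata is recorded the first time a tag is seen.
        self.metadata.setdefault(tag, {"tag": tag})
        self.tensors.setdefault(tag, []).append((step, value))

    def on_session_start(self, step):
        # Assumed purge rule: drop points with step >= the restart step
        # (larger-or-equal, not strictly larger).
        self.tensors = {
            tag: [(s, v) for (s, v) in points if s < step]
            for tag, points in self.tensors.items()
        }


run = ToyRun()
# graph_def arrives at step 0 and is stored as a tensor under this tag.
run.add_tensor("__run_graph__", step=0, value=b"<serialized graph>")
# session_log { status: START } arrives, also at step 0.
run.on_session_start(step=0)

has_metadata = "__run_graph__" in run.metadata
has_data = bool(run.tensors["__run_graph__"])
print(has_metadata, has_data)  # prints: True False
```

In this model the tag is still listed (metadata present) but has no points left, which corresponds to /info reporting a graph while /graph 404s.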
We should fix the data provider implementation to not list blob
sequences that do not actually have tensor data, so that the graphs
dashboard is internally consistent. But we should maybe also consider
the implications of this more broadly.
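
A rough sketch of what that fix could look like, under the assumption that listing has access to both the metadata and the stored points per tag. The function name and the two dict arguments are hypothetical and do not reflect the real data provider interface.

```python
def list_blob_sequences(metadata_by_tag, tensors_by_tag):
    """List only tags that have metadata AND at least one data point.

    Both arguments are hypothetical plain dicts:
      metadata_by_tag: tag -> metadata object
      tensors_by_tag:  tag -> list of stored points (possibly empty)
    """
    return {
        tag: meta
        for tag, meta in metadata_by_tag.items()
        # A missing tag or an empty list both count as "no data".
        if tensors_by_tag.get(tag)
    }


metadata = {"__run_graph__": {"plugin": "graphs"}, "other": {"plugin": "x"}}
tensors = {"__run_graph__": [], "other": [(0, b"payload")]}

listed = list_blob_sequences(metadata, tensors)
print(sorted(listed))  # prints: ['other']
```

With a filter like this, a purged graph would simply not be listed, so the dashboard would not try to fetch it.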
(This does not affect --load_fast, because RustBoard does not care
about session_log events.)