Skip to content

Conversation

@wchargin
Copy link
Contributor

Summary:
Fixes #4743. A supervisor may use SessionLog.START events whenever it
starts, so the first such event does not indicate a restart and does
not require preemption. Events like graphs sometimes(?) appear before
the first SessionLog.START. This patch changes the accumulator to only
preempt on non-initial SessionLog.START events to retain that data.

Test Plan:
The updated unit test fails before this patch and passes after it. Also,
the dataset in #4743 now shows a graph in the graphs dashboard rather
than an internal 404. For posterity, the salient structure of that event
file comprises this sequence of events, all implicitly at step 0:

{ file_version: "brain.Event:2" }
{ graph_def: "..." }
{ meta_graph_def: "..." }
{ session_log { status: START } }

wchargin-branch: prune-session-restarts-only

Summary:
Fixes #4743. A supervisor may use `SessionLog.START` events whenever it
starts, so the first such event does not indicate a *restart* and does
not require preemption. Events like graphs sometimes(?) appear before
the first `SessionLog.START`. This patch changes the accumulator to only
preempt on non-initial `SessionLog.START` events to retain that data.

Test Plan:
The updated unit test fails before this patch and passes after it. Also,
the dataset in #4743 now shows a graph in the graphs dashboard rather
than an internal 404. For posterity, the salient structure of that event
file comprises this sequence of events, all implicitly at step 0:

    { file_version: "brain.Event:2" }
    { graph_def: "..." }
    { meta_graph_def: "..." }
    { session_log { status: START } }

wchargin-branch: prune-session-restarts-only
wchargin-source: 0c227f6206bd9cdee29217d1b8cece02ec9bfdc8
Because of supervisor threading, it is possible that this logic will
cause the first few event messages to be discarded since supervisor
threading does not guarantee that the START message is deterministically
written first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be mistaken, but isn't this 2nd paragraph no longer relevant? The "first few event messages" can still be discarded, but that is because of multiple START messages appearing, not because of a START message appearing after the first few messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I’m not sure that I follow. I interpreted this as meaning, “at the
start of a supervisor invocation, the order of the START event and the
initial summaries may not be deterministic”. So if we have

file_version
graph_def [step: 0]
START [step: 0]
scalar [step: 0]
scalar [step: 1]
// preempted!
scalar [step: 2]
START [step: 2]
scalar [step: 3]

where the step-2 START should have been written before the step-2
scalar, but wasn’t because of “threading reasons”, then I think that
this paragraph is still relevant.

That said, I did some light digging and wasn’t able to find more details
about the circumstances that this refers to, so I could well be missing
something.

@wchargin wchargin merged commit 0f4c2b3 into master Mar 11, 2021
@wchargin wchargin deleted the wchargin-prune-session-restarts-only branch March 11, 2021 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Graph events inconsistently evicted after session log START event

2 participants