Persist on event may persist events again in case of disaster recovery #385
Comments
A possible fix for this bug is to use the combination of emitter id and emitter sequence number as delivery id and store this in a dedicated field of a
This fix sounds reasonable to me as it uses a unique event identifier and not the non-unique local sequence number for referencing events that caused a
Instead of using the
We still need to make sure to preserve the proper emission order of
We therefore need to store the new identifier in addition to the currently used
The new identifier must be stored in a new
For backwards compatibility we have to deal with the following two situations:
Sounds good to me.
Instead of using the sequence number, which is potentially unstable in case of disaster recovery, as id for persist on event requests, use a stable EventId that is composed of the sequence number of the emitter of the event and the corresponding process id. Closes #385
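The idea behind this change can be sketched with a small Python model. This is not Eventuate's Scala code: the class and field names (`EventId`, `process_id`, `sequence_nr`, `persist_on_event_id`) and the helper `unconfirmed_requests` are assumptions made for illustration only. The model keys each persist on event request by the emitter's process id and emitter-assigned sequence number, so the key does not depend on where the triggering event ends up in a local log.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class EventId:
    process_id: str   # id of the emitting process (assumed field name)
    sequence_nr: int  # sequence number assigned by the emitter


@dataclass(frozen=True)
class Event:
    name: str
    id: EventId
    # Id of the event that triggered this one via persistOnEvent, if any.
    persist_on_event_id: Optional[EventId] = None


def unconfirmed_requests(log: List[Event], trigger: str = "E1") -> Set[EventId]:
    """Replay `log` and return the persist on event requests left unconfirmed.

    Handling the trigger event creates a request keyed by the trigger's
    stable EventId; a replayed event confirms the request it refers to.
    """
    pending: Set[EventId] = set()
    for event in log:
        if event.persist_on_event_id is not None:
            pending.discard(event.persist_on_event_id)
        if event.name == trigger:
            pending.add(event.id)
    return pending


# E1 was emitted at A, E2 and E3 at B; E3 was triggered by E1.
e1 = Event("E1", EventId("A", 1))
e2 = Event("E2", EventId("B", 1))
e3 = Event("E3", EventId("B", 3), persist_on_event_id=e1.id)
```

In this model, both `unconfirmed_requests([e2, e1, e3])` (B's original log order) and `unconfirmed_requests([e1, e2, e3])` (the order after disaster recovery) return an empty set, so E3 would not be emitted a second time in either case.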
Eventuate supports event-driven communication through the PersistOnEvent trait. Its implementation is based on the ideas of reliable delivery: essentially, it uses the sequence number of the currently handled event as delivery id. However, the sequence number of an event might not be stable in case of disaster recovery.
Imagine, for example, two locations A and B replicating all events. A emits E1 (in A seqNo 1), B emits E2 (in B seqNo 1). Once B receives E1 (in B seqNo 2) it emits (persistOnEvent) E3 with persistOnEventSequenceNr = 2. B's log now contains: E2 E1 E3. If B is restarted, replaying E1 leads to another persistOnEvent request with persistOnEventSequenceNr = 2 that is confirmed by the replayed E3 (and thus E3 is not emitted again).
In case of a disaster on B, B recovers events from A. However, in A's log the events are in a different order, and disaster recovery for B ends with the log E1 E2 E3. When E1 is replayed this time, a persistOnEvent with persistOnEventSequenceNr = 1 is requested. This request is not confirmed by the replayed E3 (which only confirms a persistOnEvent request with persistOnEventSequenceNr = 2), and thus E3 is emitted again (as E4). Note that the next replay will not emit E3 again, as its persistOnEvent request is confirmed by E4.
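The scenario can be reproduced with a small Python model of B's replay. This is a sketch under simplifying assumptions, not Eventuate's implementation: the `Event` class and the `replay` function are hypothetical, a local sequence number is taken to be the 1-based position in the log, and E1 is hard-wired as the only event that triggers a persistOnEvent request.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    name: str
    # Local sequence number of the event that triggered this one via
    # persistOnEvent (None for ordinary events).
    persist_on_event_sequence_nr: Optional[int] = None


def replay(log: List[Event], trigger: str = "E1") -> List[Event]:
    """Replay `log` at B and return it with re-emitted events appended.

    Handling the trigger creates a persistOnEvent request keyed by the
    trigger's local sequence number. A replayed event confirms the request
    whose key equals its persist_on_event_sequence_nr.
    """
    pending: Dict[int, str] = {}  # delivery id -> name of the trigger
    for seq_nr, event in enumerate(log, start=1):
        if event.persist_on_event_sequence_nr is not None:
            pending.pop(event.persist_on_event_sequence_nr, None)
        if event.name == trigger:
            pending[seq_nr] = event.name
    result = list(log)
    for seq_nr in sorted(pending):
        # Unconfirmed request: the reaction is persisted again as a new event.
        result.append(Event(f"E{len(result) + 1}", seq_nr))
    return result


log_before_disaster = [Event("E2"), Event("E1"), Event("E3", 2)]
log_after_recovery = [Event("E1"), Event("E2"), Event("E3", 2)]
```

With these definitions, `replay(log_before_disaster)` returns the log unchanged, while `replay(log_after_recovery)` appends an E4 with `persist_on_event_sequence_nr = 1`. Replaying that result once more adds nothing further, matching the behaviour described above.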