Improve state consistency around observers and next/handled event sequence number for when multiple partitions are working and for some reason the server goes down #1682

einari · 2025-01-29T09:59:18Z

Today we store the NextSequenceNumber on the ObserverState. If something uses AppendMany() across multiple partitions and the server dies while the events are being handled, the observer will not know which events it has already handled.

To remedy this situation we should do a couple of things:

State

Introduce a separate state (not inside ObserverState) where we store the EventSequenceNumber of every event that is currently being handled. This means we store an event's sequence number before we handle it. Once the event has been handled, we remove it from this state.
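As a rough illustration of the idea above, here is a minimal in-memory sketch in Python. The names `InFlightEvents`, `begin`, `complete` and `handle_event` are hypothetical and not part of the codebase; a real implementation would persist this state so that it survives a server crash.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InFlightEvents:
    """Hypothetical store, kept separate from ObserverState."""

    # Partition key -> sequence numbers of events currently being handled.
    by_partition: dict[str, set[int]] = field(default_factory=dict)

    def begin(self, partition: str, sequence_number: int) -> None:
        # Record the sequence number BEFORE the event is handled.
        self.by_partition.setdefault(partition, set()).add(sequence_number)

    def complete(self, partition: str, sequence_number: int) -> None:
        # Remove the sequence number AFTER the event has been handled.
        pending = self.by_partition.get(partition)
        if pending is None:
            return
        pending.discard(sequence_number)
        if not pending:
            del self.by_partition[partition]


def handle_event(
    store: InFlightEvents,
    partition: str,
    sequence_number: int,
    handler: Callable[[str, int], None],
) -> None:
    store.begin(partition, sequence_number)
    # If the handler raises (or the process dies here), the entry stays behind.
    handler(partition, sequence_number)
    store.complete(partition, sequence_number)
```

Any entry left in the store after a restart points at an event whose handling was interrupted, which is the information ObserverState alone cannot provide.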

Special state for the StateMachine

During startup / subscription of an observer, we would need to look at the specialized observer state and see whether it contains any in-flight sequence numbers. If it does, we would enter a CatchUpPartitions (or similar) state in which we catch up the partitions that need it. Once they are caught up, we would enter the Routing state.
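The startup decision could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `ObserverPhase` enum, the `initial_phase` and `catch_up` functions, the `replay` callback, and in particular the choice to replay each partition from its lowest in-flight sequence number.

```python
from enum import Enum, auto
from typing import Callable


class ObserverPhase(Enum):
    CATCH_UP_PARTITIONS = auto()
    ROUTING = auto()


def initial_phase(in_flight: dict[str, set[int]]) -> ObserverPhase:
    """Pick the phase to enter on startup / subscription."""
    # Any leftover in-flight sequence number means handling was interrupted.
    if any(in_flight.values()):
        return ObserverPhase.CATCH_UP_PARTITIONS
    return ObserverPhase.ROUTING


def catch_up(
    in_flight: dict[str, set[int]],
    replay: Callable[[str, int], None],
) -> ObserverPhase:
    """Catch up every partition with leftover state, then move to routing."""
    for partition in list(in_flight):
        pending = in_flight[partition]
        if pending:
            # Assumption: replay from the lowest interrupted sequence number.
            replay(partition, min(pending))
        # Clear the partition only after its replay has returned.
        del in_flight[partition]
    return ObserverPhase.ROUTING
```

Clearing a partition only after its replay returns means a second crash during catch-up leaves the remaining partitions marked, so the next startup would enter the catch-up phase again.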

einari added the reliability and observers labels Jan 29, 2025

einari moved this to Todo in Current Work Jan 31, 2025

einari added this to Current Work Jan 31, 2025
