Improve state consistency around observers and next/handled event sequence number for when multiple partitions are working and for some reason the server goes down #1682

Open
einari opened this issue Jan 29, 2025 · 0 comments
Labels
observers: Issues related to event sequence observers
reliability: Capabilities related to guaranteeing reliability in a running system, typically related to up-time

einari (Contributor) commented Jan 29, 2025

Today we store the NextSequenceNumber on the ObserverState. If something uses AppendMany() across multiple partitions and the server dies while the events are being handled, the observer will not know which events it has handled.
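To illustrate the gap, here is a minimal sketch in Python (not Chronicle's actual code). It assumes partitions are handled concurrently, so events can complete out of sequence order; `events`, `handled`, and `next_sequence_number` are hypothetical names used only for this illustration.

```python
# Hypothetical batch: six events appended at once, spread over two partitions.
events = [(0, "A"), (1, "B"), (2, "A"), (3, "B"), (4, "A"), (5, "B")]

# Assumption: partition A finished all of its events, while partition B
# only finished its first one before the server died.
handled = {0, 2, 4, 1}

# A single counter can only express "everything below N has been handled".
next_sequence_number = 0
while next_sequence_number in handled:
    next_sequence_number += 1

# Events at or above the counter that were in fact already handled.
ambiguous = sorted(n for n in handled if n >= next_sequence_number)

print(next_sequence_number, ambiguous)
```

In this scenario the counter stops at the first unhandled event, and nothing records that a later event in another partition was already handled. Storing the highest handled number plus one instead would hide the unhandled event.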

To remedy this situation we should do a couple of things:

State

Introduce a separate state (not inside ObserverState) where we store the EventSequenceNumber of any event while it is being handled. That is, we store the event's sequence number before we handle it, and remove that state once the event has been handled.
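A rough sketch of what such a separate state could look like, in Python for brevity. Everything here is an assumption for illustration: the `InFlightEvents` class, keying by partition, and the in-memory dictionary standing in for durable storage are not part of Chronicle.

```python
from dataclasses import dataclass, field


@dataclass
class InFlightEvents:
    """Hypothetical store, kept separate from ObserverState."""

    # partition -> sequence number of the event currently being handled.
    # A real implementation would persist this; a dict stands in here.
    _by_partition: dict[str, int] = field(default_factory=dict)

    def begin(self, partition: str, sequence_number: int) -> None:
        self._by_partition[partition] = sequence_number

    def complete(self, partition: str) -> None:
        self._by_partition.pop(partition, None)

    def pending(self) -> dict[str, int]:
        return dict(self._by_partition)


def handle(store: InFlightEvents, partition: str, sequence_number: int, handler) -> None:
    store.begin(partition, sequence_number)  # record before handling
    handler(sequence_number)                 # may fail or never return
    store.complete(partition)                # remove only after success
```

If `handler` raises, or the process dies, the entry written by `begin` is left behind, which is exactly the information needed after a restart. In this sketch an event whose handler finished but whose entry was not yet removed would also look unhandled, so handlers would need to tolerate being run again.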

Special state for the StateMachine

During startup / subscription of an observer, we would need to look at this specialized observer state and see whether it holds any such entries. If it does, we would enter a CatchUpPartitions (or similar) state where we catch up the partitions that need to be caught up. Once they are caught up, we would enter the Routing state.
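The transition described above could be sketched roughly as follows, again in Python. The `State` enum, `state_after_subscribe`, and `catch_up` are hypothetical names, and `pending` is assumed to be a mapping from partition to the sequence number that was being handled when the server went down.

```python
from enum import Enum, auto


class State(Enum):
    SUBSCRIBING = auto()
    CATCH_UP_PARTITIONS = auto()
    ROUTING = auto()


def state_after_subscribe(pending: dict[str, int]) -> State:
    # Any leftover entry means a partition was interrupted mid-event.
    return State.CATCH_UP_PARTITIONS if pending else State.ROUTING


def catch_up(pending: dict[str, int], replay) -> State:
    # Replay the interrupted event for each partition, then clear its entry.
    for partition, sequence_number in sorted(pending.items()):
        replay(partition, sequence_number)
        del pending[partition]
    return State.ROUTING
```

With no leftover entries the observer would go straight to Routing; otherwise it would pass through the catch-up state first and only start routing once every interrupted partition has been replayed.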

einari added the reliability and observers labels Jan 29, 2025
einari moved this to Todo in Current Work Jan 31, 2025