Archiving Event Streams #1612

Closed
jeremydmiller opened this issue Nov 18, 2020 · 4 comments

@jeremydmiller
Member

More soon to come...

@jeremydmiller
Member Author

Per conversation w/ @oskardudycz & @babuannamalai -->

  • Have "mt_streams_archive" and "mt_events_archive" tables
  • Trigger archiving from a MaybeArchive kind of action in the projection support
  • Instead, what about using PostgreSQL partitioning to shove archived data into a different partition? Flip a status to "archived"?
  • If using the partition support, this is gonna require changes to the Linq support for IEvent

@oskardudycz
Collaborator

oskardudycz commented Apr 8, 2021

@jeremydmiller I think we'll still need to provide an option to mark events as deleted. Besides removing the whole stream, someone may want to archive only the events older than a specific event. It's a common practice to send a summary "end of booking day" event that checkpoints the current state and allows the earlier events to be moved to "cold storage". Such events should be available for moving to some other storage (e.g. for compliance reasons).

I think that it'd be good to have such metadata both on the events and stream information.

Partitioning is tempting because it simplifies operational behaviour: it allows detaching a partition out of the box. However, I first need to verify how this will behave for already existing data.

Depending on how we approach that, this might not be a breaking change (if we add it as an opt-in).

@jeremydmiller
Member Author

Assumptions, Questions, & Given Tasks

  • This has to be done at the stream level, and not at the event by event level
  • We'll provide an explicit IEventStore.ArchiveStream(Guid/string) method, and a matching IStorageOperation for both string- and Guid-identified streams. That part is easy.
  • Should there be a way in the AggregateProjection to signal that an entire stream should be archived? Or do we make that an explicit thing through IEventStore only? We need to seriously consider how we'd delete any related aggregates. Does archiving the stream mean a related projected view should be deleted?
  • Do we have some kind of IEventStore.ArchiveWhere(Expression<StreamState>) functionality? Or maybe just document how to do it manually
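
To make the stream-level assumption above concrete, here is a minimal in-memory sketch of what archiving a whole stream (rather than individual events) might look like. Everything in it is hypothetical and exists only for illustration: `InMemoryEventStore`, `StoredEvent`, `StoredStream`, and `ActiveEvents` are not Marten types, and only the `ArchiveStream(Guid/string)` shape comes from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types, not Marten's actual classes.
public sealed class StoredEvent
{
    public string StreamKey { get; init; } = "";
    public long Version { get; init; }
    public bool IsArchived { get; set; }
}

public sealed class StoredStream
{
    public string Key { get; init; } = "";
    public bool IsArchived { get; set; }
}

public sealed class InMemoryEventStore
{
    private readonly Dictionary<string, StoredStream> _streams = new();
    private readonly List<StoredEvent> _events = new();

    // Appends one event to the stream, creating the stream on first use.
    public void Append(string streamKey)
    {
        if (!_streams.ContainsKey(streamKey))
        {
            _streams.Add(streamKey, new StoredStream { Key = streamKey });
        }

        var version = _events.Count(e => e.StreamKey == streamKey) + 1;
        _events.Add(new StoredEvent { StreamKey = streamKey, Version = version });
    }

    // Guid-identified streams are mapped onto the string overload in this sketch.
    public void ArchiveStream(Guid streamId) => ArchiveStream(streamId.ToString());

    // Archiving is all-or-nothing per stream: the stream row and every one
    // of its events are flagged together. Unknown streams are ignored.
    public void ArchiveStream(string streamKey)
    {
        if (!_streams.TryGetValue(streamKey, out var stream)) return;

        stream.IsArchived = true;
        foreach (var e in _events.Where(e => e.StreamKey == streamKey))
        {
            e.IsArchived = true;
        }
    }

    public bool IsStreamArchived(string streamKey) =>
        _streams.TryGetValue(streamKey, out var stream) && stream.IsArchived;

    // Events that a default (non-archived) query would still see.
    public IReadOnlyList<StoredEvent> ActiveEvents() =>
        _events.Where(e => !e.IsArchived).ToList();
}
```

The sketch deliberately leaves open the question raised above about projected aggregates: nothing here deletes or updates a related view when a stream is archived.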

Do this through partitioning or by a separate table?

If by partitioning...

  • Add partitioning to Weasel (in flight regardless)
  • One-time, "special" migration to rename, recreate, and copy the event data into the new partitioned tables. This might be an opt-in feature because of the partitioning disruption. Might do this as part of Marten.CommandLine. Have a new upgrade command? Or a marten-migrate command?
  • Default filter on the Linq querying against raw events to exclude archived events
  • Default filter on the Async daemon shards to exclude archived events.
  • Default filter on the Async daemon high water detection??? Not sure that's necessary, but might make the process faster by telling it it can ignore the potentially much larger archived event table
  • Linq extension methods to explicitly query for archived events or for archived + not archived events similar to what we already do for soft-deleted documents
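
The "default filter plus explicit opt-in" idea in the list above can be sketched with plain LINQ to Objects. This is only an illustration of the querying semantics under assumed names: `EventRow`, `EventQuery`, `ArchiveMode`, `OnlyArchived`, and `MaybeArchived` are made up here and say nothing about how the real Linq provider would be implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape for illustration only.
public sealed record EventRow(long Sequence, string Type, bool IsArchived);

public enum ArchiveMode
{
    ExcludeArchived, // the default: archived events are hidden
    OnlyArchived,    // explicit opt-in: archived events only
    IncludeArchived  // explicit opt-in: archived + not archived
}

public sealed class EventQuery
{
    private readonly IReadOnlyList<EventRow> _rows;
    private readonly ArchiveMode _mode;

    public EventQuery(IReadOnlyList<EventRow> rows,
        ArchiveMode mode = ArchiveMode.ExcludeArchived)
    {
        _rows = rows;
        _mode = mode;
    }

    // Each opt-in returns a new query so the default is never mutated.
    public EventQuery OnlyArchived() => new(_rows, ArchiveMode.OnlyArchived);

    public EventQuery MaybeArchived() => new(_rows, ArchiveMode.IncludeArchived);

    // The archive filter is applied at execution time, based on the mode.
    public List<EventRow> ToList() => _mode switch
    {
        ArchiveMode.ExcludeArchived => _rows.Where(r => !r.IsArchived).ToList(),
        ArchiveMode.OnlyArchived => _rows.Where(r => r.IsArchived).ToList(),
        _ => _rows.ToList()
    };
}
```

A caller who does nothing special gets only live events from `new EventQuery(rows).ToList()`, and has to ask for archived data by name, which mirrors how soft-deleted documents are described above.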

If by a separate table

  • New mt_streams_archive and mt_events_archive, lock the structure to the parent tables, or use table inheritance
  • IStorageOperation to move an event stream to the new tables. With all the variability of the event & stream tables, I think this would be codegen'd. I think I'd vote to add a generated Stored Procedure to do the moving events around
  • New Linq methods to query archived events & streams

Evaluation

I've gone back and forth quite a bit. At this point, I think I'm back to leaning toward partitioning, but that's dependent upon us believing that the one-time data migration to enable the partitioning is okay.

@jeremydmiller
Member Author

jeremydmiller commented May 11, 2021

We're going w/ the partition strategy. So, tasking:

  • Add is_archived column to streams table w/ default false
  • Add is_archived column to events table
  • New IsNotArchivedFilter
  • IsNotArchivedFilter is used by the async daemon shards
  • IsNotArchivedFilter is automatically used by the Linq querying of the event table when querying all events
  • IsNotArchivedFilter is automatically used by the Linq querying of the event table when querying a specific type of events
  • IsNotArchivedFilter is used by the AggregateStream() methods
  • Explicit operation for archiving a stream by string stream key
  • Explicit operation for archiving a stream by Guid stream id
  • Linq operator to search for archived events, so ArchivedEventFilter
  • Linq operator to search for all events
  • Event.IsArchived
  • Can read IsArchived when selecting events
  • Do a test that the new column can be added compared to the v3 schema
  • New mt_archive_stream function
  • Map StreamState.IsArchived in FetchStreamState
  • StreamState.IsArchived
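
As a rough illustration of how an `IsNotArchivedFilter` could be applied automatically, the sketch below appends a not-archived condition to a generated WHERE clause unless the caller has already supplied an explicit archive filter. The `ArchiveFilters` class, the `BuildWhere` helper, and the `d.` table alias are assumptions made for this example; only the `is_archived` column name comes from the task list above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not the actual filter implementation.
public static class ArchiveFilters
{
    // SQL fragments keyed to the is_archived column from the task list.
    // The "d." alias is an assumption for this sketch.
    public const string IsNotArchived = "d.is_archived = FALSE";
    public const string IsArchived = "d.is_archived = TRUE";

    // Combines the caller's own condition (may be null or empty) with the
    // default not-archived filter. The default is skipped when the caller
    // has already stated an explicit archive filter of their own.
    public static string BuildWhere(string userWhere, bool hasExplicitArchiveFilter)
    {
        var parts = new List<string>();

        if (!string.IsNullOrWhiteSpace(userWhere))
        {
            parts.Add("(" + userWhere + ")");
        }

        if (!hasExplicitArchiveFilter)
        {
            parts.Add(IsNotArchived);
        }

        return parts.Count == 0 ? "" : "WHERE " + string.Join(" AND ", parts);
    }
}
```

Under this shape, the async daemon shards, the raw event queries, and AggregateStream() would all pass `hasExplicitArchiveFilter: false`, while the archived-events operator would supply `ArchiveFilters.IsArchived` as its own condition and pass `true`.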
