You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As we start to plan for user-contributed changes to Macrostrat, it will become important to have reversible change logs for some tables.
Possible approaches
Build "versions" into the relevant tables/schemas
We could essentially make versioning a first-class part of Macrostrat's data model, by adding columns denoting versioning metadata to all relevant tables. The following should just about cover it:
user
modified_time
superseded_by -> link to another version
This would allow lots of control, but continuous replacement of records would make maintaining foreign key relationships complicated. Since all table structures would have to change, we'd probably have to port all Macrostrat data to new schemas, and replace the current tables with views that mimic the current structure. Triggers may be needed to keep foreign keys 'fresh'
It will be possible to make this change while preserving current table structures in a view-only manner, but it may increase the complexity of Macrostrat's database storage system substantially, at least for tables like unit, strat_name_concept, legend, etc. which would likely all need to become versioned tables.
Use an external audit/logging trail system
There are many systems designed to create a (sometimes reversible) audit trail for PostgreSQL databases. As a rule these work by adding triggers to tables and dumping a JSON record encapsulating changes for every table operation on the targeted tables, keeping the actual record at the most current version. A few potential systems:
PGAudit: This seems to be the industry leader and has a cute logo 😄 It might be a lot for our needs though.
PGMemento seems fairly targeted to our use case; we've used it for Sparrow but haven't built many features on top of it.
No-frills systems like Supabase's Audit system might be easier to understand and implement. (As a rule I trust the Supabase team's architectural intuitions.)
These systems generally have the advantage of presenting a single-timepoint schema, rather than lots of versioned tables. This will probably make interacting with Macrostrat's schema considerably easier.
There might be less flexibility or integration with user access control.
Some systems require PostgreSQL extensions to function properly, which might reduce portability of Macrostrat's database or add complexity to backups.
This is a major external dependency that may open us to compatibility issues and supply-chain risk
Summary
Key features:
Should be able to re-apply changesets to tables in order to revert changes.
Should be able to attribute changes to specific users.
Should be able to build APIs to report change history
Should be straightforward to back up and restore change history
Maybe features:
Omit versioning system for simple installations of Macrostrat
Easy querying of previous versions of data
General considerations:
This is a major decision impacting Macrostrat's future capabilities, system complexity, and dependencies. It should be approached with care!
It might be possible to use a mix of approaches, if both are implemented in a straightforward way.
Perhaps there are other approaches I am missing.
Any input is appreciated!
The text was updated successfully, but these errors were encountered:
This is an important consideration. I'm not sure it is necessary to implement full-on reversibility for all data in Macrostrat.
One option, that I tried to implement a bit in v1, is to have working and essentially final versions of columns. This was one of the main motivations for cols.status_code. This allows anything with the right code to be explicitly change-prone and "untracked" in that sense, some to be explicitly legacy, and some to be the "official" working version. This doesn't satisfy what you are talking about here with real versioning, but I fear implementing versioning could become a major resource sink that will get in way of rapidly getting to point where new data can flow.
I agree that full reversibility is not important. And this is definitely only relevant for a certain subset of tables. But I would argue it is incredibly important to at least have a sense of when things were changed, broadly what was updated, and by whom.
- As you know there are plenty of instances of actual errors that need correcting in Macrostrat’s core dataset
- If more than a handful of contributors are making changes, having people overwriting each others’ work without knowledge is a real risk
- If we are ingesting existing datasets (e.g., maps) but also correcting typos etc. it will be useful to have a sense of who is making what change.
- Some things don’t have an obvious single answer and there is potential down the line for edit wars over geologically significant details
Figuring out how to have some level of change tracking without making the system unwieldy is an important balance to strike.
As we start to plan for user-contributed changes to Macrostrat, it will become important to have reversible change logs for some tables.
Possible approaches
Build "versions" into the relevant tables/schemas
We could essentially make versioning a first-class part of Macrostrat's data model, by adding columns denoting versioning metadata to all relevant tables. The following should just about cover it:
user
modified_time
superseded_by
-> link to another versionThis would allow lots of control, but continuous replacement of records would make maintaining foreign key relationships complicated. Since all table structures would have to change, we'd probably have to port all Macrostrat data to new schemas, and replace the current tables with views that mimic the current structure. Triggers may be needed to keep foreign keys 'fresh'
It will be possible to make this change while preserving current table structures in a view-only manner, but it may increase the complexity of Macrostrat's database storage system substantially, at least for tables like
unit
,strat_name_concept
,legend
, etc. which would likely all need to become versioned tables.Use an external audit/logging trail system
There are many systems designed to create a (sometimes reversible) audit trail for PostgreSQL databases. As a rule these work by adding triggers to tables and dumping a JSON record encapsulating changes for every table operation on the targeted tables, keeping the actual record at the most current version. A few potential systems:
These systems generally have the advantage of presenting a single-timepoint schema, rather than lots of versioned tables. This will probably make interacting with Macrostrat's schema considerably easier.
Summary
Key features:
Maybe features:
General considerations:
Any input is appreciated!
The text was updated successfully, but these errors were encountered: