Audit logging #59

davenquinn · 2024-07-12T18:15:59Z

As we start to plan for user-contributed changes to Macrostrat, it will become important to have reversible change logs for some tables.

Possible approaches

Build "versions" into the relevant tables/schemas

We could essentially make versioning a first-class part of Macrostrat's data model, by adding columns denoting versioning metadata to all relevant tables. The following should just about cover it:

user
modified_time
superseded_by -> link to another version

This would allow lots of control, but continuous replacement of records would make maintaining foreign key relationships complicated. Since all table structures would have to change, we'd probably have to port all Macrostrat data to new schemas, and replace the current tables with views that mimic the current structure. Triggers may be needed to keep foreign keys 'fresh'

It will be possible to make this change while preserving current table structures in a view-only manner, but it may increase the complexity of Macrostrat's database storage system substantially, at least for tables like unit, strat_name_concept, legend, etc. which would likely all need to become versioned tables.

Use an external audit/logging trail system

There are many systems designed to create a (sometimes reversible) audit trail for PostgreSQL databases. As a rule these work by adding triggers to tables and dumping a JSON record encapsulating changes for every table operation on the targeted tables, keeping the actual record at the most current version. A few potential systems:

PGAudit: This seems to be the industry leader and has a cute logo 😄 It might be a lot for our needs though.
PGMemento seems fairly targeted to our use case; we've used it for Sparrow but haven't built many features on top of it.
No-frills systems like Supabase's Audit system might be easier to understand and implement. (As a rule I trust the Supabase team's architectural intuitions.)

These systems generally have the advantage of presenting a single-timepoint schema, rather than lots of versioned tables. This will probably make interacting with Macrostrat's schema considerably easier.

There might be less flexibility or integration with user access control.
Some systems require PostgreSQL extensions to function properly, which might reduce portability of Macrostrat's database or add complexity to backups.
This is a major external dependency that may open us to compatibility issues and supply-chain risk

Summary

Key features:

Should be able to re-apply changesets to tables in order to revert changes.
Should be able to attribute changes to specific users.
Should be able to build APIs to report change history
Should be straightforward to back up and restore change history

Maybe features:

Omit versioning system for simple installations of Macrostrat
Easy querying of previous versions of data

General considerations:

This is a major decision impacting Macrostrat's future capabilities, system complexity, and dependencies. It should be approached with care!
It might be possible to use a mix of approaches, if both are implemented in a straightforward way.
Perhaps there are other approaches I am missing.

Any input is appreciated!

The text was updated successfully, but these errors were encountered:

cambro · 2024-07-12T19:02:05Z

This is an important consideration. I'm not sure it is necessary to implement full-on reversibility for all data in Macrostrat.

One option, that I tried to implement a bit in v1, is to have working and essentially final versions of columns. This was one of the main motivations for cols.status_code. This allows anything with the right code to be explicitly change-prone and "untracked" in that sense, some to be explicitly legacy, and some to be the "official" working version. This doesn't satisfy what you are talking about here with real versioning, but I fear implementing versioning could become a major resource sink that will get in way of rapidly getting to point where new data can flow.

davenquinn · 2024-07-12T20:56:31Z

I agree that full reversibility is not important. And this is definitely only relevant for a certain subset of tables. But I would argue it is incredibly important to at least have a sense of when things were changed, broadly what was updated, and by whom. - As you know there are plenty of instances of actual errors that need correcting in Macrostrat’s core dataset - If more than a handful of contributors are making changes, having people overwriting each others’ work without knowledge is a real risk - If we are ingesting existing datasets (e.g., maps) but also correcting typos etc. it will be useful to have a sense of who is making what change. - Some things don’t have an obvious single answer and there is potential down the line for edit wars over geologically significant details Figuring out how to have some level of change tracking without making the system unwieldy is an important balance to strike.

davenquinn · 2024-07-12T21:49:53Z

Topical: restoring the SGP database on my local machine reminds me that SGP uses pg_audit, apparently

This could be a good opportunity to understand that solution a little bit and potentially use something that's been widely adopted.

davenquinn assigned cambro, mmcclenn, brianaydemir, amyfromandi and davenquinn Jul 12, 2024

davenquinn added enhancement New feature or request v2-transition labels Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit logging #59

Audit logging #59

davenquinn commented Jul 12, 2024 •

edited

Loading

cambro commented Jul 12, 2024

davenquinn commented Jul 12, 2024 via email •

edited

Loading

davenquinn commented Jul 12, 2024

Audit logging #59

Audit logging #59

Comments

davenquinn commented Jul 12, 2024 • edited Loading

Possible approaches

Build "versions" into the relevant tables/schemas

Use an external audit/logging trail system

Summary

cambro commented Jul 12, 2024

davenquinn commented Jul 12, 2024 via email • edited Loading

davenquinn commented Jul 12, 2024

davenquinn commented Jul 12, 2024 •

edited

Loading

davenquinn commented Jul 12, 2024 via email •

edited

Loading