Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit logging #59

Open
davenquinn opened this issue Jul 12, 2024 · 3 comments
Open

Audit logging #59

davenquinn opened this issue Jul 12, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request v2-transition

Comments

@davenquinn
Copy link
Member

davenquinn commented Jul 12, 2024

As we start to plan for user-contributed changes to Macrostrat, it will become important to have reversible change logs for some tables.

Possible approaches

Build "versions" into the relevant tables/schemas

We could essentially make versioning a first-class part of Macrostrat's data model, by adding columns denoting versioning metadata to all relevant tables. The following should just about cover it:

  • user
  • modified_time
  • superseded_by -> link to another version

This would allow lots of control, but continuous replacement of records would make maintaining foreign key relationships complicated. Since all table structures would have to change, we'd probably have to port all Macrostrat data to new schemas, and replace the current tables with views that mimic the current structure. Triggers may be needed to keep foreign keys 'fresh'

It will be possible to make this change while preserving current table structures in a view-only manner, but it may increase the complexity of Macrostrat's database storage system substantially, at least for tables like unit, strat_name_concept, legend, etc. which would likely all need to become versioned tables.

Use an external audit/logging trail system

There are many systems designed to create a (sometimes reversible) audit trail for PostgreSQL databases. As a rule these work by adding triggers to tables and dumping a JSON record encapsulating changes for every table operation on the targeted tables, keeping the actual record at the most current version. A few potential systems:

  • PGAudit: This seems to be the industry leader and has a cute logo 😄 It might be a lot for our needs though.
  • PGMemento seems fairly targeted to our use case; we've used it for Sparrow but haven't built many features on top of it.
  • No-frills systems like Supabase's Audit system might be easier to understand and implement. (As a rule I trust the Supabase team's architectural intuitions.)

These systems generally have the advantage of presenting a single-timepoint schema, rather than lots of versioned tables. This will probably make interacting with Macrostrat's schema considerably easier.

  • There might be less flexibility or integration with user access control.
  • Some systems require PostgreSQL extensions to function properly, which might reduce portability of Macrostrat's database or add complexity to backups.
  • This is a major external dependency that may open us to compatibility issues and supply-chain risk

Summary

Key features:

  • Should be able to re-apply changesets to tables in order to revert changes.
  • Should be able to attribute changes to specific users.
  • Should be able to build APIs to report change history
  • Should be straightforward to back up and restore change history

Maybe features:

  • Omit versioning system for simple installations of Macrostrat
  • Easy querying of previous versions of data

General considerations:

  • This is a major decision impacting Macrostrat's future capabilities, system complexity, and dependencies. It should be approached with care!
  • It might be possible to use a mix of approaches, if both are implemented in a straightforward way.
  • Perhaps there are other approaches I am missing.

Any input is appreciated!

@cambro
Copy link
Member

cambro commented Jul 12, 2024

This is an important consideration. I'm not sure it is necessary to implement full-on reversibility for all data in Macrostrat.

One option, that I tried to implement a bit in v1, is to have working and essentially final versions of columns. This was one of the main motivations for cols.status_code. This allows anything with the right code to be explicitly change-prone and "untracked" in that sense, some to be explicitly legacy, and some to be the "official" working version. This doesn't satisfy what you are talking about here with real versioning, but I fear implementing versioning could become a major resource sink that will get in way of rapidly getting to point where new data can flow.

@davenquinn
Copy link
Member Author

davenquinn commented Jul 12, 2024 via email

@davenquinn
Copy link
Member Author

Topical: restoring the SGP database on my local machine reminds me that SGP uses pg_audit, apparently

image

This could be a good opportunity to understand that solution a little bit and potentially use something that's been widely adopted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request v2-transition
Projects
None yet
Development

No branches or pull requests

5 participants