Adopt the same process as https://greenmask.io? #302

Open
jensenbox opened this issue Jul 23, 2024 · 2 comments
Comments

@jensenbox

It seems that, although their backup file still contains the unsanitized data, their process is significantly faster.

Any chance of adopting their methodology instead of changing the data while it is in flight? Their approach is to mutate the data once it lands in the destination database.

@evoxmusic
Contributor

Hi @jensenbox, that looks quite interesting. I think we can fix the performance issues by reworking the lexer and parser to have a lower memory footprint. I have some ideas, but it's a matter of finding the time. Did you try GreenMask? Is it much faster?

@vchervanev

vchervanev commented Nov 9, 2024

@evoxmusic As I understand it, their solution avoids SQL parsing entirely because the data payloads come from the Postgres COPY command. For a transformation, it only needs to split the input line into fields; each field value is then ready to be deserialized and transformed.
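To illustrate why this is cheap, here is a minimal Python sketch of transforming one row in COPY text format. It is not GreenMask's code; `decode_field`, `encode_field`, and `transform_row` are hypothetical names. It assumes the default text format (tab delimiter, `\N` for NULL) and handles only the single-character backslash escapes, not octal or hex sequences.

```python
import re

# Single-character escapes used by the COPY text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
            "t": "\t", "v": "\v", "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def decode_field(raw):
    """Turn one raw COPY field into a Python value (None for NULL)."""
    if raw == r"\N":  # the NULL marker must be checked before unescaping
        return None
    # Unknown escapes fall back to the escaped character itself.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES.get(m.group(1), m.group(1)), raw)


def encode_field(value):
    """Turn a Python value back into a raw COPY field."""
    if value is None:
        return r"\N"
    out = value.replace("\\", "\\\\")  # escape backslashes first
    for ch, esc in (("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")):
        out = out.replace(ch, esc)
    return out


def transform_row(line, transformers):
    """Apply per-column transformers (column index -> function) to one row."""
    # Splitting on a raw tab is safe: tabs inside values are escaped as \t.
    fields = [decode_field(f) for f in line.rstrip("\n").split("\t")]
    for index, fn in transformers.items():
        if fields[index] is not None:  # leave NULLs untouched
            fields[index] = fn(fields[index])
    return "\t".join(encode_field(f) for f in fields)


if __name__ == "__main__":
    row = "1\talice@example.com\t\\N\n"
    print(transform_row(row, {1: lambda v: "masked@example.com"}))
```

No lexer or SQL grammar is involved: the per-row work is one split, one unescape pass per field, and the transformation itself.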

They also use a three-step approach:

  • pg_dump --schema-only --section pre-data, then restore: creates empty tables with no indexes, triggers, etc.
  • Custom COPY-based export and restore: arguably the fastest possible way to restore, with low parsing overhead and the lowest possible insert overhead.
  • pg_dump --section post-data, then restore: finalizes the import by restoring indexes, constraints, foreign keys(?), etc.
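The three steps can be sketched as an ordered plan of commands. This is only an illustration under assumptions, not GreenMask's implementation: `build_plan`, the file names, and the `None` placeholder for the COPY step are hypothetical. The sketch builds argument lists for `pg_dump` and `psql` but does not run them.

```python
import os


def build_plan(source_dsn, target_dsn, workdir):
    """Return (description, argv) pairs for the three-step flow.

    argv is None for the COPY step, because that export/transform/restore
    is custom tooling rather than a stock command.
    """
    pre = os.path.join(workdir, "pre-data.sql")    # hypothetical file name
    post = os.path.join(workdir, "post-data.sql")  # hypothetical file name
    return [
        ("dump table definitions only",
         ["pg_dump", "--section=pre-data", f"--dbname={source_dsn}", f"--file={pre}"]),
        ("create empty tables in the target",
         ["psql", f"--dbname={target_dsn}", f"--file={pre}"]),
        ("stream rows via COPY, transforming each row on the way", None),
        ("dump indexes, constraints and triggers",
         ["pg_dump", "--section=post-data", f"--dbname={source_dsn}", f"--file={post}"]),
        ("build indexes and constraints in the target",
         ["psql", f"--dbname={target_dsn}", f"--file={post}"]),
    ]


if __name__ == "__main__":
    for description, argv in build_plan("postgres://src", "postgres://dst", "/tmp/dump"):
        print(description, "->", " ".join(argv) if argv else "(custom COPY step)")
```

Loading rows before the post-data section means the inserts do not pay for index maintenance or constraint checks row by row; indexes are built once after the data is in place.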
