CiviCRM: Figure out how to sanitize the CiviCRM database #828

Closed
jenlampton opened this issue Sep 24, 2021 · 25 comments

@jenlampton
Member

We'll need to establish a process for working with the backdropcms.org site locally. That will include disabling Civi in the normal daily sanitized backups, as well as coming up with a way to sanitize the Civi database for those who do need to work with Civi.

@laryn

laryn commented Sep 24, 2021

One possibility, from the GDPR extension:

"The right to be forgotten, allowing users of CiviCRM to easily anonymise a contact record, hiding any person details but keeping the financial and other history. The action also exists as an API and therefore can be bolted into other processes."

https://civicrm.org/extensions/gdpr

This code/process can probably be adapted to copy the database, run the anonymization process, and dump it. (Process confirmed by @mikeymjco on the CiviCRM Mattermost chat)

@jackaponte

On #789 @bugfolder flagged that we'll need to figure this out before we enable CiviCRM on b.org for the first time:

> one of the tasks before we start populating Civi with data on the live b.org site is a strategy for sanitization of CiviCRM data for people building local versions of b.org from sanitized data/files. If we distribute sanitized Civi dbs, we'll need to be sure to sanitize any custom data fields we create (e.g., Full Name), in addition to Civi's built-in First/Middle/Last, etc.

I'm unfamiliar with the current process for generating and distributing sanitized b.org data and files; perhaps the CiviCRM side of this can be handled the same way?

@jenlampton
Member Author

In today's meeting we discussed how we could move forward without sanitizing the Civi database. Sanitizing it looks like a monumental task, because pretty much everything in Civi is personally identifying information.

We decided that we could just not make the Civi database available to everyone who wants to work on the Backdrop site, and only grant it to those who need to work on Civi -- or who are working on parts of the site that integrate with Civi.

In order to do this we may need to disable the Civi module in the sanitized backups for b.org, so that everyone else who's working on the main b.org site won't have any issues when they set things up locally.

@bugfolder
Contributor

Is it possible to share the script that's used to sanitize the b.org db? (a) I'm curious, (b) this might provide a template for doing the same for Civi.

@laryn

laryn commented Dec 7, 2021

Was this also considered? (sorry I couldn't make the meeting today)

#828 (comment)

@jenlampton
Member Author

jenlampton commented Dec 7, 2021

@laryn we only discussed "how to make this not a blocker" today. We didn't get into possible options of how to get it done yet :)

> Is it possible to share the script that's used to sanitize the b.org db?

@bugfolder I think it might already be out there somewhere? I do think we could probably use it as a template for doing the same for Civi. @larsdesigns would know more.

@bugfolder
Contributor

> it [sanitization script] might already be out there somewhere?...

Yeah, I was hoping that someone who knows where it is could share it. Presumably either privately or also appropriately sanitized (since it would necessarily contain db and/or other credentials).

My thinking was that since it sounds like we're going to start fairly small in what we collect, there will be relatively few fields that need sanitization, but then as we add more functionality to our Civi install, we can just add the newly affected (or, for custom fields, created) tables to the script.
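One way to picture the "just add the newly affected tables to the script" idea is a small registry that maps each table to the columns needing sanitization. The sketch below is only an illustration, not the actual b.org script: the `civicrm_contact` name columns are CiviCRM's built-in fields, while the `civicrm_value_example_1` table, its `full_name_1` column, and the replacement expressions are hypothetical stand-ins for a custom field group.

```python
# Hypothetical registry: table -> {column: SQL replacement expression}.
# Adding a newly collected field means adding one entry here.
SANITIZE_RULES = {
    # Built-in CiviCRM name fields.
    "civicrm_contact": {
        "first_name": "CONCAT('First', id)",
        "middle_name": "NULL",
        "last_name": "CONCAT('Last', id)",
    },
    # Made-up custom field table (custom data tables are keyed by entity_id).
    "civicrm_value_example_1": {
        "full_name_1": "CONCAT('Name', entity_id)",
    },
}


def build_update_statements(rules):
    """Return one UPDATE statement per table, covering all listed columns."""
    statements = []
    for table, columns in rules.items():
        assignments = ", ".join(
            f"`{column}` = {expression}" for column, expression in columns.items()
        )
        statements.append(f"UPDATE `{table}` SET {assignments};")
    return statements


if __name__ == "__main__":
    for statement in build_update_statements(SANITIZE_RULES):
        print(statement)
```

The generated statements would be run against a copy of the database before dumping it, never against the live one.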

@larsdesigns
Contributor

Well, perhaps we do not make the civicrm database available for download? We could instead sanitize (remove) configuration that requires it from the backdrop.org sanitized database and files.

Unless it is deemed necessary to provide a sanitized CiviCRM database for development reasons. I cannot think of any development reasons though.

@larsdesigns
Contributor

larsdesigns commented Dec 8, 2021

@bugfolder This is the project repository that is being used for the sanitization: https://github.com/serundeputy/sql-dump-sanitize edit: now https://github.com/backdrop-ops/sql-dump-sanitize

@bugfolder
Contributor

> Unless it is deemed necessary to provide a sanitized CiviCRM database for development reasons. I cannot think of any development reasons though.

Initially, there probably won't be. However, a plausible scenario where it would be is if we're collecting any CiviCRM fields on the user registration page via a CiviCRM Profile, and we want to develop something else on the user registration page (like anti-spam checks). Then we'd probably need Civi working to provide the profile form on the page.

This isn't a blocker for getting Civi up and running by any means (there's no immediate need for it). We decided at today's meeting that initially we could either disable Civi for local builds or make the actual Civi db available to the small number of devs. Rather, I'm just looking ahead to the time when we will need a sanitized db to do local development.

> This is the project repository that is being used for the sanitization...

Thanks, that's what I was looking for!

@jackaponte

Getting back in the loop here -- we use https://github.com/scoobird/org.civicrm.contrib.anonymize at Palante to sanitize our Civi databases; perhaps that can be used and/or adapted for our purposes here?

@jackaponte

Let's try to get this issue to resolution and/or to a point where it's not blocking progress on #789!

It seems like we've got two proposals so far:

  1. Only make the Civi database available to people working on Civi or parts of the site that interact with Civi; this would likely mean disabling Civi in the automated sanitized backups for b.org.
  2. Use a script like https://github.com/scoobird/org.civicrm.contrib.anonymize or something similar to sanitize the Civi database and make it available alongside the automated sanitized b.org Backdrop backups.

Any thoughts on which approach seems best, either in the short term to remove the blocker or in the long term?

@jenlampton
Member Author

jenlampton commented Feb 24, 2022

If we have someone available and interested in working on the script, I'd prefer #2. If not, we should go with #1 and update the documentation on how to get a local copy of b.org up and running without the Civi database. (And I can help with that documentation.)

@jenlampton
Member Author

We're hoping to have a closer look at the script to see if it would work nicely with sanitize.backdropcms.org and, if so, make the Civi database available there too. @larsdesigns has volunteered to review the script for us. Thank you!

@larsdesigns self-assigned this on Mar 14, 2022
@larsdesigns
Contributor

@BWPanda, would you be interested in collaborating with me on this?

@ghost

ghost commented Mar 14, 2022

@larsdesigns Possibly. What do you need?

@larsdesigns
Contributor

larsdesigns commented Mar 14, 2022

@BWPanda, could I add you as a reviewer when I open a PR?

@bugfolder changed the title from "Figure out how to sanitize the CiviCRM database" to "CiviCRM: Figure out how to sanitize the CiviCRM database" on Nov 12, 2022
@bugfolder
Contributor

Backing up and sanitizing are both done by functions in /home/backdrop/sanitized_databases, so sanitization should probably be addressed together with backing up, which is #963.

@larsdesigns assigned bugfolder and unassigned larsdesigns on Nov 17, 2022
@larsdesigns
Contributor

Robert, thank you so much for taking this on.

Handing this off to @bugfolder.

@bugfolder
Contributor

@larsdesigns, @jenlampton, I have created a PR to the sql-dump-sanitize repo that adds both backing up and sanitization of the CiviCRM db.

It uses four new config.ini values of the form *_CIVI, which you can see in the current config.ini on b.org (which I've also updated). The current (old) script on b.org ignores those new values, so it should still run. But if we update b.org to the new script, it should pick up those values and back up and sanitize the CiviCRM database, putting its sanitized backups in a new folder, sanitized_civi (parallel to the existing folder sanitized).

I have tested this script on a local setup, and it works. So, after you've reviewed the code, I'd like to try out the new script on b.org.

I think the script will still work on the non-CiviCRM properties (e.g., docs, forum, events); we just don't include the *_CIVI values in their respective config.ini files, and no CiviCRM backups will be attempted.
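The opt-in behavior described above (CiviCRM backups only when the *_CIVI values are present) could look roughly like the following. This is a sketch under assumptions, not the code in the PR: the `[database]` section and the key names `DB_NAME`, `DB_NAME_CIVI`, and `DB_USER_CIVI` are invented placeholders, since the real key names are not given here.

```python
import configparser

# Hypothetical config for a property that uses CiviCRM (key names are made up).
WITH_CIVI = """
[database]
DB_NAME = backdrop
DB_NAME_CIVI = civicrm
DB_USER_CIVI = civi_user
"""

# Hypothetical config for a property without CiviCRM (e.g. docs or forum).
WITHOUT_CIVI = """
[database]
DB_NAME = backdrop
"""


def civi_settings(ini_text):
    """Collect every key ending in _CIVI from all sections of the config."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys case-sensitive instead of lowercasing
    parser.read_string(ini_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key.endswith("_CIVI"):
                found[key] = value
    return found


def should_backup_civi(ini_text):
    """CiviCRM backup is attempted only when *_CIVI values are configured."""
    return bool(civi_settings(ini_text))


if __name__ == "__main__":
    print(should_backup_civi(WITH_CIVI))
    print(should_backup_civi(WITHOUT_CIVI))
```

Keying the behavior on the presence of the values means one script can serve every property, and an older script that never reads those keys is unaffected by them.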

A note on sanitization strategy. I modified the sanitization of Backdrop account emails to be "user+$uid@localhost", so that I could easily ensure that sanitized CiviCRM email addresses in the civicrm_address and civicrm_uf_match tables were the same where appropriate.
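To illustrate why a uid-based address makes the two databases easy to keep in sync, here is a minimal sketch using an in-memory SQLite database as a stand-in for MySQL. The table layouts are simplified assumptions (a `users` table with `uid` and `mail`, and `civicrm_uf_match` with `uf_id` and `uf_name`), the sample rows are invented, and the real script may do this differently.

```python
import sqlite3


def sanitized_email(uid):
    """Build the deterministic sanitized address for a Backdrop account."""
    return f"user+{uid}@localhost"


def sanitize_demo():
    """Sanitize emails in both tables and return the matched pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, mail TEXT);
        CREATE TABLE civicrm_uf_match (uf_id INTEGER, uf_name TEXT, contact_id INTEGER);
        INSERT INTO users VALUES (1, 'alice@example.org'), (2, 'bob@example.org');
        INSERT INTO civicrm_uf_match VALUES
            (1, 'alice@example.org', 101), (2, 'bob@example.org', 102);
        """
    )
    # Both updates derive the address from the same account id, so the
    # values agree without any lookup between the two tables.
    conn.execute("UPDATE users SET mail = 'user+' || uid || '@localhost'")
    conn.execute("UPDATE civicrm_uf_match SET uf_name = 'user+' || uf_id || '@localhost'")
    rows = conn.execute(
        """
        SELECT u.mail, m.uf_name
        FROM users u JOIN civicrm_uf_match m ON m.uf_id = u.uid
        ORDER BY u.uid
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for mail, uf_name in sanitize_demo():
        print(mail, uf_name)
```

Because the replacement depends only on the account id, each database can be sanitized independently and the addresses still match.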

So when you get a chance, please take a look and let me know what you think. (And happy 4th day of post-solstice!)

@jenlampton
Member Author

@bugfolder this looks fantastic. I added one request for a change to the PR (just to update or remove an inline code comment) but that can safely be ignored :) Thank you for working on this!

@bugfolder
Contributor

Change made. A higher power than me is needed to merge the PR ;o).

@jenlampton
Member Author

PR merged :D

@bugfolder
Contributor

Sanitized dbs are being created and are exposed on the sanitize.backdropcms.org site. Calling this one done.

@larsdesigns
Contributor

@bugfolder ++, Nice work! Thank you so much for getting this done.
