Deleting / anonymizing data #1674

Closed
3 of 5 tasks
TheSlimvReal opened this issue Jan 20, 2023 · 7 comments
Assignees
Labels
released on @master managed by CI (semantic-release) released managed by CI (semantic-release) Type: Feature new user-facing feature Type: Security

Comments

@TheSlimvReal
Collaborator

TheSlimvReal commented Jan 20, 2023

Goal

It must be possible to completely delete personal data from the system and the database.
It should be possible to only delete personal data while keeping statistically relevant data in the system.

Status quo

When an entity is deleted, its personal data is no longer directly visible in the app. However, the old revisions of the document still exist, which makes it possible to retrieve the already deleted data. Calling the _compact endpoint of CouchDB fully deletes the non-leaf revisions.

Requirements

  • anonymization method that sets a predefined set of properties to undefined; make this configurable?
  • UI button that allows "Anonymization" of a record (as an alternative next to deleting it completely), make the difference intuitively clear to users
  • permissions (distinct between delete and anonymize)
  • delete attachments (if property is cleared)
  • _compact endpoint needs to be called on a regular basis to properly clear the deleted data
  • do a cascading delete / anonymization; see Cascading delete (remove related entities / references when deleting an entity) #220
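The compaction requirement above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it relies on CouchDB's documented `POST /{db}/_compact` endpoint, while the function names, the base URL, the database name and the auth header are placeholders introduced here. The request is built separately from sending it so that the HTTP client stays injectable.

```typescript
// Sketch only: all names below are hypothetical helpers, not part of the codebase.
interface CompactRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

// Builds the request for CouchDB's POST /{db}/_compact endpoint.
// CouchDB expects a JSON content type and admin credentials for this call.
function buildCompactRequest(
  baseUrl: string,
  dbName: string,
  authHeader: string,
): CompactRequest {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmedBase}/${encodeURIComponent(dbName)}/_compact`,
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: authHeader },
  };
}

// Minimal shape of a fetch-like function, so the sketch does not depend on a specific HTTP client.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Sends the compaction request. CouchDB only accepts the job here and
// runs the compaction itself in the background.
async function compactDatabase(
  request: CompactRequest,
  fetchFn: FetchLike,
): Promise<void> {
  const response = await fetchFn(request.url, {
    method: request.method,
    headers: request.headers,
  });
  if (!response.ok) {
    throw new Error(`Compaction request failed with status ${response.status}`);
  }
}
```

A scheduled job (cron, CI task or similar) could call `compactDatabase` for each database, passing the platform's `fetch` as `fetchFn`.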
@sleidig sleidig moved this from Triage to Postponed / On Hold in All Tasks & Issues Apr 24, 2023
@sleidig sleidig moved this from Postponed / On Hold to Priority (Core Team) in All Tasks & Issues Sep 5, 2023
@sleidig sleidig changed the title from "Deleting/annonymizing data" to "Deleting / anonymizing data" Sep 21, 2023
@sleidig sleidig self-assigned this Sep 21, 2023
@sleidig sleidig moved this from Priority (Core Team) to In Progress in All Tasks & Issues Sep 21, 2023
@sleidig
Member

sleidig commented Sep 22, 2023

UX Drafts:
(image: UX drafts, not included)

"Anonymize" = removing (most) properties of the entity + "Archive"
An anonymized record is always "archived" (i.e. hidden from lists by default) and cannot be reactivated.

@sleidig
Member

sleidig commented Sep 22, 2023

Config format:

Entities need to be configurable regarding what data is removed / retained when "anonymizing" a record.

An additional flag "anonymize" in the entity config, with the following possible options:

  • remove
  • retain
  • retain-anonymized (e.g. keep year of birth but remove day + month to remove personally identifiable details)

What should be the default (i.e. if this is not defined for a property in the config)?

Consider all properties by default to be removed? This would be along the lines of data protection by default. It may be easy to miss / forget to flag a personal data field as such.

Would we normally have more properties to remove or to retain?
Looking at the actual use cases and setups, there are often a lot of fields for contact details that have to be deleted. Also, often only a few selected properties are relevant for reports at all, so other details could be removed without losing much.

--> remove by default, flag explicitly to retain after anonymization?

example config:

"entity:Person": {
  "attributes": [
    {
      "name": "lastname",
      "schema": {
        "label": "Lastname",
        "dataType": "string",
        "anonymize": "remove" // flag a property as "personal data" to be removed during anonymization (could also be omitted as default)
      }
    },
    {
      "name": "dateOfBirth",
      "schema": {
        "label": "Date of Birth",
        "dataType": "date-only",
        "anonymize": "retain-anonymized" // do not delete completely but anonymize by keeping only the year
      }
    },
    {
      "name": "city",
      "schema": {
        "label": "City",
        "dataType": "string",
        "anonymize": "retain" // flag to keep and not remove during anonymization
      }
    }
  ]
}

Implementation

  • Update entity, setting all existing properties to "undefined" and saving the entity.
    • does this delete attachment files also?
  • Make sure it is readonly and "isActive = false" from now on (?) (anonymized entities are always "archived" and cannot be reactivated manually)
  • introduce a new permission action "anonymize" to give fine-grained control who can delete vs anonymize?
  • special logic for cascading anonymization of related records (see below comment)
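To make the config format and the implementation steps above more concrete, here is a rough sketch of how the three modes could be applied to a record, with "remove" as the default for anything not flagged. All names (`anonymizeRecord`, `partiallyAnonymize`, the `anonymized` flag) are hypothetical. Representing a reduced date as January 1 of the retained year is an assumption made for illustration, as is treating "date-only" values as "YYYY-MM-DD" strings.

```typescript
type AnonymizeMode = "remove" | "retain" | "retain-anonymized";

interface AttributeSchema {
  dataType: string;
  anonymize?: AnonymizeMode; // omitted means "remove" (the default proposed above)
}

type EntityRecord = Record<string, unknown>;

// Assumption: only "date-only" values have a partial rule in this sketch;
// they keep their year and get a placeholder month and day.
function partiallyAnonymize(value: unknown, dataType: string | undefined): unknown {
  if (dataType === "date-only" && typeof value === "string") {
    const year = value.slice(0, 4);
    return /^\d{4}$/.test(year) ? `${year}-01-01` : undefined;
  }
  return undefined; // no partial rule known: treat like "remove"
}

function anonymizeRecord(
  record: EntityRecord,
  schema: Record<string, AttributeSchema>,
): EntityRecord {
  const result: EntityRecord = {};
  for (const [name, value] of Object.entries(record)) {
    // Assumption: database metadata must survive so the document can be updated.
    if (name === "_id" || name === "_rev") {
      result[name] = value;
      continue;
    }
    const attr: AttributeSchema | undefined = schema[name];
    const mode: AnonymizeMode = attr?.anonymize ?? "remove";
    if (mode === "retain") {
      result[name] = value;
    } else if (mode === "retain-anonymized") {
      const reduced = partiallyAnonymize(value, attr?.dataType);
      if (reduced !== undefined) {
        result[name] = reduced;
      }
    }
    // mode "remove": the property is left out, i.e. undefined after saving
  }
  result["isActive"] = false; // anonymized records are always archived
  result["anonymized"] = true; // hypothetical marker to block reactivation
  return result;
}
```

Properties missing from the schema are dropped as well, which matches the "data protection by default" idea discussed above. Attachment cleanup, the read-only check and permissions are not covered by this sketch.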

@sleidig
Member

sleidig commented Sep 22, 2023

GDPR: Personal Data, Pseudonymization, Anonymization

  • GDPR is not applicable to anonymous data: "The principles of data protection should therefore not apply to [...] personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable." GDPR Recital 26
    • "To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly."
    • "To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments."
  • "Pseudonymisation enables the personal data to become unidentifiable unless more information is available whereas anonymization allows the processing of personal data to irreversibly prevent re-identification." source
  • good overview of anonymization misunderstandings and considerations

In the case of records being retained "anonymized" in Aam Digital, we provide a context that makes re-identification even harder:

  • only authorized users of the system can access even the anonymized record (where only a few properties have been retained). Unless the organisation actively shares the data, it remains as securely protected as the personal data managed in Aam Digital.
  • those authorized users with access to the anonymized records (and therefore a theoretical chance to attempt re-identification) are team members of an organization. They have been screened to be responsible persons and are usually legally bound to keep information confidential.
  • by default only a few, explicitly selected properties in anonymized records are retained (data minimization by default). As such, both re-identification likelihood and the impact in case of re-identification are reduced as far as possible.

--> If our anonymization process is configured thoughtfully on a case-by-case basis to retain only a few data fields that are not easy indirect identifiers, it seems reasonably unlikely that the person can be identified after the anonymization process. Therefore, GDPR should not apply to these records and it is legitimate to retain them for statistical reporting.

@sleidig sleidig moved this to In Progress in Feature Roadmap Sep 25, 2023
@sleidig sleidig added the Type: Feature new user-facing feature label Sep 25, 2023
@TheSlimvReal
Collaborator Author

I like the approach, and I think defaulting to delete is very sensible, especially as most retained properties are entity references that we hardcode anyway. As I understand it, anonymisation would be archive + delete properties?

What I am still missing here is how the cascading anonymisation would work. Can we just use the same rules as for deleting there? We shouldn't remove any entity references, so that participation details can still be retrieved.

@sleidig
Member

sleidig commented Sep 25, 2023

Very good points, the cascade indeed doesn't seem to work exactly like cascading delete out of the box.
I'll start writing unit test cases for all this.

Cascading anonymization

Related entities will also have to be deleted or anonymized to meet GDPR requirements (e.g. Notes about the person whose entity gets anonymized).
The implementation uses the logic of #220 (for a related entity S linking to an entity P that is anonymized by the user), except that the "anonymize" function is triggered in cascade instead of "delete":

  • if the reference to the anonymized records at the related entity has the role primary --> anonymize (or delete) the related (secondary) entity
    • example: a Child is anonymized. The ChildSchoolRelation is cascadingly anonymized because its ref to the Child is "primary". This in turn triggers the anonymization of the School, if that property of ChildSchoolRelation is marked as "anonymize": "retain-anonymized"
  • if it has the role secondary --> keep the reference to our anonymized entity and leave the related entity unchanged
    • example: a Note gets anonymized. If some Task or Participant entity references the note (although in our model the ref is usually on the Note), that link to the (now anonymized) Note entity remains intact.
  • if it has the role review --> the same manual review / blocking gets triggered as for cascading delete action
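The three role rules above can be read as a simple mapping from reference role to cascade action. The sketch below only illustrates that mapping; the type and function names are hypothetical and the actual logic lives in the cascading delete work of #220.

```typescript
type RefRole = "primary" | "secondary" | "review";
type CascadeAction = "anonymize" | "keep" | "review";

// A related entity S and the role of its reference to the anonymized entity P.
interface RelatedRef {
  entityId: string;
  role: RefRole;
}

function cascadeActionForRole(role: RefRole): CascadeAction {
  switch (role) {
    case "primary":
      return "anonymize"; // cascade: anonymize (or delete) the related entity
    case "secondary":
      return "keep"; // leave the related entity and its reference unchanged
    case "review":
      return "review"; // block and ask for manual review, as for cascading delete
  }
}

// Groups related entities by the action that the rules above would trigger.
function planCascade(related: RelatedRef[]): Record<CascadeAction, string[]> {
  const plan: Record<CascadeAction, string[]> = { anonymize: [], keep: [], review: [] };
  for (const ref of related) {
    plan[cascadeActionForRole(ref.role)].push(ref.entityId);
  }
  return plan;
}
```

For the Child example, a ChildSchoolRelation with a "primary" reference would land in `anonymize`, while an entity that only references the record as "secondary" would land in `keep`.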

The possible scenarios for cascading delete work out for anonymization as follows:

  • P (the entity being initially anonymized) has a property that references another entity

    • --> schema defines whether the ref is removed completely, retained (no cascading action) or retain-anonymized (triggers cascading anonymization of the referenced entity)
  • P is referenced by another ("secondary") entity S1 (exclusively, i.e. that secondary entity does not reference any further entities)

    • depending on the ref type ("secondary" / "primary"; see #220): if "primary", the cascading anonymize goes on; if "secondary" the related entity remains unchanged
    • this relationship does not appear in P's schema config, so we can't configure which anonymization mode (remove/retain) should happen for this related record overall ...
      • configure on the schema of S1?
      • or add a (not written to db) property to P for this?
      • or add it to the schema of S1 with something like `anonymizeReverse: "remove"`?
    • example: Note.children role: "primary", anonymize: "retain", anonymizeReverse: "remove"
      • if the Note gets anonymized, keep the references to Child entities
      • if the Child gets anonymized, delete the Note entity
  • P is referenced by another entity S2, where P is the only reference in that property but S2 has another property that also references some other entity

    • --> same logic/config to define whether cascading action is triggered (#220)
    • example: a ChildSchoolRelation referencing only P in children but another entity in schools would get anonymized if either the child or school is anonymized (assuming both are marked as "primary")
  • P is referenced by another entity S3 which also references other primary entities in the same property

    • example: a Note referencing multiple children, one of which is P that is being anonymized --> ❓
    • remove the id to P from the Note? (but that will lose some statistics)
    • anonymize S3? (but that might delete important details for the other children)
    • infer the "review" mode requiring manual action?
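One possible reading of these scenarios, written out as code to make the open questions easier to discuss. Nothing here is decided: `anonymizeReverse` is only the proposal from above, its "remove" value is interpreted as "delete the referencing entity", and the multi-reference case is resolved as manual review, which is just one of the three options listed. All type and function names are hypothetical.

```typescript
type AnonymizeMode = "remove" | "retain" | "retain-anonymized";
type RefRole = "primary" | "secondary" | "review";

// Schema of a property that references other entities.
interface RefPropertySchema {
  role: RefRole;
  anonymize?: AnonymizeMode; // applied when the owning entity itself is anonymized
  anonymizeReverse?: "remove" | "retain"; // proposal: applied when a referenced entity is anonymized
}

type ForwardAction = "drop-reference" | "keep-reference" | "keep-reference-and-cascade";
type ReverseAction = "delete-entity" | "anonymize-entity" | "keep-unchanged" | "require-review";

// Scenario 1: P has a property that references another entity.
function forwardAction(schema: RefPropertySchema): ForwardAction {
  switch (schema.anonymize ?? "remove") {
    case "retain":
      return "keep-reference";
    case "retain-anonymized":
      return "keep-reference-and-cascade";
    default:
      return "drop-reference";
  }
}

// Scenarios 2 to 4: an entity S references P through the given property.
function reverseAction(
  schema: RefPropertySchema,
  referencedIds: string[],
  anonymizedId: string,
): ReverseAction {
  if (referencedIds.indexOf(anonymizedId) === -1) return "keep-unchanged";
  if (schema.role === "secondary") return "keep-unchanged";
  if (schema.role === "review") return "require-review";
  // Assumption: other entities in the same property make automation unsafe.
  if (referencedIds.length > 1) return "require-review";
  return schema.anonymizeReverse === "remove" ? "delete-entity" : "anonymize-entity";
}
```

With the `Note.children` example (role "primary", anonymize "retain", anonymizeReverse "remove"), anonymizing the Note keeps its Child references, and anonymizing the only referenced Child deletes the Note. A Note referencing several children would end up in review under this reading.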

sleidig added a commit that referenced this issue Sep 25, 2023
in preparation of implementing anonymization (#1674)

---------
This functionality has been developed for the project “codo”.
codo is developed under the projects “Landungsbrücken – Patenschaften in Hamburg stärken” and “openTransfer Patenschaften”. It is funded through the program “Menschen stärken Menschen” by the German Federal Ministry of Family Affairs, Senior Citizens, Women and Youth.
More information at https://github.com/codo-mentoring

“Landungsbrücken – Patenschaften in Hamburg stärken” is a project of BürgerStiftung Hamburg in cooperation with the Mentor.Ring Hamburg. With a mix of networking opportunities, capacity building and financial support, the project has strengthened Hamburg’s scene of mentoring projects since its founding in 2016.

Since 2007, the “Stiftung Bürgermut” foundation has supported the digital and in-person exchange of experiences and the networking of active citizens. Within the federal program “Menschen stärken Menschen”, the foundation offers support services for connecting, spreading and upskilling mentoring organisations across Germany as part of its program “openTransfer Patenschaften”.

Co-authored-by: codo-mentoring <117934638+codo-mentoring@users.noreply.github.com>
@sleidig sleidig moved this from In Progress to In Review in All Tasks & Issues Oct 12, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Feature Roadmap Oct 18, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in All Tasks & Issues Oct 18, 2023
@aam-digital-ci
Collaborator

🎉 This issue has been resolved in version 3.26.0-master.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@aam-digital-ci aam-digital-ci added the released on @master managed by CI (semantic-release) label Oct 18, 2023
@aam-digital-ci
Collaborator

🎉 This issue has been resolved in version 3.26.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@aam-digital-ci aam-digital-ci added the released managed by CI (semantic-release) label Nov 21, 2023