Allow hashing or dropping PII from source connectors #1758

Closed
artefactop opened this issue Jan 21, 2021 · 11 comments
Labels
team/compose type/enhancement New feature or request

Comments

@artefactop

artefactop commented Jan 21, 2021

Tell us about the problem you're trying to solve

I want to export my data from PostgreSQL to Bigtable without exposing certain PII, specifically by transforming it into a hash or a partially hashed value.

Describe the solution you’d like

I see that PipelineWise supports this through a YAML configuration: https://transferwise.github.io/pipelinewise/user_guide/transformations.html. I'm currently researching solutions, and Airbyte looks amazing, but I couldn't tell whether it supports this feature.

It would be great if I could configure this transformation on selected source columns.

Describe the alternative you’ve considered or used

My alternative is to choose PipelineWise as the ELT tool because it already supports this.


@michel-tricot
Contributor

Hey @artefactop. Thanks for the ticket. We are actually brainstorming this feature. It's a slippery slope if we want to be a pure EL player, but I see the need for very strong guarantees when it comes to privacy and security.

Let us get back to you on it.

@unoexperto

@michel-tricot Hey Michel! Do you guys have any update on this?

@michel-tricot
Contributor

Not at the moment. @unoexperto are you encountering that need?

@archaean
Contributor

archaean commented Sep 20, 2021

@michel-tricot I am also curious about an update on this. I have seen some discussion about handling this at the transform layer (dbt), but that doesn't stop PII from being migrated to the destination system, nor does it address GDPR concerns about data residency (e.g. certain data can't leave the source system, as you potentially alluded to).

I can see a strong argument to put the specification for hashing or nulling out columns on the ConfiguredAirbyteStream in the source connection configuration since that would potentially prevent critical information from leaving a source location that has GDPR residency concerns.
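
To make the idea concrete, here is a minimal sketch of per-column hashing or nulling applied to a record before it leaves the source. The `mask_record` function, the `rules` mapping, and the `"hash"` / `"null"` action names are hypothetical and only illustrate the behavior; they are not part of the Airbyte protocol or of `ConfiguredAirbyteStream`.

```python
import hashlib
from typing import Any, Dict, Mapping


def mask_record(
    record: Mapping[str, Any],
    rules: Mapping[str, str],
    salt: str = "",
) -> Dict[str, Any]:
    """Return a copy of `record` with the configured columns hashed or nulled.

    `rules` maps a column name to "hash" or "null" (hypothetical action names).
    Columns without a rule are passed through unchanged.
    """
    masked = dict(record)  # copy so the caller's record is not mutated
    for column, action in rules.items():
        if column not in masked or masked[column] is None:
            continue  # nothing to mask
        if action == "hash":
            value = (salt + str(masked[column])).encode("utf-8")
            masked[column] = hashlib.sha256(value).hexdigest()
        elif action == "null":
            masked[column] = None
        else:
            raise ValueError(f"unknown action {action!r} for column {column!r}")
    return masked


if __name__ == "__main__":
    row = {"id": 1, "email": "a@example.com", "ssn": "123-45-6789"}
    print(mask_record(row, {"email": "hash", "ssn": "null"}))
```

Applying the rules on the source side means the raw values never reach the destination, which is the residency property described above. An unsalted hash of a low-entropy field can still be guessed by brute force, so a secret salt (or a keyed hash) would likely be needed in practice.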

Do you have a better idea about how the design may work from an architectural standpoint, or even what options you are considering? A documented proposal or RFC to that effect?

In the long term I think this would be a fundamental need for us, as the only temporary workaround would be to create views in the source system that hash or null out fields containing sensitive information. We haven't load tested views in this respect, but replicating views could raise operational and load concerns that tables and simple transforms would not.
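
For reference, the view workaround could look roughly like the sketch below, which builds a `CREATE VIEW` statement for a PostgreSQL source. The helper names, the `rules` mapping, and the example table are hypothetical, and the hashing expression assumes PostgreSQL 11 or later, where the built-in `sha256(bytea)` function is available.

```python
from typing import Mapping, Sequence


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_masked_view_sql(
    view: str,
    table: str,
    columns: Sequence[str],
    rules: Mapping[str, str],
) -> str:
    """Build a CREATE VIEW statement that hashes or nulls the listed columns.

    `rules` maps a column name to "hash" or "null" (hypothetical action names).
    """
    select_items = []
    for column in columns:
        ident = quote_ident(column)
        action = rules.get(column)
        if action is None:
            select_items.append(ident)  # pass through unchanged
        elif action == "hash":
            # Hex-encoded SHA-256 of the column's text representation.
            select_items.append(
                f"encode(sha256(convert_to({ident}::text, 'UTF8')), 'hex') AS {ident}"
            )
        elif action == "null":
            select_items.append(f"NULL AS {ident}")
        else:
            raise ValueError(f"unknown action {action!r} for column {column!r}")
    return (
        f"CREATE VIEW {quote_ident(view)} AS SELECT "
        + ", ".join(select_items)
        + f" FROM {quote_ident(table)};"
    )


if __name__ == "__main__":
    print(
        build_masked_view_sql(
            "users_masked",
            "users",
            ["id", "email", "ssn"],
            {"email": "hash", "ssn": "null"},
        )
    )
```

The connector would then replicate `users_masked` instead of `users`. This does not address the load concerns mentioned above, since the hash is computed on every read of the view.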

@sherifnada changed the title from "Apply Transformations between Source and Destination (PostgreSQL/BigQuery)" to "Allow hashing or dropping PII from source connectors" on Nov 15, 2021
@sherifnada added the area/connectors (Connector related issues) label on Nov 15, 2021

@misteryeo
Contributor

Issue was linked to Harvestr Discovery: Hashing PII fields

@HaithemSouala

Any updates?

@misteryeo
Contributor

Not currently, @HaithemSouala!

@sherifnada
Contributor

@malikdiarra FYI, I moved this to the OSS team's backlog.

@cesar-loadsmart

Hey Folks!

Any updates regarding this issue?

@natikgadzhi
Contributor

@Hesperide safe to say it's shipped, no?

@nataliekwong
Contributor

As shared in our roadmap, column hashing is now available as of Airbyte v1.1.0!
